JPH08314975A

JPH08314975A - Information retrieving device

Info

Publication number: JPH08314975A
Application number: JP7145213A
Authority: JP
Inventors: Takamasa Koyama; 隆正小山; Tetsuya Kinoshita; 哲也木下; Hirofumi Shinoki; 裕文篠木; Chuichi Kikuchi; 忠一菊池
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-05-22
Filing date: 1995-05-22
Publication date: 1996-11-29
Anticipated expiration: 2020-05-18
Also published as: JP3649472B2

Abstract

PURPOSE: To provide an information retrieval device capable of retrieving data containing a keyword at high speed. CONSTITUTION: The information retrieval device, which retrieves data containing a keyword, comprises character information extraction means 33 that calculates the appearance frequency of each character constituting the search target data; search file generation means 35 that creates a search file by associating each pair of adjacent characters in the search target data with an appearance frequency pattern combining the appearance frequencies of those characters; and retrieval means 39 that looks up in the search file the appearance frequency patterns corresponding to the pairs of adjacent characters in the keyword and collates the patterns obtained to find the search target data containing the keyword. Because the device is based on appearance frequencies, the search file is smaller than in a conventional system that locates a pattern by character position. In addition, arranging the appearance frequency patterns in the search file in ascending or descending order speeds up retrieval.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information retrieval device that retrieves, from a database, data containing a character string specified by a keyword. In particular, it relates to a device that performs retrieval, and the addition, update, and deletion of data in the database, at high speed.

[0002]

[Description of the Related Art] In recent years, attention in information retrieval has focused on systems in which a user enters any word that comes to mind as a keyword, and the system performs a full-text search of the data registered in a database and automatically retrieves the data containing the keyword's character string. Such systems have been commercialized in particular in relational database management systems (RDBMS), and their applications are spreading across many fields.

[0003] Information retrieval devices of this type are disclosed in, for example, Japanese Patent Laid-Open Nos. Sho 63-155324, Hei 1-282635, Hei 2-302869, and Hei 3-113626. In Japanese Patent Laid-Open No. Sho 63-155324 (information retrieval device), the registered data is shifted by one word per cycle and compared against the search keyword designated by the operator. In Japanese Patent Laid-Open No. Hei 1-282635 (index maintenance method), a main index and a comparatively small sub-index are held as the indexes used to search the registered data, and the index records of newly added registered data are added to the sub-index, which speeds up the update, addition, and deletion of registered data. In Japanese Patent Laid-Open No. Hei 3-113626 (buffer control method and device), the retrieval of data in a buffer and the transfer of data from an auxiliary storage device to the buffer are executed in parallel to improve task throughput. In Japanese Patent Laid-Open No. Hei 2-302869 (file editing method), in order to eliminate wasted memory area and to identify the data to be processed in memory at high speed, a table holding free-area pointers of the memory is created, and the character string data to be processed in memory is identified by referring to this table.

[0004] These four conventional information retrieval devices use one of two methods. In the first (Japanese Patent Laid-Open Nos. Sho 63-155324, Hei 1-282635, and Hei 2-302869), a data area and a data update area are reserved in the storage area in units of record numbers, data is stored in the data update area when an update occurs, and the data area and the data update area are then swapped. In the second (Japanese Patent Laid-Open No. Hei 2-302869), data is added, updated, and deleted using a correspondence table between the character string data stored in the search file and its records. The present invention corresponds to the latter.

[0005] Next, the processes of registering, changing, and deleting data in a conventional information retrieval device are described. FIG. 19 is a processing diagram of the conventional device, and FIG. 20 shows its tables and data flow. The three records of the table in FIG. 20(a) are used as an example of a search target file. Here, the A record stores two characters and two characters as its data structure, and the B record stores two characters and five characters. In this case, "12" of the A record corresponds to "あい" of the A record, and "34" of the A record corresponds to "うえ" of the A record. As shown in FIG. 19, registration, change, and deletion proceed as follows. Step 1: The table of FIG. 20(a) in the search target file is read sequentially as search target records. Step 2: The position of each character (its ordinal position in the record), the character size, and a pointer to the same item are set, and the table of FIG. 20(a) is converted in memory into the table of FIG. 20(b). For example, because "12" of the first record A and "あい" of the third record A are the same item, a pointer p1 to "あい" is set. Likewise, a pointer from "34" of the first record A to "うえ" of the third record A is set.

[0006] Step 3: The above processing is repeated to the end of the table of FIG. 20(a). Step 4: The table is placed in a free area of memory, which completes the table of FIG. 20(b).

[0007] Step 5: When data is updated, for example when "34" of the first record A in FIG. 20(a) is updated to "56", record A is looked up in the table of FIG. 20(b), the content "34" in the third row is found, and it is rewritten to "56". The size and the pointer do not change.

[0008] For deletion, for example when "うえ" of the third record A in FIG. 20(a) is deleted, record A is looked up in the table of FIG. 20(b), the pointer to p2 in the third row is followed to find the content "うえ" in the seventh row, and that content is deleted. The "to p2" pointer in the third row is deleted at the same time.

[0009] In this way, the conventional information retrieval device performs registration, update, and deletion by following the table using the head address, size, and pointer as clues.

[0010]

[Problems to Be Solved by the Invention] In the conventional information retrieval device, however, as the amount of data in the database grows, a full-text search for data containing the character string specified by a keyword takes a long time, so a faster retrieval method is needed.

[0011] In addition, when data is updated or deleted in the conventional device, an extension area as large as the records being updated or deleted must be reserved for the processing, so an enormous storage area has to be provided.

[0012] Furthermore, after data is updated or deleted, a sorting pass is needed to restore the order of the records, so updates and deletions take time.

[0013] Moreover, when data is updated or deleted, the updated information is inserted into, or data is removed from, a search file that is tightly packed with data. All data after the insertion or deletion position must therefore be shifted, which makes updates and deletions slow.

[0014] Another approach is to make do by setting a deletion flag on the data to be deleted, but in that case the search file keeps growing as the amount of deleted data increases.

[0015] The present invention solves these conventional problems. Its object is to provide an information retrieval device that can determine at high speed whether data containing a character string or the like specified by a keyword exists in a database, and that can update and delete data quickly.

[0016]

[Means for Solving the Problems] Accordingly, the present invention provides, in an information retrieval device that retrieves data containing a keyword: character information extraction means that calculates the appearance frequencies of the characters constituting the search target data; search file generation means that creates a search file by associating each pair of adjacent characters in the search target data with an appearance frequency pattern combining the appearance frequencies of those characters; and retrieval means that looks up in the search file the appearance frequency patterns corresponding to the pairs of adjacent characters in the keyword and collates the patterns obtained to find search target data containing the keyword.

[0017] In addition, the search file generation means arranges the appearance frequency patterns in the search file in ascending or descending order, and the retrieval means uses for collation only the appearance frequency patterns that lie within the range satisfying that ascending or descending order.

[0018] In addition, when creating the search file, the search file generation means appends to the end of each array of appearance frequency patterns a pattern identical to the one immediately preceding it, and the retrieval means stops collating when the collation reaches the appended pattern.

[0019] In addition, search file correction means is provided that appends a blank extension area to the end of each array of appearance frequency patterns in the search file and, when a change to the search target data requires an appearance frequency pattern to be added, adds the pattern to this extension area.

[0020] In addition, search file correction means is provided that, in order to identify which of the appearance frequency patterns recorded in the search file are the valid patterns used for keyword retrieval, writes at the end of the array of valid appearance frequency patterns a pattern identical to the one preceding it, and places a blank extension area before any array consisting only of invalid appearance frequency patterns.

[0021]

[Operation] This information retrieval device calculates the appearance frequency of the search target data for each character type, that is, which occurrence of its character type each character is. Stepping through the data's character string one character at a time, it forms a character pattern from each character and the character adjacent to it, obtains the appearance frequency pattern that pairs the appearance frequencies of the two characters, and collects these patterns to form the search file. At retrieval time, the keyword is split into multiple character patterns, the appearance frequency patterns corresponding to each character pattern are obtained from the search file and collated, and when the character patterns are found to be consecutive in the search target data, that data is detected as containing the keyword.

[0022] Because this device uses appearance frequencies, the search file can be made smaller than in the conventional method, which records where a pattern is located by character position. In addition, when a keyword is retrieved, only the character patterns that match the keyword's character patterns are read from the search file, and whether the keyword is contained is checked from their appearance frequency patterns. Not every character string in the search target data has to be collated, so high-speed retrieval is possible.

[0023] Furthermore, the appearance frequency patterns in the search file are arranged in ascending or descending order, and at retrieval time the collation proceeds only within the range that satisfies this order, so patterns that do not need to be collated can be identified quickly. This reduces the number of collations and speeds up retrieval.

[0024] When data is updated, a new appearance frequency pattern can be written over an existing appearance frequency pattern or over the extension area of the search file. Operations such as shifting the search file's data one item at a time are therefore unnecessary, and the search file can be updated at high speed. Moreover, providing an extension area, into which appearance frequency patterns can be written, after each array of appearance frequency patterns makes the proportion of extension areas in the search file uniform. Update processing time is thereby evened out, and the update time can be estimated.

[0025] When data is deleted, instead of deleting all the related data in the search file, an appearance frequency pattern that breaks the ascending or descending order, or a blank extension area, is placed before the array of appearance frequency patterns that become invalid. This distinguishes them from the valid appearance frequency patterns. Because nothing has to be removed from the search file, deletion can be completed in a short time.

[0026] In this specification, the term "character" covers not only character data but all patterned or symbolized data. The information retrieval device of the present invention can register, update, delete, and retrieve not only character data but any patterned or symbolized data.

[0027]

[Embodiments]

(First Embodiment) In the information retrieval device of the embodiment, when the character string of a text is, for example, "ああいああいあんんああんあい", the appearance frequency for each character type ("あ", "い", "ん") is determined as shown in FIG. 3. That is, the device determines which "あ" each "あ" in the string is, which "い" each "い" is, and which "ん" each "ん" is, and creates a character pattern frequency table (FIG. 3). Next, each pair of two consecutive characters in the string is taken as a character pattern, and the appearance frequency pattern corresponding to that character pattern is obtained. For the string "ああいあ", the character patterns are (あ, あ), (あ, い), (い, あ), and their appearance frequency patterns are (1, 2), (2, 1), (1, 3). The appearance frequency patterns are then collected for each character pattern to create the appearance frequency list (index file) for each text, shown in FIG. 5.
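The numbering described above can be reproduced in a few lines. The following is a minimal sketch, not the patent's implementation; the function names `occurrence_numbers` and `build_index` are illustrative assumptions, and the index is held in an in-memory dictionary rather than a file.

```python
from collections import defaultdict

def occurrence_numbers(text):
    """For each position, say which occurrence of its character it is (1-based)."""
    seen = defaultdict(int)
    numbers = []
    for ch in text:
        seen[ch] += 1
        numbers.append(seen[ch])
    return numbers

def build_index(text):
    """Map each pair of adjacent characters to its appearance frequency patterns."""
    nums = occurrence_numbers(text)
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[(text[i], text[i + 1])].append((nums[i], nums[i + 1]))
    return dict(index)

index = build_index("ああいああいあんんああんあい")
print(index[("あ", "い")])  # [(2, 1), (4, 2), (8, 3)]
print(index[("い", "あ")])  # [(1, 3), (2, 5)]
```

Because the text is scanned left to right, each list comes out in ascending order without any sorting.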

[0028] When the keyword's character string is "あいあ", on the other hand, its character patterns are (あ, い) and (い, あ). To collate the keyword with a text, the appearance frequency patterns corresponding to the keyword's character patterns are first obtained from the text's appearance frequency list. From the appearance frequency list of FIG. 5, the appearance frequency patterns of (あ, い) are (2, 1), (4, 2), (8, 3), and those of (い, あ) are (1, 3), (2, 5). When the second element of an appearance frequency pattern of (あ, い) matches the first element of an appearance frequency pattern of (い, あ), the text contains the keyword string "あいあ". In this example, the second element (1) of (2, 1) matches the first element (1) of (1, 3), and the second element (2) of (4, 2) matches the first element (2) of (2, 5), so this text contains the keyword "あいあ".
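The matching rule can be extended to keywords of any length by chaining the comparison across successive character patterns. The sketch below makes that assumption explicit; `build_index` and `find_chains` are hypothetical names, and single-character keywords, which have no character pair, are simply reported as having no chain.

```python
from collections import defaultdict

def build_index(text):
    """Map each adjacent character pair to its (occurrence, occurrence) patterns."""
    seen, nums = defaultdict(int), []
    for ch in text:
        seen[ch] += 1
        nums.append(seen[ch])
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[(text[i], text[i + 1])].append((nums[i], nums[i + 1]))
    return dict(index)

def find_chains(index, keyword):
    """Return one chain of linked patterns per occurrence of the keyword."""
    pairs = [(keyword[i], keyword[i + 1]) for i in range(len(keyword) - 1)]
    if not pairs:
        return []  # a single character has no adjacent pair to look up
    chains = [[p] for p in index.get(pairs[0], [])]
    for pair in pairs[1:]:
        # Within one list, each first element is unique, so a dict lookup suffices.
        by_first = {p[0]: p for p in index.get(pair, [])}
        chains = [c + [by_first[c[-1][1]]] for c in chains if c[-1][1] in by_first]
    return chains

index = build_index("ああいああいあんんああんあい")
print(find_chains(index, "あいあ"))
# [[(2, 1), (1, 3)], [(4, 2), (2, 5)]]
```

A non-empty result means the text contains the keyword; each chain corresponds to one place where it occurs.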

[0029] The information retrieval device of the embodiment basically uses this method to retrieve texts containing a keyword. As shown in FIG. 1, the device comprises: a text database 31 holding the text data to be searched; a text generation unit 32 that reads texts one at a time from the text database 31 and generates a text record for each; a keyword reading unit 38 that reads the keyword to be searched for; a corrected text reading unit 312 that reads corrected text data used to modify a text; a continuous character information extraction unit 33 that creates character pattern frequency tables from text records and corrected text records and creates character patterns from the keyword; an index generation unit 35 that creates, from a character pattern frequency table, index file data consisting of character patterns and their appearance frequency patterns; an index file storage unit 34 that stores the character pattern frequency tables created by the continuous character information extraction unit 33 and the index files created by the index generation unit 35; an extension area generation unit 36 that adds an extension area to the end of each appearance frequency pattern list in the index file; an index search unit 39 that collates the index file data with the keyword's character patterns to determine whether those character patterns occur consecutively in a text; a character frequency existence search unit 310 that instructs the index search unit 39 to stop collating when no appearance frequency patterns remain to be collated; a search result output unit 311 that displays the search results of the index search unit 39; and an update data generation unit 37 that updates the appearance frequency patterns of the index file when a text is modified.

[0030] The first embodiment describes the operation of generating an index file from texts and the operation of using this index file to retrieve texts containing a keyword. In this case, the corrected text reading unit 312 and the update data generation unit 37 of the information retrieval device of FIG. 1 are not used.

[0031] The index file is generated according to the processing flow of FIG. 2.

[0032] Step 1: First, the text generation unit 32 reads a text from the text database 31 and assigns the record number Ni to the text record of that text. Suppose this text record is "ああいああいあんんああんあい". Step 2: The continuous character information extraction unit 33 sets the pointer p to the first character of the text record and initializes the count of the appearance frequency N(A) for each character type. Steps 3 and 4: It counts the appearance frequency, by character type, of the character at the pointer position. Step 5: It creates the character pattern frequency table (FIG. 3) and stores it in the index file storage unit 34. The table records pairs consisting of the per-character-type appearance frequency and the record number.

[0033] Step 6: If the pointer position has not reached the last character of the text record, Step 7: the pointer is moved to the next character, and processing returns to Step 3 to determine the appearance frequency of that character for its character type.

[0034] When this procedure has been repeated and the character pattern frequency table for the text record of record number Ni is complete, Step 8: the index generation unit 35 sets a pointer i, which designates a character pattern (a pair of two characters), at the head of the text record in this character pattern frequency table. Step 9: It obtains the appearance frequency of each character of the character pattern indicated by the pointer i. Step 10: It records the appearance frequency pattern consisting of this pair of appearance frequencies in the index file (FIG. 5) in association with the character pattern. The record number Ni is also recorded in the index file.

[0035] Step 11: If the pointer i has not reached the end of the text record, Step 12: the pointer i is moved by one character and processing returns to Step 9, where the appearance frequency pattern of the character pattern indicated by the pointer is obtained from the character pattern frequency table. If that character pattern has not yet been recorded in the index file, the character pattern and its appearance frequency pattern are recorded in the index file in association with each other, together with the record number Ni. If the character pattern has already been recorded in the index file, the appearance frequency pattern just obtained is recorded at the end of the appearance frequency pattern list for that character pattern.

[0036] When these steps have been repeated and the pointer i reaches the end of the text record, Step 13: the extension area generation unit 36 provides an extension area at the tail of every appearance frequency pattern list in the index file, and either leaves it blank or records in it a pattern identical to the last appearance frequency pattern.

[0037] FIG. 6 shows the data structure of the index file with the extension areas added. In the appearance frequency pattern lists of this file, the individual elements of the appearance frequency patterns are arranged in ascending order. That is, if the elements of an appearance frequency list are (Ai, Bi), then A1 < A2 < ... < Am and B1 < B2 < ... < Bm.
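Because both elements increase along each list, two lists can be collated in a single forward pass. The sketch below uses a standard merge-style scan over two sorted lists; it is an illustration of why the ordering helps and may differ in detail from the step sequence the patent itself follows. The name `collate` is an assumption.

```python
def collate(first_list, second_list):
    """Pair patterns whose shared character is the same occurrence.

    first_list holds patterns for (X, Y), second_list for (Y, Z); both are
    assumed to be in ascending order, as stated for the index file.
    """
    matches = []
    j = k = 0
    while j < len(first_list) and k < len(second_list):
        n = first_list[j][1]   # occurrence number of Y as the second character
        m = second_list[k][0]  # occurrence number of Y as the first character
        if n == m:
            matches.append((first_list[j], second_list[k]))
            j += 1
            k += 1
        elif n < m:
            j += 1  # no later entry in second_list can match this one
        else:
            k += 1  # no later entry in first_list can match this one
    return matches

print(collate([(2, 1), (4, 2), (8, 3)], [(1, 3), (2, 5)]))
# [((2, 1), (1, 3)), ((4, 2), (2, 5))]
```

Each step advances at least one cursor, so the number of comparisons is bounded by the combined length of the two lists rather than by their product.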

[0038] Step 14: The above processing is repeated for each text record read in turn from the text database 31. Step 15: When all the texts stored in the text database 31 have been processed, generation of the index file ends.
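Steps 14 and 15 amount to repeating the per-record indexing for every text and keeping each result under its record number. The following sketch assumes the database is a plain dictionary from record number to text; `build_index`, `contains`, and `search` are illustrative names, not terms from the patent.

```python
from collections import defaultdict

def build_index(text):
    """Per-record index: adjacent character pair -> list of occurrence pairs."""
    seen, nums = defaultdict(int), []
    for ch in text:
        seen[ch] += 1
        nums.append(seen[ch])
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[(text[i], text[i + 1])].append((nums[i], nums[i + 1]))
    return dict(index)

def contains(index, keyword):
    """True if the keyword's character pairs link up consecutively in the index."""
    pairs = [(keyword[i], keyword[i + 1]) for i in range(len(keyword) - 1)]
    if not pairs:
        return False  # single-character keywords are outside this sketch
    reachable = {p[1] for p in index.get(pairs[0], [])}
    for pair in pairs[1:]:
        reachable = {p[1] for p in index.get(pair, []) if p[0] in reachable}
    return bool(reachable)

def search(database, keyword):
    """Return the record numbers of the texts that contain the keyword."""
    indexes = {ni: build_index(text) for ni, text in database.items()}
    return [ni for ni in sorted(indexes) if contains(indexes[ni], keyword)]

database = {1: "ああいああいあんんああんあい", 2: "いいんあ", 3: "んあいあ"}
print(search(database, "あいあ"))  # [1, 3]
print(search(database, "いん"))    # [2]
```

In practice the indexes would be built once and stored, as the index file storage unit does, rather than rebuilt on every search as in this short example.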

[0039] Next, the operation of using the index file to retrieve texts containing a keyword is described. This is carried out by the procedure of FIG. 7. Here the keyword is assumed to be "あいあ".

[0040] Step 1: First, the keyword reading unit 38 reads the keyword. Step 2: The continuous character information extraction unit 33 decomposes the keyword into two-character character patterns (A[p], B[p+1]), (B[p+1], C[p+2]), and so on. Here A[p] denotes the p-th character A of the keyword. The keyword "あいあ" is decomposed into (あ, い) and (い, あ).
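The decomposition in Step 2 is a sliding window of width two over the keyword. A minimal sketch, with the function name `character_patterns` chosen here for illustration:

```python
def character_patterns(keyword):
    """Split a keyword into overlapping pairs of adjacent characters."""
    return [(keyword[p], keyword[p + 1]) for p in range(len(keyword) - 1)]

print(character_patterns("あいあ"))  # [('あ', 'い'), ('い', 'あ')]
```

A keyword of n characters yields n - 1 character patterns, so a one-character keyword yields none.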

[0041] Step 3: The continuous character information extraction unit 33 sets the record number Ni. Step 4: It sets the pointer p to the first character position of the keyword. Step 5: It sets the first two character patterns of the keyword, in this example (あ, い) and (い, あ), as the targets of collation.

【００４２】Step 6: From the index file of record number Ni (FIG. 6) stored in the index file storage unit 34, the index search unit 39 reads the first (j = 1) appearance frequency pattern (2, 1) of the appearance frequency list (M[p][j], N[p][j]) of the first character pattern (あ, い). Step 7: It also reads the first (k = 1) appearance frequency pattern (1, 3) of the appearance frequency list (M[p+1][k], N[p+1][k]) of the second character pattern (い, あ). FIG. 8 shows the index file corresponding to the keyword "あいあ". For the character patterns (A[p], B[p+1]) = (あ, い) and (B[p+1], C[p+2]) = (い, あ), the corresponding appearance frequency patterns are (MA[p][1], MB[p+1][1]) = (2, 1), ..., (MA[p][n], MB[p+1][n]) = (8, 3) and (MB[p+1][1], MC[p+2][1]) = (1, 3), ..., (MB[p+1][n], MC[p+2][n]) = (2, 5), respectively. Here MA[q][j] denotes the entry for character A in the j-th appearance frequency pattern of the q-th character pattern of the keyword string. At the end of the appearance frequency patterns of each character pattern, either a blank or a copy of the last appearance frequency pattern is stored as an extension area.
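
One way to picture the data read in Steps 6 and 7 is the Python sketch below. The dictionary layout and the names `index_file` and `read_first_pattern` are assumptions made for illustration; the actual file layout of FIG. 6 and FIG. 8 is not reproduced here. The list contents are the ones quoted for the keyword "あいあ" in the description of FIG. 10, including the repeated final pattern.

```python
# Hypothetical in-memory layout for the index file of one record.
# Each character pattern maps to its appearance frequency list; the last slot
# of each list repeats the final pattern and acts as a terminator.
index_file = {
    ("あ", "い"): [(2, 1), (4, 2), (8, 3), (8, 3)],
    ("い", "あ"): [(1, 3), (2, 5), (2, 5)],
}


def read_first_pattern(index, character_pattern):
    """Steps 6 and 7: read the head (j = 1 or k = 1) of a frequency list."""
    return index[character_pattern][0]


first = read_first_pattern(index_file, ("あ", "い"))
second = read_first_pattern(index_file, ("い", "あ"))
print(first, second)  # (2, 1) (1, 3)
```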

【００４３】Step 8: As shown in FIG. 9, the index search unit 39 checks whether the second element of the appearance frequency pattern (2, 1) of the first character pattern matches the first element of the appearance frequency pattern (1, 3) of the second character pattern, and thereby verifies whether the first and second character patterns are contiguous. If they match, Step 9: it outputs the contiguous characters "あいあ" and the record number Ni of the index file to the search result output unit 311.
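
The continuity check of Step 8 reduces to a single comparison, sketched below. The function name `is_continuous` is hypothetical.

```python
def is_continuous(first_pattern, second_pattern):
    """True when both patterns refer to the same occurrence of the shared character.

    first_pattern belongs to the character pattern (A, B) and second_pattern to
    (B, C); they are contiguous when the frequency recorded for B agrees.
    """
    return first_pattern[1] == second_pattern[0]


print(is_continuous((2, 1), (1, 3)))  # True
print(is_continuous((4, 2), (1, 3)))  # False
```

In the first call both patterns record the first occurrence of い, so (あ, い) is immediately followed by (い, あ) in the text.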

【００４４】Step 10: For the next continuity check, the index search unit 39 compares the second element N[p][j] of the appearance frequency pattern of the first character pattern with the first element M[p+1][k] of the appearance frequency pattern of the second character pattern. When the first element M[p+1][k] is larger, or when N[p][j] and M[p+1][k] are equal, Step 11: j is incremented. Step 12: When the second element N[p][j] is larger, or when N[p][j] and M[p+1][k] are equal, Step 13: k is incremented.
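
The pointer update of Steps 10 through 13 can be written as a small function. This sketch follows the conditions exactly as stated in the paragraph above; the function name `advance_pointers` and its parameter names are hypothetical.

```python
def advance_pointers(n_pj, m_p1k, j, k):
    """Return the updated (j, k) after comparing N[p][j] with M[p+1][k]."""
    if m_p1k >= n_pj:  # Steps 10 and 11: M[p+1][k] is larger, or they are equal
        j += 1
    if n_pj >= m_p1k:  # Steps 12 and 13: N[p][j] is larger, or they are equal
        k += 1
    return j, k


print(advance_pointers(1, 1, 0, 0))  # (1, 1)
print(advance_pointers(2, 1, 0, 0))  # (0, 1)
print(advance_pointers(1, 3, 0, 0))  # (1, 0)
```

When the two values are equal both pointers move, so every comparison advances at least one pointer.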

【００４５】Step 14: The appearance frequency existence search unit 310 compares the second element N[p][j] of the appearance frequency pattern of the first character pattern with the second element N[p][j+1] of the next appearance frequency pattern of that character pattern, and likewise compares the first element M[p+1][k] of the appearance frequency pattern of the second character pattern with the first element M[p+1][k+1] of the next appearance frequency pattern of that character pattern. When they are in ascending order, Step 15: the process returns to Step 8 and collates the second element and the first element of the updated appearance frequency patterns.

【００４６】When the ascending relation does not hold in Step 14, that is, when the next appearance frequency pattern has reached the blank area, or the area holding a copy of the last appearance frequency pattern, that the extension area generation unit 36 generated at the end of the appearance frequency list, collation for that character pattern ends.

【００４７】FIG. 10 shows the flow of the search for the keyword "あいあ". The first appearance frequency pattern (2, 1) of the appearance frequency list (2, 1) (4, 2) (8, 3) (8, 3) of the first character pattern (あ, い) is compared with the first appearance frequency pattern (1, 3) of the appearance frequency list (1, 3) (2, 5) (2, 5) of the second character pattern (い, あ). The second element "1" of the first appearance frequency pattern matches the first element "1" of the second appearance frequency pattern, so the result is stored in the search result output unit 311. Next, the second appearance frequency pattern (4, 2) of the first character pattern is compared with the second appearance frequency pattern (2, 5) of the second character pattern. The second element "2" of the first matches the first element "2" of the second, so this result is also stored in the search result output unit 311. The third appearance frequency pattern (8, 3) of the first character pattern would then be compared with the third appearance frequency pattern (2, 5) of the second character pattern, but this third pattern (2, 5) is not in ascending order relative to the second pattern (2, 5) of the same character pattern. Collation for this character pattern therefore ends here.
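
The FIG. 10 walkthrough can be reproduced with the Python sketch below. It is one reading of Steps 8 through 15, under two stated assumptions: each list is non-empty and starts with a valid pattern, and the ascending check is applied only to a list whose pointer is about to advance. The names `collate` and `is_ascending_step` are hypothetical.

```python
def is_ascending_step(slots, i, element):
    """True when slot i+1 exists, is not blank, and exceeds slot i in `element`."""
    if i + 1 >= len(slots) or slots[i + 1] is None:
        return False
    return slots[i + 1][element] > slots[i][element]


def collate(first_list, second_list):
    """Return the pattern pairs in which the shared character is the same occurrence.

    first_list is ordered by its second element and second_list by its first
    element; either may end with a terminator slot (repeat of the last pattern
    or None for a blank).
    """
    matches = []
    j = k = 0
    while True:
        n = first_list[j][1]    # N[p][j]
        m = second_list[k][0]   # M[p+1][k]
        if n == m:              # Steps 8 and 9: continuity found
            matches.append((first_list[j], second_list[k]))
        step_j = m >= n         # Steps 10 and 11
        step_k = n >= m         # Steps 12 and 13
        # Step 14 (as interpreted here): stop when a pointer that must advance
        # would land on a slot that breaks the ascending order.
        if step_j and not is_ascending_step(first_list, j, 1):
            break
        if step_k and not is_ascending_step(second_list, k, 0):
            break
        if step_j:
            j += 1
        if step_k:
            k += 1
    return matches


first = [(2, 1), (4, 2), (8, 3), (8, 3)]   # list for (あ, い)
second = [(1, 3), (2, 5), (2, 5)]          # list for (い, あ)
print(collate(first, second))
# [((2, 1), (1, 3)), ((4, 2), (2, 5))]
```

The loop stops after the second match because the third slot of the second list repeats (2, 5), which is the terminator described in the text. Each list is traversed at most once, in the manner of a merge of two sorted lists.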

【００４８】Step 16: When collation for that character pattern has ended in Step 14, the apparatus checks whether collation with the other character patterns of the keyword is complete. If it is not, Step 17: the processing from Step 5 onward is performed.

【００４９】Step 18: When the continuity check is complete for all character patterns of the keyword, the apparatus checks whether a Step 9 collation result has been obtained for every character pattern of the keyword. If so, the text of record number Ni is displayed as containing the keyword.

【００５０】As described above, the information retrieval apparatus of this embodiment creates a search file by associating each character pattern of the text (a combination of consecutive character types) with an appearance frequency list in which the appearance frequency patterns of that character pattern are arranged in ascending order. It reads from this search file each appearance frequency list corresponding to a character pattern of the keyword and collates the appearance frequency patterns between the lists to determine whether the keyword string is present in the text data. Because the appearance frequency patterns are arranged in ascending order, they need not be collated exhaustively against one another, and the search completes with few comparisons. High-speed search is therefore possible.
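
The construction of such a search file can be sketched as follows. The sketch assumes that the appearance frequency of a character is its running occurrence count up to that position in the record; this assumption reproduces the lists quoted for FIG. 10 when applied to the text record "ああいああいあんんああんあい" used in the embodiments. The function name `build_index` is hypothetical, and terminator and extension slots are omitted.

```python
from collections import defaultdict


def build_index(text):
    """Map each adjacent character pair to its list of appearance frequency patterns."""
    counts = defaultdict(int)
    running = []                      # running[p] = occurrence count of text[p] so far
    for ch in text:
        counts[ch] += 1
        running.append(counts[ch])

    index = defaultdict(list)
    for p in range(len(text) - 1):
        character_pattern = (text[p], text[p + 1])
        index[character_pattern].append((running[p], running[p + 1]))
    return dict(index)


index = build_index("ああいああいあんんああんあい")
print(index[("あ", "い")])  # [(2, 1), (4, 2), (8, 3)]
print(index[("い", "あ")])  # [(1, 3), (2, 5)]
```

Because the running counts only grow as the scan moves through the text, each list comes out in ascending order without an explicit sort.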

【００５１】The appearance frequency patterns in the appearance frequency list may instead be arranged in descending order. In addition, this information retrieval apparatus can search not only character string data but also pattern sequences of other patterned data.

【００５２】(Second Embodiment) The second embodiment describes updating the index file data when the text stored in the database is changed.

【００５３】In the information retrieval apparatus of FIG. 1, the modified text reading unit 312 reads modified text data for a text update, to which a text data number is attached, and the continuous character information extraction unit 33 creates a character pattern frequency table from the modified text data. The created character pattern frequency table is stored in the index file storage unit 34. The index generation unit 35 creates an appearance frequency pattern for each character pattern from this table, and the update data generation unit 37 overwrites the appearance frequency pattern list of the existing index file with these appearance frequency patterns. When the number of appearance frequency patterns exceeds the existing appearance frequency pattern list, the extension area generation unit 36 adds an extension area for appearance frequency patterns.

【００５４】The operation of updating the index file data following such a text change is described with reference to the processing flow of FIG. 11. The example changes the text record "ああいああいあんんああんあい" to "ああいあいいあんああいんあい".

【００５５】Step 1: First, the modified text reading unit 312 reads the changed text record and sets the record number.

【００５６】Step 2: The continuous character information extraction unit 33 sets the pointer p to the first character "あ" of the text record and initializes the count of the appearance frequency N(A) of each character type. Step 3: For the character at the pointer position, Step 4: it counts the appearance frequency of that character type, and Step 5: it creates the character pattern frequency table (FIG. 12) and stores it in the index file storage unit 34. This table records the appearance frequency of each character type in the updated text record together with the record number.

【００５７】Step 6: When the pointer has not reached the last character of the updated text record, Step 7: the pointer is moved to the next character, and the process returns to Step 3 to determine the appearance frequency of that character's character type.

【００５８】This procedure is repeated until the character pattern frequency table for the updated text record of record number Ni is complete. Step 8: The index generation unit 35 then locates the index file corresponding to text record number Ni in the index file storage unit 34 and sets a pointer to the head of that file. Step 9: It sets a pointer i, which designates a character pattern (a pair of two characters), to the head of the updated text record in the character pattern frequency table (FIG. 12). Step 10: It obtains the appearance frequency of each character of the character pattern indicated by pointer i.

【００５９】Step 11: The update data generation unit 37 writes the appearance frequency pattern formed from this pair of appearance frequencies over the appearance frequency patterns or extension area recorded in the appearance frequency pattern list of the index file indicated by the pointer.

【００６０】Step 12: When the appearance frequency pattern list becomes full, the extension area generation unit 36 adds an extension area to the end of the list.

【００６１】Step 13: If pointer i has not reached the end of the updated text record, Step 14: the index generation unit 35 moves pointer i forward by one character and returns to Step 10, repeating the operation of obtaining the appearance frequency pattern of that character pattern.

【００６２】Step 15: When pointer i reaches the end of the text record, the update data generation unit 37 appends to the end of every appearance frequency pattern list a copy of that list's last appearance frequency pattern, and the extension area generation unit 36 adds an extension area to the end of every appearance frequency pattern list. The update data generation unit 37 also converts to extension area all appearance frequency patterns of the pre-update text record that were not overwritten.
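
For a single appearance frequency pattern list, Steps 11, 12, and 15 might look like the Python sketch below. The slot layout is an assumption, since FIG. 13 and FIG. 14 are not reproduced here: new patterns overwrite the old slots from the head, a copy of the last new pattern follows as a terminator, leftover pre-update slots are blanked, and one blank extension slot is appended. The names `BLANK` and `update_list` are hypothetical.

```python
BLANK = None  # hypothetical marker for a blank extension slot


def update_list(old_slots, new_patterns):
    """Return the slots of one appearance frequency pattern list after an update."""
    slots = list(old_slots)
    for i, pattern in enumerate(new_patterns):
        if i < len(slots):
            slots[i] = pattern       # Step 11: overwrite a pattern or extension slot
        else:
            slots.append(pattern)    # Step 12: the list is full, so extend it
    end = len(new_patterns)
    if new_patterns:
        # Step 15: repeat the last pattern as a terminator
        if end < len(slots):
            slots[end] = new_patterns[-1]
        else:
            slots.append(new_patterns[-1])
        end += 1
    # Step 15: leftover pre-update patterns become blank extension slots
    for i in range(end, len(slots)):
        slots[i] = BLANK
    slots.append(BLANK)              # Step 15: extension area at the end
    return slots


# (あ, い) occurs four times in the updated record "ああいあいいあんああいんあい".
print(update_list([(2, 1), (4, 2), (8, 3), (8, 3)],
                  [(2, 1), (3, 2), (6, 4), (7, 5)]))
# [(2, 1), (3, 2), (6, 4), (7, 5), (7, 5), None]

# (ん, ん) no longer occurs after the update, so its list becomes all blank.
print(update_list([(1, 2), (1, 2)], []))
# [None, None, None]
```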

【００６３】FIG. 13 shows the relationship between the appearance frequency pattern lists of the updated text record and the index file, and FIG. 14 shows the structure of the index file after the update. A copy of the last appearance frequency pattern and an extension area have been added to the end of each appearance frequency pattern list. For a character pattern such as (ん, ん), which had appearance frequency patterns before the update but does not occur after it, the patterns have been replaced by extension area.

【００６４】The search procedure using the updated index file is the same as in FIG. 7.

【００６５】As described above, the information retrieval apparatus of this embodiment provides a blank extension area at the end of each appearance frequency pattern list of the index file. Even when the number of appearance frequency patterns after an update exceeds the number already registered, the new patterns can be written directly into the extension area, so data can be updated at high speed. Because extension area is added each time only to the extent needed to avoid running out of space for recording appearance frequency patterns, little effort is spent on adding it. Preparing a huge index file in advance would not only inflate the memory capacity needlessly but also make it time-consuming to find the appearance frequency patterns to be erased or updated. The apparatus of this embodiment largely avoids these problems and can update the index file at high speed when the text data changes.

【００６６】(Third Embodiment) The third embodiment describes correcting the index file when part of the text is deleted.

【００６７】The operation is described for an example in which the latter half of the text record "ああいああいあんんああんあい" is deleted, leaving "ああいああいあん". FIG. 15 shows the operation procedure of the information retrieval apparatus in this case. Steps 1 through 10 of this procedure are the same as the operation for updating a text record shown in the second embodiment (FIG. 11). The text record after deletion yields the character pattern frequency table shown in FIG. 16, and the index generation unit 35 obtains from this table the appearance frequency of each character of each character pattern.

【００６８】Step 11: The update data generation unit 37 writes the appearance frequency pattern formed from this pair of appearance frequencies over the appearance frequency pattern list of the existing index file that has the same record number.

【００６９】Because deleting part of the text record reduces the number of appearance frequency patterns, no appearance frequency pattern is written over the extension area, and no additional extension area is needed.

【００７０】When the operation of obtaining the appearance frequency pattern for each character pattern from the character pattern frequency table of FIG. 16 has been carried out to the end of the text record after deletion, the appearance frequency pattern lists shown in FIG. 17 are obtained.

【００７１】Step 14: The update data generation unit 37 appends to the end of every appearance frequency pattern list a copy of that list's last appearance frequency pattern and, for character patterns that do not occur in the text record after deletion, adds an extension area at the head of the appearance frequency pattern list.

【００７２】FIG. 18 shows the index file after the deletion. The valid appearance frequency patterns are the portions enclosed by thick lines; the rest are invalid portions that are not used for search.

【００７３】The procedure for searching the index file after deletion is the same as in FIG. 7. Because the index file is searched only over the range in which the elements of the appearance frequency patterns remain in ascending order, the invalid portions of the index file shown in FIG. 18 are not searched.
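
The effect of the ascending-order rule on a partly stale list can be sketched as follows. The slot contents are an assumed illustration of the (あ, い) list after the record is shortened to "ああいああいあん": the two valid patterns, the repeated terminator, and one stale pre-deletion slot. FIG. 18 itself is not reproduced, and the function name `valid_patterns` is hypothetical.

```python
def valid_patterns(slots):
    """Return the leading patterns that are still in strictly ascending order.

    Reading stops at the first blank slot (None) or at the first pattern whose
    first element does not exceed that of the previous pattern.
    """
    valid = []
    for pattern in slots:
        if pattern is None:
            break
        if valid and pattern[0] <= valid[-1][0]:
            break
        valid.append(pattern)
    return valid


# Assumed slots after deletion: valid (2, 1) (4, 2), terminator (4, 2), stale (8, 3).
print(valid_patterns([(2, 1), (4, 2), (4, 2), (8, 3)]))  # [(2, 1), (4, 2)]

# A character pattern that no longer occurs has a blank slot at the head.
print(valid_patterns([None, (1, 2)]))  # []
```

The stale slot never has to be erased: the terminator in front of it keeps the search from reaching it.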

【００７４】As described above, when deleting from an index file, the information retrieval apparatus of the third embodiment does not delete all of the unneeded index file contents but updates only the minimum necessary range of the index file. The effort of deletion is thereby reduced, and deletion can be performed at high speed.

【００７５】[0075]

[Effects of the Invention] As is clear from the above description of the embodiments, the information retrieval apparatus of the present invention can search a database for a designated keyword at high speed, and can also add, update, and delete data in the database at high speed.

【００７６】When data is added to, updated in, or deleted from this database, providing an extension area in the existing index file and re-recording the last appearance frequency pattern allow the data to be tidied (garbage removed) while keeping modifications to the index file to a minimum.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the information retrieval apparatus according to an embodiment of the present invention.

FIG. 2 is a processing flow of index file generation in the first embodiment.

FIG. 3 is a character pattern frequency table for a text record in the first embodiment.

FIG. 4 is an explanatory diagram of the appearance frequency list in the first embodiment.

FIG. 5 shows character patterns and appearance frequency patterns in the first embodiment.

FIG. 6 shows the index file in the first embodiment.

FIG. 7 is a processing flow of index file search in the first embodiment.

FIG. 8 shows the character patterns and appearance frequency pattern lists used during search in the first embodiment.

FIG. 9 is an explanatory diagram of continuity collation in the first embodiment.

FIG. 10 is an explanatory diagram of the search collation order in the first embodiment.

FIG. 11 is a processing flow of index file update in the second embodiment.

FIG. 12 is a character pattern appearance frequency table for the updated text record in the second embodiment.

FIG. 13 shows character patterns and appearance frequency patterns in the second embodiment.

FIG. 14 shows the index file in the second embodiment.

FIG. 15 is a processing flow of index file deletion in the third embodiment.

FIG. 16 is a character pattern appearance frequency table for the text record after deletion in the third embodiment.

FIG. 17 shows character patterns and appearance frequency patterns in the third embodiment.

FIG. 18 shows the index file in the third embodiment.

FIG. 19 is a diagram showing the processing flow of a conventional information retrieval apparatus.

FIG. 20 is a data table of a conventional information retrieval apparatus.

[Explanation of symbols]

31 Text database
32 Text generation unit
33 Continuous character information extraction unit
34 Index file storage unit
35 Index generation unit
36 Extension area generation unit
37 Update data generation unit
38 Keyword reading unit
39 Index search unit
310 Character frequency existence search unit
311 Search result output unit
312 Modified text reading unit

Continuation of the front page: (72) Inventor: Chuichi Kikuchi, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims

[Claims]

1. An information retrieval apparatus for retrieving data that includes a keyword, comprising: character information extraction means for calculating the appearance frequency of each character constituting search target data; search file generation means for creating a search file by associating each set of adjacent characters in the search target data with an appearance frequency pattern that combines the appearance frequencies of those characters; and search means for retrieving from the search file the appearance frequency patterns corresponding to the sets of adjacent characters in the keyword and collating the retrieved appearance frequency patterns to retrieve search target data that includes the keyword.

2. The information retrieval apparatus according to claim 1, wherein the search file generation means arranges the appearance frequency patterns in the search file in ascending or descending order, and the search means uses for the collation the appearance frequency patterns in the range that satisfies the ascending or descending order in the search file.

3. The information retrieval apparatus according to claim 2, wherein, when creating the search file, the search file generation means adds to the end of the array of appearance frequency patterns an appearance frequency pattern identical to the one preceding it, and the search means stops the collation when the collation order reaches the added appearance frequency pattern.

4. The information retrieval apparatus according to claim 1, further comprising search file correction means that adds a blank extension area to the end of the array of appearance frequency patterns in the search file and, when an appearance frequency pattern must be added because the search target data has changed, adds the appearance frequency pattern to the extension area.

5. The information retrieval apparatus, further comprising search file correction means that, in order to delimit the range of valid appearance frequency patterns used for keyword search among the appearance frequency patterns recorded in the search file, adds to the end of an array containing valid appearance frequency patterns an appearance frequency pattern identical to the one preceding it, and adds a blank extension area to the head of an array containing only invalid appearance frequency patterns.