JPH0869476A - Retrieval system

Info

Publication number: JPH0869476A
Application number: JP6204793A
Authority: JP
Inventors: Hideaki Kasai; 秀昭葛西
Original assignee: HOKKAIDO NIPPON DENKI SOFTWARE KK; NEC Software Hokkaido Ltd
Current assignee: HOKKAIDO NIPPON DENKI SOFTWARE KK; NEC Solution Innovators Ltd
Priority date: 1994-08-30
Filing date: 1994-08-30
Publication date: 1996-03-12
Anticipated expiration: 2012-12-24
Also published as: JP2693914B2

Abstract

PURPOSE: To identify the target data containing an arbitrary character string using only logical operations on an index bitmap, and thereby to improve the efficiency of free-word search over abstracts and similar text. CONSTITUTION: The search system includes a search data file 1 that stores search target data consisting of many characters; a bitmap file 3 that stores an index bitmap 4 consisting of bit strings which are keyed on characters and which identify both the search target data and the character positions; a bitmap creation processing unit 2 that sets the corresponding bits of the index bitmap 4 to "1" when search target data are stored in the file 1; and a bitmap search processing unit 5 that looks up the index bitmap 4 for each character of the search character string received from an input device 6, reads the corresponding bit strings, shifts the bit strings of the second and subsequent characters left by the number of bits equal to their positional difference, takes the logical AND of the bit strings, identifies the data numbers that contain a bit whose AND result is "1", and outputs those numbers to a display device 7.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a search system, and more particularly to a search system that extracts search target data containing an arbitrary character string from a database storing a large volume of search target data, each item consisting of many characters.

[0002]

[Prior Art] Free-word search selects and extracts the search target data containing a specified arbitrary search character string from a search data file that accumulates search target data consisting of many characters, such as abstracts and commentaries. Its most basic method is to read the search target data in the search data file sequentially and match the search character string against every item, thereby identifying the items that contain it. Because this method takes too long, Japanese Patent Laid-Open No. 62-211728 proposes limiting, by a preliminary search, the number of search target data items that must be matched against the search character string.

[0003] This method has a character index that holds, for each character, a bit string identifying the data numbers of the search target data in which that character is used. For each character of the search character string specified as the search condition, it obtains the bit string from the character index, takes the logical AND of the bit strings, and from the result preliminarily selects the search target data containing every character of the search character string. The data selected by this preliminary search contain all the characters of the search character string, but the arrangement of those characters is unknown, and they do not necessarily appear in the order of the search character string. The full text of each selected item must therefore be read and matched against the search character string to determine the target data. Thus, although the number of search target data items can be limited, matching the search target data against the search character string remains indispensable.
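As an illustration only, the preliminary search of this prior-art method might be sketched in Python as follows. The sample data, the function names `build_char_index` and `candidates`, and the use of Python integers as bit strings are assumptions made for this sketch and do not come from the cited publication.

```python
def build_char_index(docs):
    """Map each character to an int whose bit n is set when data item n uses it."""
    index = {}
    for n, text in enumerate(docs):
        for ch in set(text):
            index[ch] = index.get(ch, 0) | (1 << n)
    return index


def candidates(index, query, n_docs):
    """AND the per-character bit strings; return 1-based data numbers that survive."""
    acc = (1 << n_docs) - 1
    for ch in query:
        acc &= index.get(ch, 0)
    return [n + 1 for n in range(n_docs) if (acc >> n) & 1]


# Hypothetical sample data; data #2 contains "C" and "D" but not the string "CD".
docs = ["ABCDE", "DCBAE", "CDEFG"]
index = build_char_index(docs)

cand = candidates(index, "CD", len(docs))
confirmed = [n for n in cand if "CD" in docs[n - 1]]  # full-text match still required
print(cand, confirmed)
```

In this sketch all three data items pass the preliminary search, and only the final full-text match removes data #2, which is the residual cost the paragraph above points out.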

[0004] Japanese Patent Laid-Open No. 3-125263 describes a proposal that improves this method further, so that the number of search target data items to be matched against the search character string can be limited to nearly only those needed for confirmation. In addition to the character index described above, this proposal uses a collocation index, which has the same structure as the character index but is keyed on two-character collocations. At search time, the search character string is decomposed into two-character collocations, a bit string is obtained from the collocation index for each collocation, and logical operations on the bit strings further narrow down the search target data that contain the search character string.

[0005]

[Problems to Be Solved by the Invention] However, the conventional search system that also uses the collocation index needs the character index in addition to the collocation index in order to search for a single character. Not only does the index become large, but the problem also remains that the search target data must ultimately be matched against the search character string.

[0006] First, if collocations of all characters are considered, the number of entries in the collocation index is the square of the number of characters. The more characters are available, the larger the collocation index becomes compared with the character index (whose number of entries equals the number of characters), and index lookup also takes longer. Furthermore, even if the search character string is decomposed into two-character collocations and every collocation is confirmed to be present, their positions and order are not constrained, so confirmation by matching the search target data against the search character string is ultimately required. For example, if the search character string is 「情報の検索」 ("search for information"), confirming the presence of all four collocations obtained by decomposing it, 「情報」, 「報の」, 「の検」 and 「検索」, does not guarantee that the character string 「情報の検索」 exists. That is, if one search target data item contains the two character strings 「情報の蓄積」 ("accumulation of information") and 「索引の検索」 ("search of the index"), that item contains all four collocations and is extracted even though it does not contain the character string 「情報の検索」, so confirmation is required.
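The false match described above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `bigrams` and the way the two phrases are joined into one data item (with 「と」, "and") are assumptions made for illustration.

```python
def bigrams(text):
    """Return the set of two-character collocations contained in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


query = "情報の検索"
# Hypothetical data item containing 「情報の蓄積」 and 「索引の検索」.
doc = "情報の蓄積と索引の検索"

all_collocations_present = bigrams(query) <= bigrams(doc)
query_really_present = query in doc
false_positive = all_collocations_present and not query_really_present
print(sorted(bigrams(query)), all_collocations_present, query_really_present)
```

Every collocation of the query occurs somewhere in the data item, yet the query itself does not, because the collocation test carries no position information.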

[0007] An object of the present invention is to solve the above problems and to provide a search system that can reliably identify the search target data matching the search condition using only lookups of an index bitmap and logical operations, without matching the search target data against the search character string.

[0008]

[Means for Solving the Problems] The search system of claim 1 is a free-word search system that extracts search target data containing a specified arbitrary search character string from a search data file storing a plurality of search target data items each composed of many characters, and comprises: (A) an index bitmap in which bit strings, each keyed on a character and structured so that the search target data in which that character is used and the character position within that data can be identified, are arranged for all characters; (B) a bitmap creation processing unit that, when search target data are stored in the search data file, registers every character used in that data as a logical "1" in the corresponding bit of the index bitmap; and (C) a bitmap search processing unit that looks up the index bitmap in turn using each character of the search character string as a key, reads the corresponding bit strings, shifts the bit string of each of the second and subsequent characters left, within the range of each search target data item, by the number of bits equal to that character's positional difference from the first character of the search character string, then takes the logical AND, and identifies the search target data containing a logical "1" bit as the search target data matching the search condition.

[0009] The search system of claim 2 is the search system of claim 1, characterized in that: the index bitmap is created for each of a plurality of blocks into which the group of search target data is divided, and forms a hierarchical structure with a block index bitmap in which, for each character, a bit string with as many bits as there are blocks is arranged to identify whether the character is used in each block; the bitmap creation processing unit has a function of also registering the necessary information in the block index bitmap when search target data are stored in the search data file; and the bitmap search processing unit, at search time, looks up the block index bitmap, determines the blocks to be processed from the logical AND of the corresponding bit strings, and processes the index bitmaps of those blocks.

[0010]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

[0012] As shown in FIG. 1, the search system of this embodiment comprises: a search data file 1 storing a plurality of search target data items, each consisting of many characters; a bitmap file 3 storing an index bitmap 4, which consists of bit strings keyed on the characters used in the search target data and structured so that both the search target data in which each character is used and its character position can be identified; a bitmap creation processing unit 2 that sets the corresponding bits of the index bitmap 4 to "1" when search target data are stored in the search data file 1; and a bitmap search processing unit 5 that looks up the index bitmap 4 for each character of the search character string specified from an input device 6, reads the corresponding bit strings, shifts the bit strings of the second and subsequent characters left by the number of bits corresponding to their positional difference, takes the logical AND of the bit strings, identifies the data numbers whose ranges contain a "1" bit in the result as the matching search target data, and outputs them to a display device 7.

[0013] The search system of this embodiment is a free-word search system that selects and extracts the search target data containing a specified arbitrary search character string from search target data in text form, such as abstracts and commentaries. The search data file 1 stores other associated information in addition to the search target data, but FIG. 1 shows only the data number that identifies each data unit and the search target data subject to free-word search. To keep the explanation simple and easy to follow, the characters used are shown as uppercase Latin letters, and only the first few characters of each text are shown.

[0014] The index bitmap 4 is keyed on the characters used in the search target data. For each character it holds a bit string in which, in data-number order, each data item is allotted as many bits as the maximum number of characters allowed in one item, so that both the data number of the search target data in which the character is used and the character position within that data can be identified. The example of FIG. 1 shows that character "A" is used as the first character of data number #1 and the first character of data number #2; character "B" as the second character of data number #1 and the second character of data number #2; character "C" as the third character of data number #1, the third character of data number #2 and the first character of data number #3; and so on, up to character "G", which is used as the fifth character of data number #3. The index bitmap 4 is created by the bitmap creation processing unit 2 when search target data are stored in the search data file 1: a "1" is registered in the bit at the corresponding character position of the corresponding data number. When data are deleted, all bits of the corresponding data number in the index bitmap 4 are reset to "0". When data numbers are added, the required number of bits (number of added data items × maximum number of characters) is appended to each bit string of the index bitmap 4.
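The registration and deletion steps described above could be modelled roughly as follows. This is a sketch under stated assumptions, not the patented implementation: the three sample texts are hypothetical (FIG. 1 is not reproduced here; they were chosen to agree with the positions listed in the paragraph), the maximum length of 5 characters is assumed, and each data number gets its own Python integer as its segment of the bit string, with the leftmost bit standing for character position 1.

```python
from collections import defaultdict

MAX_LEN = 5  # assumed maximum number of characters per data item

# Hypothetical data #1 to #3, chosen to agree with the positions described above.
DOCS = ["ABCDE", "ABCEF", "CDEFG"]


def build_index(docs, max_len=MAX_LEN):
    """Return {character: [bit string per data number]}.

    Each bit string is an int of max_len bits; the leftmost (highest) bit
    stands for character position 1.
    """
    index = defaultdict(lambda: [0] * len(docs))
    for d, text in enumerate(docs):
        if len(text) > max_len:
            raise ValueError("data item exceeds the maximum length")
        for p, ch in enumerate(text):
            index[ch][d] |= 1 << (max_len - 1 - p)
    return index


def delete_data(index, d):
    """Reset every bit belonging to data item d (0-based) to 0."""
    for bits in index.values():
        bits[d] = 0


index = build_index(DOCS)
for ch in "ACG":
    print(ch, [format(b, "05b") for b in index[ch]])
```

Keeping one integer per data number makes the later requirement that shifts stay "within the range of each search target data" easy to satisfy; a single concatenated bit string, as the paragraph describes, would need an extra mask at each data boundary.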

[0015] The bitmap search processing unit 5 looks up the index bitmap 4 using the search character string specified from the input device 6, performs the logical operations, identifies the data numbers of the search target data containing the search character string, and outputs them to the display device 7. On request, the full text of the search target data and other information can then be read from the search data file 1 by data number and displayed. The processing of the bitmap search processing unit 5 is described concretely below, taking as an example the case where "CD" is input from the input device 6 as the search character string.

[0016] When the search character string "CD" is input and the start of the search is instructed, the bitmap search processing unit 5 first reads the entire bit string of index character "C" from the index bitmap 4. Next, it reads the entire bit string of index character "D" and, as shown in FIG. 2, shifts it left by one bit within the bit string of each data number. Finally, the bitmap search processing unit 5 takes the logical AND of the bit string of index character "C" read earlier and the shifted bit string of index character "D". FIG. 3 shows the course of this logical operation. From the resulting bit string of FIG. 3, the unit identifies the data numbers whose bit strings contain a logical "1" bit and displays those data numbers on the display device 7. In the case of FIG. 3, the bit strings of data numbers #1 and #3 contain a "1" bit, showing that the search target data of data numbers #1 and #3 contain the character string "CD", whereas the bit string of data number #2 contains no "1", showing that the search target data of data number #2 does not contain the character string "CD".
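The shift-and-AND step for "CD" can be traced with concrete bit strings. The values below are assumptions for illustration: the positions of "C" follow the description of FIG. 1, while the positions of "D" (fourth character of #1, second character of #3, absent from #2) are hypothetical values consistent with the stated result. Each data number is held as a separate 5-bit integer whose leftmost bit is character position 1.

```python
WIDTH = 5                    # assumed maximum number of characters per data item
MASK = (1 << WIDTH) - 1      # keeps a shifted bit string inside its own data range

# Bit strings for data numbers #1, #2, #3 (leftmost bit = character position 1).
c_bits = [0b00100, 0b00100, 0b10000]   # "C": 3rd char of #1 and #2, 1st char of #3
d_bits = [0b00010, 0b00000, 0b01000]   # "D": assumed 4th char of #1, 2nd char of #3

# Shift each "D" bit string left by one bit, discarding anything that would
# leave the range of its own data number.
d_shifted = [(b << 1) & MASK for b in d_bits]

# AND with the "C" bit strings; a nonzero result means "CD" occurs in that item.
result = [c & d for c, d in zip(c_bits, d_shifted)]
hits = [n + 1 for n, r in enumerate(result) if r]

for n, r in enumerate(result, start=1):
    print(f"#{n}: {r:05b}")
print(hits)  # [1, 3]
```

After the shift, a "1" in the "D" string sits at the position of the character preceding that "D", so the AND leaves a "1" exactly where a "C" is immediately followed by a "D".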

[0017] When the search character string has three or more characters, the bit string of the third character is shifted left by 2 bits, the bit string of the fourth character by 3 bits, and so on: each bit string is shifted left by the number of bits corresponding to its position in the search character string (character position − 1), and the logical AND of the bit strings is then taken. In this way the data numbers containing the search character string can be identified in the same manner. With this method, not only the presence of the relevant characters but also their order of arrangement can be detected solely by looking up the character-keyed index bitmap, so the target search target data can be identified reliably without matching the search character string against the search target data.
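Generalised to a search character string of any length, the procedure might look like the following sketch. The function names `build_index` and `find`, the default width of 8, and the sample texts are assumptions for illustration; each data number is again held as its own integer, with the leftmost bit standing for character position 1, and every text is assumed to be no longer than `width`.

```python
def build_index(docs, width):
    """Map each character to a list of per-data bit strings (leftmost bit = position 1)."""
    index = {}
    for d, text in enumerate(docs):
        for p, ch in enumerate(text):
            index.setdefault(ch, [0] * len(docs))[d] |= 1 << (width - 1 - p)
    return index


def find(docs, query, width=8):
    """Return the 1-based data numbers containing query, using only shifts and ANDs."""
    index = build_index(docs, width)
    mask = (1 << width) - 1
    empty = [0] * len(docs)
    hits = []
    for d in range(len(docs)):
        acc = mask
        for k, ch in enumerate(query):
            bits = index.get(ch, empty)[d]
            # The k-th character (0-based) is shifted left by k bits, i.e. by
            # (character position - 1); the mask keeps the shift inside this data item.
            acc &= (bits << k) & mask
        if acc:
            hits.append(d + 1)
    return hits


docs = ["ABCDE", "ABCEF", "CDEFG"]  # hypothetical sample data
print(find(docs, "CDE"), find(docs, "CEF"), find(docs, "EA"))
```

The query "EA" is included to show that a match cannot straddle two data items: the "E" ending one text and the "A" starting the next are never combined, because each shift is confined to its own data range.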

[0018] The embodiment above was described as reading every bit string corresponding to the index characters and performing the logical operations on them. As the number of search target data items grows, however, reading all the bit strings makes the bit-shift and logical-operation workload enormous. It is therefore considered effective to divide the index bitmap into blocks and arrange it hierarchically, so that blocks that cannot contain the search character string are excluded from processing in a preprocessing step. That is, the index bitmap is divided into n blocks, such as block #1 covering data numbers #1 to #10, block #2 covering data numbers #11 to #20, and so on, and is placed in a hierarchy beneath a block index bitmap consisting of n-bit bit strings that identify, for each character, whether that character is used within each block. In a search, the block index bitmap is looked up first, the bit strings are read, their logical AND is simply taken, and only the blocks corresponding to bits with logical value "1" are processed further. This preprocessing excludes from processing every block that does not contain all the characters of the search character string. When forming blocks in this way, it is effective to gather information from the same field into one block. When search target data are registered in the search data file, they are registered in the block index bitmap as well as in the index bitmap, but no change is needed if the character is already used by another data number in the same block. When data are deleted, all bits of the corresponding data number in the index bitmap are reset to "0", whereas the block index bitmap is updated only after checking the other data numbers in the block.
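The block-level preprocessing can be sketched in the same style. The names `build_block_index` and `candidate_blocks`, the block size of 2, and the sample texts are assumptions made for this illustration; the paragraph above uses blocks of ten data numbers.

```python
def build_block_index(docs, block_size):
    """Map each character to an int whose bit b is set when block b (0-based) uses it."""
    index = {}
    for n, text in enumerate(docs):
        block = n // block_size
        for ch in set(text):
            # Setting a bit that is already set changes nothing, matching the
            # remark that no update is needed when the block already uses the character.
            index[ch] = index.get(ch, 0) | (1 << block)
    return index


def candidate_blocks(block_index, query, n_blocks):
    """AND the block bit strings of every query character; return 1-based block numbers."""
    acc = (1 << n_blocks) - 1
    for ch in query:
        acc &= block_index.get(ch, 0)
    return [b + 1 for b in range(n_blocks) if (acc >> b) & 1]


# Hypothetical data: block #1 holds data #1 and #2, block #2 holds data #3 and #4.
docs = ["ABC", "CDE", "XYZ", "XXY"]
block_index = build_block_index(docs, block_size=2)
print(candidate_blocks(block_index, "CD", 2))
print(candidate_blocks(block_index, "XZ", 2))
print(candidate_blocks(block_index, "AX", 2))
```

Only the index bitmaps of the returned blocks then need the shift-and-AND processing. A block that passes this filter may still contain no match, since the block bit string records only that each character occurs somewhere in the block.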

[0019]

[Effects of the Invention] As described above, the search system of the present invention provides an index bitmap from which both the data number of the search target data in which a character is used and the character position can be identified. The search target data satisfying the search condition can therefore be identified using only shifts of, and logical operations on, the bit strings retrieved from the index bitmap, and no direct matching between the search target data and the search character string is required, which greatly improves the processing efficiency of the search system. When the number of search target data items is large, dividing the index bitmap into blocks and arranging it hierarchically prevents the bit strings subject to the logical operations from becoming excessively long.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the bit string shift operation in the bitmap search processing unit.

FIG. 3 is an explanatory diagram showing the concept of the logical AND operation in the bitmap search processing unit.

[Explanation of symbols]

1 Search data file
2 Bitmap creation processing unit
3 Bitmap file
4 Index bitmap
5 Bitmap search processing unit
6 Input device
7 Display device

Claims

[Claims]

1. A free-word search system for extracting search target data containing a specified arbitrary search character string from a search data file storing a plurality of search target data items each composed of many characters, the search system comprising: (A) an index bitmap in which bit strings, each keyed on a character and structured so that the search target data in which that character is used and the character position within that data can be identified, are arranged for all characters; (B) a bitmap creation processing unit that, when search target data are stored in the search data file, registers every character used in that data as a logical "1" in the corresponding bit of the index bitmap; and (C) a bitmap search processing unit that looks up the index bitmap in turn using each character of the search character string as a key, reads the corresponding bit strings, shifts the bit string of each of the second and subsequent characters left, within the range of each search target data item, by the number of bits equal to that character's positional difference from the first character of the search character string, then takes the logical AND, and identifies the search target data containing a logical "1" bit as the search target data matching the search condition.

2. The search system according to claim 1, wherein: the index bitmap is created for each of a plurality of blocks into which the group of search target data is divided, and forms a hierarchical structure with a block index bitmap in which, for each character, a bit string with as many bits as there are blocks is arranged to identify whether the character is used in each block; the bitmap creation processing unit has a function of also registering the necessary information in the block index bitmap when search target data are stored in the search data file; and the bitmap search processing unit, at search time, looks up the block index bitmap, determines the blocks to be processed from the logical AND of the corresponding bit strings, and processes the index bitmaps of those blocks.