JPH0877188A

JPH0877188A - Full text search method

Info

Publication number: JPH0877188A
Application number: JP6211267A
Authority: JP
Inventors: Yoshihiro Shintani; 義弘新谷
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1994-09-05
Filing date: 1994-09-05
Publication date: 1996-03-22

Abstract

PURPOSE: To reduce the size of the index while allowing a character pair (such as an uppercase and a lowercase letter) to be retrieved either as the same character or as different characters. CONSTITUTION: The index is generated by treating each uppercase/lowercase pair of English letters as the same character. In addition to the position information of a character string in a document, character information that distinguishes the members of each pair is generated as part of a leaf. In step S1, a search string is input. In step S2, the search string is converted into the form used in the index. In step S3, a search is executed by comparing the converted string with the index. In step S4, it is determined whether a leaf 2 has been reached; if not, "no corresponding character string" is returned in step S10. In step S7, the character information in the reached leaf is fetched. In step S8, the character information is compared with the search string, and if they match, the location information is returned in step S9.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a full-text search method for searching for a character string in a document.

[0002]

2. Description of the Related Art

With the spread of CD-ROMs and similar media, large volumes of documents have been digitized. Typical examples of such electronic documents include electronic libraries, electronic publications, and online manuals installed on computer equipment. Accordingly, techniques for searching these documents efficiently and accurately and extracting the desired information have become necessary. A search method that directly searches and matches against the entire body text of a document is called a full-text search method. There are broadly two ways to search an entire document: matching the text directly and sequentially from the beginning, and searching a database that has been created in advance by preprocessing and that consists of leaves representing the position information of characters in the document and an index for locating those leaves.

[0003]

In full-text search, a problem is how to handle uppercase and lowercase English letters, full-width and half-width letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana. When such pairs are searched as the same character, precision (the proportion of retrieved items that match the desired result) also becomes a problem. In a system used by both general users and experts, general users may be better served by searching without distinguishing, for example, uppercase from lowercase, whereas experts would prefer to search with the distinction. For example, in a library literature search, general users will search without caring whether some letters are uppercase or lowercase, while experts will want to reduce the number of hits and raise precision by distinguishing uppercase from lowercase and similar pairs.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, conventional full-text search methods have the following problems. In a conventional full-text search method, the index is created in a fixed manner, treating uppercase and lowercase letters and similar pairs either as entirely different characters or as the same character, so no choice is available at search time, which is inconvenient. Moreover, treating them as different characters requires more index entries than treating them as the same character, which increases the required storage capacity and lengthens search time.

[0005]

SUMMARY OF THE INVENTION

To solve the above problems, the present invention executes the following processes in a full-text search method that searches for a character string in a document by using leaves representing position information of character strings in the document and an index for searching for the leaves. Namely, it executes an index creation process that creates the index by treating all or some of the following pairs as the same character: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana; and a leaf creation process that creates, as part of each leaf, character information that distinguishes the members of all or some of those pairs.

[0006]

When a search string is specified, the method further executes: a search string conversion process that converts the search string in the same manner as the index creation process does when the pairs are treated as the same character; an index search process that searches the index on the basis of the string converted by the search string conversion process; and a leaf search process that operates when the index search process reaches a leaf. In the mode that searches without distinguishing the pairs, the leaf search process searches for the character string in the document on the basis of the position information of that leaf; in the mode that searches while distinguishing the pairs, it compares the character information of the leaf with the search string and searches for the character string in the document on the basis of the position information of the matching leaves.

[0007]

OPERATION

According to the present invention, the full-text search method is configured as described above. The index creation process therefore creates the index by treating all or some of the following pairs as the same character: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana. In addition to the position information of a character string in the document, the leaf creation process creates, as part of the leaf, character information that distinguishes all or some of these pairs (uppercase and lowercase letters and so on) in that character string. It is this character information in the leaf that distinguishes uppercase from lowercase and the other pairs. When a search string is specified, the search string conversion process converts it in the same manner as the index creation process does when the pairs are treated as the same character. The index search process searches the index on the basis of the string converted by the search string conversion process; in this process, uppercase and lowercase letters and the other pairs are treated as the same character. When the index search process reaches a leaf, the leaf search process, in the mode that searches without distinguishing the pairs, searches for the character string in the document on the basis of the position information of the characters stored in that leaf; in the mode that searches while distinguishing the pairs, it compares the character information of the leaf with the search string and searches for the character string in the document on the basis of the position information stored in the matching leaves. The above problems can therefore be solved.

[0008]

EMBODIMENTS

FIGS. 2(a) to 2(d) show the structure of a database for carrying out the full-text search method according to an embodiment of the present invention. FIG. 2(a) shows the structure of the database as a whole, FIG. 2(b) shows the structure of a leaf in FIG. 2(a), FIG. 2(c) shows the structure of the character information in FIG. 2(b), and FIG. 2(d) shows the contents of the full-width/half-width bit 2b-1 and the uppercase/lowercase bit 2b-2 in FIG. 2(c). As shown in FIG. 2(a), the database used to search for a search string consists of an index 1 and leaves 2. The index 1 is a record storing information for locating the leaves 2, and each leaf 2 is a record storing, for example, the position of a character string in a document. The index 1 is implemented with a tree structure, a hash structure, or the like. As shown in FIG. 2(b), a leaf 2 in FIG. 2(a) consists of location information 2a and character information 2b. The location information 2a stores information indicating the position of the character string in the document. The character information 2b stores the character information corresponding to the character string being searched.
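The database of FIG. 2 can be pictured with the minimal sketch below, assuming Python dataclasses and a plain dict standing in for the hash structure mentioned for index 1. The names `Leaf`, `Index`, `add`, and `lookup` are hypothetical, and the location information 2a is assumed to be a (document id, character offset) pair; the text does not specify its format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Leaf 2: location information 2a plus character information 2b."""
    location: tuple[str, int]  # 2a, assumed to be (document id, character offset)
    char_info: str             # 2b, two bits per character, e.g. "0100"


@dataclass
class Index:
    """Index 1, sketched as a hash table keyed by the unified character string."""
    entries: dict[str, list[Leaf]] = field(default_factory=dict)

    def add(self, key: str, leaf: Leaf) -> None:
        # Strings that differ only in case, width, etc. share one key,
        # so a single key can lead to several leaves.
        self.entries.setdefault(key, []).append(leaf)

    def lookup(self, key: str) -> list[Leaf]:
        # An empty list means that no leaf was reached.
        return self.entries.get(key, [])
```

Because the key is the unified form, occurrences such as "abc" and "ＡＢＣ" would sit under one key and be told apart only by their `char_info`.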

[0009]

As shown in FIG. 2(c), the character information 2b in FIG. 2(b) consists of the full-width/half-width bit 2b-1 and the uppercase/lowercase bit 2b-2. The full-width/half-width bit 2b-1 identifies whether a character in the document is full-width or half-width, and the uppercase/lowercase bit 2b-2 identifies whether a letter is uppercase or lowercase. The uppercase/lowercase bit 2b-2 also serves as the katakana/hiragana bit and the Arabic/kanji numeral bit. As shown in FIG. 2(d), for English letters, digits, and katakana, the full-width/half-width bit 2b-1 is set on for full-width characters and off for half-width characters. For English letters, the uppercase/lowercase bit 2b-2 is set on for uppercase and off for lowercase; for digits, it is set on for Arabic numerals and off for kanji numerals; and for kana, it is set on for katakana and off for hiragana.

First Embodiment

FIG. 1 is a flowchart of the full-text search method according to the first embodiment of the present invention.
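The bit assignment of FIG. 2(d) described in paragraph [0009] can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the Unicode ranges used to classify characters, the helper names `char_bits` and `char_info`, and the choice of "00" for characters outside the listed pairs are all assumptions. Each character yields bit 2b-1 (1 = full-width) followed by bit 2b-2.

```python
KANJI_DIGITS = "〇一二三四五六七八九"


def char_bits(ch: str) -> str:
    """Return bit 2b-1 (1 = full-width) followed by bit 2b-2 for one character."""
    cp = ord(ch)
    if "A" <= ch <= "Z" or "0" <= ch <= "9":
        return "01"  # half-width uppercase letter or half-width Arabic numeral
    if "a" <= ch <= "z":
        return "00"  # half-width lowercase letter
    if 0xFF21 <= cp <= 0xFF3A or 0xFF10 <= cp <= 0xFF19:
        return "11"  # full-width uppercase letter or full-width Arabic numeral
    if 0xFF41 <= cp <= 0xFF5A:
        return "10"  # full-width lowercase letter
    if ch in KANJI_DIGITS:
        return "10"  # kanji numeral (always full-width)
    if 0x30A1 <= cp <= 0x30F6:
        return "11"  # full-width katakana
    if 0xFF66 <= cp <= 0xFF9D:
        return "01"  # half-width katakana
    if 0x3041 <= cp <= 0x3096:
        return "10"  # hiragana (always full-width)
    return "00"      # assumption: no distinction recorded for other characters


def char_info(text: str) -> str:
    """Character information 2b for a string: two bits per character."""
    return "".join(char_bits(ch) for ch in text)
```

For the search string used in the second embodiment, `char_info("AbＣ1２三あイ")` returns "0100110111101011", the value given in paragraph [0012].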

[0010]

The full-text search method will now be described with reference to these figures. The full-text search method of the first embodiment executes an index creation process, a leaf creation process, a search string conversion process, an index search process, and a leaf search process.

[Index Creation Process] In the index creation process, the index 1 (for example, a tree structure) for locating the leaves 2, which indicate the positions of character strings in the document, is created as follows. The index 1 consists of key character strings and the position information of leaves 2 or of lower-level index entries. In the index 1, the members of each of the following pairs are treated as the same character without distinction, so each pair is given a common character in the key strings: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana.

[Leaf Creation Process] In the leaf creation process, the location information 2a of a leaf 2 is created first; then, to distinguish the members of each of the above pairs, the character string in the document is encoded according to the rules shown in FIG. 2(d), and the resulting character information 2b is stored in the leaf 2.

[Search String Conversion Process] In step S1 of FIG. 1, a search string is input. In step S2, the search string is converted into the form used in the index 1, and the process proceeds to the index search process. This conversion is needed because the index 1 unifies full-width and half-width characters and the other pairs.

[Index Search Process] In step S3, the converted string is compared with the index 1 to perform the search, and the process proceeds to the leaf search process.

[Leaf Search Process] In step S4, it is determined whether a leaf 2 has been reached. If so, the process proceeds to step S5; if not, to step S10. In step S5, the information of the reached leaf 2 is fetched, and the process proceeds to step S6. In step S6, it is determined whether the search mode specified in advance by the user treats the pairs as the same character. If not, the process proceeds to step S7; if so, to step S9. In step S7, the character information 2b is extracted from the leaf 2 information fetched in step S5, and the process proceeds to step S8. In step S8, the extracted character information 2b is compared with the search string; if they match, the process proceeds to step S9, and if not, to step S10. In step S9, the character string in the document is returned on the basis of the location information 2a of the leaf 2. In step S10, "no corresponding character string" is returned, and the full-text search ends.
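The flow of FIG. 1 can be sketched as below, with simplifications that are assumptions of this example rather than part of the patent: only the English-letter pairs (case and width) are handled, the index 1 is a sorted list of unified suffixes (a simple suffix array standing in for the tree or hash structure), and each leaf is the offset of its suffix together with that suffix's character information. `unify`, `bits`, `build`, `search`, and the `distinguish` flag are hypothetical names; `distinguish=True` corresponds to the branch of step S6 that does not treat the pairs as the same character.

```python
from bisect import bisect_left


def unify(ch: str) -> str:
    """Step S2, English letters only: map a letter to its full-width uppercase form."""
    if "a" <= ch <= "z":
        ch = ch.upper()
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 0xFEE0)  # half-width -> full-width
    if "ａ" <= ch <= "ｚ":
        return chr(ord(ch) - 0x20)    # full-width lowercase -> full-width uppercase
    return ch


def bits(ch: str) -> str:
    """FIG. 2(d) bits 2b-1, 2b-2 for English letters; "00" otherwise (assumption)."""
    if "A" <= ch <= "Z":
        return "01"
    if "a" <= ch <= "z":
        return "00"
    if "Ａ" <= ch <= "Ｚ":
        return "11"
    if "ａ" <= ch <= "ｚ":
        return "10"
    return "00"


def build(text: str):
    """Index and leaf creation: one leaf per starting offset in the text."""
    keys = "".join(unify(ch) for ch in text)
    info = "".join(bits(ch) for ch in text)
    index = sorted((keys[i:], i) for i in range(len(text)))  # (unified suffix, offset)
    leaves = {i: info[2 * i:] for i in range(len(text))}     # offset -> 2b of the suffix
    return index, leaves


def search(index, leaves, query: str, distinguish: bool) -> list:
    """Steps S1-S10; an empty result stands for "no corresponding character string"."""
    key = "".join(unify(ch) for ch in query)                   # S2
    query_info = "".join(bits(ch) for ch in query)
    hits = []
    pos = bisect_left(index, (key, -1))                        # S3
    while pos < len(index) and index[pos][0].startswith(key):  # S4: a leaf is reached
        offset = index[pos][1]                                 # S5
        leaf_info = leaves[offset][:len(query_info)]           # S7
        if not distinguish or leaf_info == query_info:         # S6, S8
            hits.append(offset)                                # S9
        pos += 1
    return sorted(hits)
```

With `index, leaves = build("Abc abc ＡＢＣ")`, searching for "abc" returns offsets 0, 4, and 8 when the pairs are treated as the same character, but only offset 4 when they are distinguished. The index is consulted once in either mode; only the leaf comparison differs.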

[0011]

As described above, in the first embodiment the index 1 treats uppercase and lowercase letters and the other pairs as the same character, which has the advantage of saving storage capacity. When searching for a character string, the user can search either in the mode that treats uppercase and lowercase letters and the other pairs as the same character or in the mode that treats them as different characters, which is convenient; moreover, because the index is searched only once for the search string, execution is fast.

Second Embodiment

FIG. 3 is a flowchart of the full-text search method according to the second embodiment of the present invention. In the second embodiment, instead of distinguishing uppercase and lowercase letters and the other pairs uniformly, the user can specify, for each pair, whether it is to be distinguished: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana. For simplicity, the description assumes the mode in which uppercase and lowercase letters and the other pairs are treated as different characters. The second embodiment also uses the database of FIG. 2.

[0012]

The full-text search method of the second embodiment will now be described with reference to FIGS. 2 and 3, using a concrete search string. As in the first embodiment, the index creation process and the leaf creation process are executed, after which the search string conversion process, the index search process, and the leaf search process are executed in sequence.

[Search String Conversion Process] Assume that in step S20 of FIG. 3 the search string "AbＣ1２三あイ" is input. In step S21, the search character mask information "1111111111111111" is created. The mask information is created per character type: "11" for a character type whose pairs (uppercase/lowercase and so on) are to be distinguished, and "00" for a type that is not. In the second embodiment, all character types are distinguished; whether to distinguish each type is specified by the user. In step S22, the character information corresponding to the search string "AbＣ1２三あイ" is created according to the rules shown in FIG. 2(d). The character information for half-width uppercase "A", half-width lowercase "b", full-width uppercase "Ｃ", half-width Arabic numeral "1", full-width Arabic numeral "２", full-width kanji numeral "三", full-width hiragana "あ", and full-width katakana "イ" is "0100110111101011" (two bits per character: bit 2b-1 followed by bit 2b-2).
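Step S21 can be sketched as follows, assuming the user's choices arrive as a dict keyed by character type. The type names "letter", "digit", and "kana", the helpers `char_type` and `search_mask`, and the rule that characters outside these types get "00" are assumptions for illustration; the text states only that the mask is "11" for a distinguished type and "00" otherwise.

```python
KANJI_DIGITS = "〇一二三四五六七八九"


def char_type(ch: str) -> str:
    """Classify one character into a hypothetical character-type name."""
    cp = ord(ch)
    if ("A" <= ch <= "Z" or "a" <= ch <= "z"
            or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A):
        return "letter"
    if "0" <= ch <= "9" or 0xFF10 <= cp <= 0xFF19 or ch in KANJI_DIGITS:
        return "digit"
    if 0x3041 <= cp <= 0x3096 or 0x30A1 <= cp <= 0x30F6 or 0xFF66 <= cp <= 0xFF9D:
        return "kana"
    return "other"


def search_mask(query: str, distinguish: dict[str, bool]) -> str:
    """Step S21: "11" for each character whose type is distinguished, else "00"."""
    return "".join(
        "11" if distinguish.get(char_type(ch), False) else "00" for ch in query
    )
```

With every type distinguished, `search_mask("AbＣ1２三あイ", ...)` gives "1111111111111111" as in the text; if letters are not distinguished, it gives "0000001111111111", so the first three characters drop out of the later comparison.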

[0013]

In step S23, the characters are unified and converted into the common characters "ＡＢＣ１２３あい" used in the index 1, and the process proceeds to the index search process.

[Index Search Process] In step S24, the string "ＡＢＣ１２３あい" is compared with the index 1 to perform the search, and the process proceeds to the leaf search process.

[Leaf Search Process] In step S25, it is checked whether a leaf 2 has been reached. If so, the process proceeds to step S26; if not, to step S31. In step S26, the information of the reached leaf 2 is fetched. In step S27, character information 2b for as many characters as the search string contains is extracted from the start of the reached leaf 2's information.
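The unification of step S23 can be sketched as a one-to-one character mapping, so that position i of the unified string still lines up with bits 2i and 2i+1 of the character information. The target forms (full-width uppercase letters, full-width Arabic numerals, hiragana) are inferred from the example "ＡＢＣ１２３あい"; the function name `unify` and the use of `unicodedata.normalize("NFKC", ...)` on single half-width katakana are assumptions of this sketch.

```python
import unicodedata

KANJI_DIGITS = "〇一二三四五六七八九"


def unify(text: str) -> str:
    """Map each character to the common form used in index 1 (step S23)."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            ch = ch.upper()
        cp = ord(ch)
        if "A" <= ch <= "Z" or "0" <= ch <= "9":
            ch = chr(cp + 0xFEE0)                      # half-width -> full-width
        elif 0xFF41 <= cp <= 0xFF5A:
            ch = chr(cp - 0x20)                        # full-width lowercase -> uppercase
        elif ch in KANJI_DIGITS:
            ch = chr(0xFF10 + KANJI_DIGITS.index(ch))  # kanji numeral -> full-width Arabic
        elif 0xFF66 <= cp <= 0xFF9D:
            ch = unicodedata.normalize("NFKC", ch)     # half-width -> full-width katakana
        if 0x30A1 <= ord(ch) <= 0x30F6:
            ch = chr(ord(ch) - 0x60)                   # katakana -> hiragana
        out.append(ch)
    return "".join(out)
```

Applied to the search string of paragraph [0012], `unify("AbＣ1２三あイ")` yields "ＡＢＣ１２３あい", eight characters for eight, which keeps the 16-bit character information aligned with the key.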

[0014]

In step S28, the extracted character information 2b and the character information "0100110111101011" of the search string are each masked by an AND operation with the search character mask information "1111111111111111", and the results are compared to determine whether the character types to be distinguished match. If they match, the process proceeds to step S29; if not, to step S31. In step S29, the location information 2a of the reached leaf 2 is returned. In step S30, it is determined whether another leaf 2 has also been reached; if so, the process returns to step S26, and the same processing is repeated for as many leaves as were reached. When no further leaf remains and none of the reached leaves 2 matched in the comparison of step S28, the process proceeds to step S31; when no further leaf remains and at least one reached leaf 2 matched, the full-text search ends. In step S31, "no corresponding character string" is returned, and the full-text search ends.

As described above, in the second embodiment, whether to distinguish the pairs can be specified for each character type, which has the advantage that users can perform full-text searches more conveniently.
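Steps S26 to S31 can be sketched as a loop over the reached leaves, each represented here as a hypothetical (location, character information) pair. The masked comparison of step S28 converts the bit strings to integers and ANDs each with the mask. The function names are assumptions, and the sketch follows the overall outcome described in paragraph [0014] (every matching location is returned, and an empty result stands for "no corresponding character string") rather than reproducing each branch of FIG. 3.

```python
def masked_equal(leaf_bits: str, query_bits: str, mask: str) -> bool:
    """Step S28: compare only the bits selected by the search character mask."""
    if not query_bits:
        return True                        # an empty query matches trivially
    prefix = leaf_bits[:len(query_bits)]   # step S27: as many characters as the query
    if len(prefix) != len(query_bits):
        return False                       # leaf text is shorter than the query
    m = int(mask, 2)
    return (int(prefix, 2) & m) == (int(query_bits, 2) & m)


def leaf_search(reached: list[tuple[str, str]], query_bits: str, mask: str) -> list[str]:
    """Steps S26-S31: collect the location information 2a of every matching leaf."""
    return [loc for loc, bits in reached if masked_equal(bits, query_bits, mask)]
```

A leaf whose text begins with full-width "Ａ" instead of half-width "A" differs from the example query only in its first bit, so it is rejected under the all-ones mask and accepted under "0000001111111111".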

[0015]

EFFECTS OF THE INVENTION

As described in detail above, according to the present invention, the index creation process creates the index by treating all or some of the following pairs as the same character: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana. In addition to the position information of character strings in the document, the leaf creation process creates, as part of each leaf, character information that distinguishes all or some of these pairs, and the search string conversion process, index search process, and leaf search process are then executed. Because the index treats uppercase and lowercase letters and the other pairs as the same, the size of the index can be reduced. Furthermore, because the character information in the leaves distinguishes uppercase from lowercase and the other pairs, these pairs can be treated either as the same or as different characters.

[Brief description of drawings]

FIG. 1 is a flowchart of the full-text search method according to the first embodiment of the present invention.

FIG. 2 is a diagram of the database structure for carrying out the full-text search method according to an embodiment of the present invention.

FIG. 3 is a flowchart of the full-text search method according to the second embodiment of the present invention.

[Explanation of symbols]

1 Index
2 Leaf
2a Location information
2b Character information
2b-1 Full-width/half-width bit
2b-2 Uppercase/lowercase bit

Claims


1. A full-text search method for searching for a character string in a document by using leaves representing position information of character strings in the document and an index for locating the leaves, the method comprising executing: an index creation process that creates the index by treating all or some of the following pairs as the same character: uppercase and lowercase English letters, full-width and half-width English letters, full-width and half-width digits, Arabic and kanji numerals, katakana and hiragana, and full-width and half-width katakana; a leaf creation process that creates the position information and creates, as part of each leaf, character information that distinguishes all or some of these pairs; a search string conversion process that, when a search string is specified, converts the search string in the same manner as the index creation process does when the pairs are treated as the same character; an index search process that searches the index on the basis of the string converted by the search string conversion process; and a leaf search process that, when the index search process reaches a leaf, searches for the character string in the document on the basis of the position information of the leaf in a mode that searches without distinguishing the pairs, and, in a mode that searches while distinguishing the pairs, compares the character information of the leaf with the search string and searches for the character string in the document on the basis of the position information of the matching leaves.