JP4402169B1

JP4402169B1 - Code string search device, search method and program

Info

Publication number: JP4402169B1
Application number: JP2009521270A
Authority: JP
Inventors: 敏男新庄; 光裕國分
Original assignee: S Grants Co Ltd
Current assignee: S Grants Co Ltd
Priority date: 2009-02-23
Filing date: 2009-03-24
Publication date: 2010-01-20
Anticipated expiration: 2029-03-24
Also published as: JPWO2010095179A1

Abstract

To obtain an index data structure that allows any code string to be searched and that can be created in less time than conventional indexes, and to provide a code string search method using it, the search target code string is divided into a plurality of blocks. For each resulting code string block, two tables are created: a code-specific ID range table, which stores the range of code IDs for each code, and an ID relation table, which stores the next code ID positioned after each code ID. The code ID ranges of the codes that make up the search code string are read from the code-specific ID range table. The next code ID stored for a code ID within the code ID range of the first code of the search code string is read from the ID relation table, the next code ID stored for that next code is read from the ID relation table in turn, and each next code ID read from the ID relation table is checked to see whether it falls within the code ID range read from the code-specific ID range table.
[Selected drawing] FIG. 3A

Description

The present invention relates to code string search, in which a computer searches for a code or code string composed of bit strings, as in character string search, which searches for a character code or character code string composed of bit strings.

In recent years it has become customary to use word processors to create business documents, and with the spread of the Internet, a vast number of computer-processable electronic documents using character codes made up of bit strings now exist. Various character string search methods have therefore been developed so that a computer can find the needed documents among them.

In these character string search methods, an index is generally created in advance to make searching fast. A well-known example is the inverted index, which extracts the words in the documents and associates each word with the names of the documents that contain it. An inverted index is relatively small, fast to search, and simple in structure. However, in some languages words are difficult to extract. In addition, searching for a combination of several words requires matching word positions within the documents. It is also difficult to search for an arbitrary character string within a sentence.

To address this, an index called a suffix array, which allows arbitrary character strings to be searched, has been developed. Patent Document 1 and Non-Patent Document 1 below disclose the suffix array and search methods using it.
FIG. 1 illustrates an example of a conventional search method using the suffix array. Part (a) of FIG. 1 shows a character string to be searched. As shown, the character string 10 consists of the letters A, B, C, and E and the delimiter $. The letter A is at character positions 1, 4, and 7 of the character string 10. The letter B is at character positions 2 and 5. The letter C is at character positions 6 and 8. The letter E is at character position 3. The delimiter $ is at character position 9, the last position of the character string 10.

Part (b) of FIG. 1 shows the suffixes 20 in character-position order, the suffixes 20a in dictionary order, and the suffix array 30.
As part (b) of FIG. 1 shows, the character string 10 can be regarded as having nine suffixes as its substrings. Sorting the suffixes 20, which are arranged in order of the character position of their first character, into dictionary order yields the suffixes 20a. Storing the character positions of the first characters of the sorted suffixes in an array then yields the suffix array 30. With this suffix array, the starting character position of any substring of the search target character string that matches the search pattern can be found.
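The construction described above can be sketched in a few lines of Python. This is only an illustration, not the method of the cited documents: the function name `build_suffix_array` is hypothetical, the string "ABEABCAC$" is reconstructed from the character positions listed for character string 10, and the delimiter $ is assumed to sort before the letters (which is what plain ASCII ordering gives).

```python
def build_suffix_array(text):
    """Return the 1-based start positions of all suffixes of text,
    listed in dictionary order of the suffixes (a naive sketch)."""
    # Pair every suffix with the 1-based position of its first character.
    suffixes = [(text[i:], i + 1) for i in range(len(text))]
    # Sort the suffixes in dictionary order; '$' sorts before 'A' in ASCII.
    suffixes.sort()
    # Keep only the start positions: this is the suffix array.
    return [position for _, position in suffixes]


# Character string 10 as reconstructed from the positions given in the text.
print(build_suffix_array("ABEABCAC$"))  # [9, 4, 1, 7, 5, 2, 8, 6, 3]
```

Sorting whole suffixes like this is the expensive step: every comparison may scan many characters, which is the index creation cost the invention aims to avoid.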

Part (c) of FIG. 1 illustrates the concept of character string search using a compressed suffix array, showing a search character string 40 and a compressed suffix array (conceptual diagram) 50 corresponding to the suffix array 30. Each array number (i) of the compressed suffix array 50 stores a next array number (Ψ). The next array number (Ψ) is the array number in the suffix array 30 that holds the character position obtained by adding 1 to the character position stored at array number (i) of the suffix array 30.

Because the array now stores the next array number (Ψ) instead of the character position, the values stored for each character are in ascending order, as the figure shows. Each array element can therefore store the increment from the value in the preceding element rather than the next array number (Ψ) itself, which narrows the bit width and compresses the information.

The search concept is indicated by the dotted arrows from each character of the example search character string 40 to array numbers (i) of the compressed suffix array 50, and by the arrows between the bold array numbers (i) 3, 6, and 9 and the bold next array numbers (Ψ) 6 and 9. That is, array number 3, for example, is chosen from the array numbers corresponding to the first character A of the search character string 40; its next array number, 6, is an array number corresponding to the second character B; and the next array number of 6, namely 9, is an array number corresponding to the third character E. This shows that the search target character string 10 is hit by a search with the search character string 40.
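The walk from array number 3 to 6 to 9 can be reproduced with a small Python sketch. The names `build_psi` and `matches_at` are hypothetical, the string "ABEABCAC$" is reconstructed from the character positions given for character string 10, and the table of first characters is a simplification introduced here for readability; an actual compressed suffix array would not store it in this form.

```python
def build_psi(text):
    """Return (psi, first_chars) for text, both keyed by 1-based array number.

    psi[i] is the array number holding (character position at i) + 1,
    or None when position i is the last character of the text.
    """
    n = len(text)
    # 0-based start positions of the suffixes in dictionary order.
    order = sorted(range(n), key=lambda i: text[i:])
    # Map each 0-based start position to its 1-based array number.
    rank = {pos: idx for idx, pos in enumerate(order, start=1)}
    psi = {idx: rank.get(pos + 1) for idx, pos in enumerate(order, start=1)}
    first_chars = {idx: text[pos] for idx, pos in enumerate(order, start=1)}
    return psi, first_chars


def matches_at(psi, first_chars, start, pattern):
    """Follow psi from array number start, checking one pattern character per step."""
    i = start
    for ch in pattern:
        if i is None or first_chars[i] != ch:
            return False
        i = psi[i]
    return True


psi, first_chars = build_psi("ABEABCAC$")
print(psi[3], psi[6])                         # 6 9
print(matches_at(psi, first_chars, 3, "ABE"))  # True
```

Starting from array number 2 instead fails at the third step, because that suffix continues with C rather than E.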

Patent Document 1: Japanese Patent No. 3,672,242
Non-Patent Document 1: Sadakane, "Consideration on Compressed Suffix" (圧縮接尾辞に関する考察), IEICE Technical Report (2000-7-19), Vol. 100, No. 226, pp. 49-56

Using a compressed suffix array for character string search makes it possible to search for arbitrary character strings and also reduces the size of the array. However, before a compressed suffix array can be created, the suffixes must be generated from the search target character string and sorted in dictionary order to build a suffix array, so creating a compressed suffix array from the search target character string takes a long time.
The problem to be solved by the present invention is therefore to shorten, compared with conventional techniques, the time needed to create index data that allows arbitrary code strings, not only character strings, to be searched. The object of the present invention is to obtain an index data structure that allows any code string to be searched and that can be created in less time than before, and to provide a code string search method that uses it.

According to the present invention, the search target code string is divided into several blocks (hereinafter sometimes called code string blocks). Within each code string block, every code located in the block is assigned a code ID that uniquely identifies it, in such a way that the code ID ranges of different code values do not overlap. (Where there is no risk of misunderstanding, a code value may be called simply a code; conversely, to emphasize that code values differ, it may be called a code type.) For example, such an assignment can be achieved by giving the occurrences of one code ascending code IDs in the order they appear in the code string block, and repeating this for each code type with the first code ID set to a value larger than any code ID assigned so far.

Further, according to the present invention, a code-specific ID range table, which stores the range of code IDs for each code, and an ID relation table, which stores for each code ID the next code ID (the code ID positioned immediately after it), are created for each code string block, and the code string search is carried out using the code-specific ID range table and the ID relation table.

In the code string search of the present invention, in which a search target code string is searched with a search code string, the code ID ranges of the codes that make up the search code string are read from the code-specific ID range table of the first code string block. The next code ID stored for a code ID within the code ID range of the first code of the search code string is read from the ID relation table created for each code string block, and the next code ID stored for that next code is read from the ID relation table in turn. Each next code ID read from the ID relation table is checked to see whether it falls within the code ID range read from the code-specific ID range table, and the same check is then carried out for the subsequent code string blocks.
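As a rough sketch of this check for a single code string block, the Python below uses two hand-written tables for a hypothetical block consisting of the codes A, B, E, A. The names `ID_RANGES`, `NEXT_ID`, and `search_block` are illustrative, the use of `None` for the last code of the block is an assumption, and the continuation of a match into the following block is deliberately left out.

```python
# Code-specific ID range table for the block "ABEA": code -> (first ID, last ID).
ID_RANGES = {"A": (1, 2), "B": (3, 3), "E": (4, 4)}

# ID relation table: code ID -> next code ID.
# None marks the last code of the block (an assumption of this sketch).
NEXT_ID = {1: 3, 3: 4, 4: 2, 2: None}


def search_block(pattern, id_ranges, next_id):
    """Return the code IDs at which pattern matches entirely inside one block."""
    # A code that never occurs in the block cannot be matched.
    if not pattern or any(code not in id_ranges for code in pattern):
        return []
    first, last = id_ranges[pattern[0]]
    hits = []
    # Try every occurrence of the first code of the pattern.
    for start in range(first, last + 1):
        current = start
        matched = True
        for code in pattern[1:]:
            current = next_id.get(current)
            low, high = id_ranges[code]
            # The next code ID must lie in the ID range of the expected code.
            if current is None or not (low <= current <= high):
                matched = False
                break
        if matched:
            hits.append(start)
    return hits


print(search_block("AB", ID_RANGES, NEXT_ID))   # [1]
print(search_block("BEA", ID_RANGES, NEXT_ID))  # [3]
```

Each step is one table lookup and one range comparison, so no character comparison against the original code string is needed during the walk.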

According to the present invention, the search can be carried out using the simply structured code-specific ID range table and ID relation table, so no suffix array needs to be created and the computer's processing load for index creation is reduced.

A diagram illustrating an example of a conventional search method using a suffix array.
A diagram illustrating the functional blocks for creating the index data structure in one embodiment of the present invention.
A diagram illustrating the functional blocks for code string search in one embodiment of the present invention.
A diagram illustrating an example hardware configuration in one embodiment of the present invention.
A diagram illustrating the structure of the index data in one embodiment of the present invention.
A diagram illustrating the concept of code string search in one embodiment of the present invention.
A diagram illustrating the first half of the process flow for creating the index data of a code string block in one embodiment of the present invention.
A diagram illustrating the second half of the process flow for creating the index data of a code string block in one embodiment of the present invention.
A diagram illustrating the outline flow of the process for creating the index data of a code string block in one embodiment of the present invention.
A diagram illustrating the process flow for counting, per code type, the occurrences of the codes contained in the search target code string in one embodiment of the present invention.
A diagram illustrating the process flow for setting the code ID ranges of the code-specific ID range table based on the occurrence counts in one embodiment of the present invention.
A diagram illustrating the process flow for completing the ID relation table based on the codes contained in the search target code string in one embodiment of the present invention.
A diagram illustrating the outline flow of the entire code string search process in one embodiment of the present invention.
A diagram illustrating the first half of the process flow of a code string search that uses a given code string block as the search start position in one embodiment of the present invention.
A diagram illustrating the second half of the process flow of a code string search that uses a given code string block as the search start position in one embodiment of the present invention.
A diagram illustrating the process flow of an exact-match search in one embodiment of the present invention.
A diagram illustrating the process flow of a prefix-match search in one embodiment of the present invention.
A diagram illustrating the process flow of a search that includes arbitrary codes in one embodiment of the present invention.
A diagram illustrating the first half of the process flow for searching the next code string block in one embodiment of the present invention.
A diagram illustrating the second half of the process flow for searching the next code string block in one embodiment of the present invention.
A diagram illustrating the flow of search processing starting from the first code string block in one embodiment of the present invention.
A diagram illustrating the flow of the transition to searching the next code string block in one embodiment of the present invention.
A diagram illustrating the flow of search processing starting from the second code string block in one embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

The best mode for carrying out the present invention is described below with reference to the drawings.
FIG. 2A is a diagram illustrating the functional blocks for creating the index data structure in one embodiment of the present invention. The index data creation management unit 104 manages the creation, by the index data creation unit 105, of index data for each block (code string block) into which the search target code string is divided, and creates an index data management table. The index data creation unit 105 includes a search target code string reading unit 101, a code-specific ID range table generation unit 102, and an ID relation table generation unit 103.
The search target code string is read by the search target code string reading unit 101 and passed to the code-specific ID range table generation unit 102 and the ID relation table generation unit 103. The code-specific ID range table generation unit 102 creates the code-specific ID range table, which stores the range of code IDs for each code. The ID relation table generation unit 103 generates the ID relation table, which stores for each code ID the next code ID, that is, the code ID positioned immediately after it.

FIG. 2B is a diagram illustrating the functional blocks for performing a code string search in one embodiment of the present invention. The code string search management unit 115 manages the search, by the code string search unit 116, of each code string block of the search target code string. The code string search unit 116 includes a search code string reading unit 111, a code-specific ID range reading unit 112, an ID relation reading unit 113, and a code ID collation unit 114.

The search code string is read by the search code string reading unit 111 and passed to the code-specific ID range reading unit 112. From the code-specific ID range table generated by the code-specific ID range table generation unit 102, the code-specific ID range reading unit 112 reads the code ID ranges of the codes that make up the search code string received from the search code string reading unit 111, and passes them to the ID relation reading unit 113 and the code ID collation unit 114.

From the ID relation table generated by the ID relation table generation unit 103, the ID relation reading unit 113 reads the next code ID stored for a code ID within the code ID range, received from the code-specific ID range reading unit 112, of the first code of the search code string. It then reads the next code ID stored for that next code from the ID relation table in turn, and passes each one to the code ID collation unit 114.
The code ID collation unit 114 checks whether the next code ID received from the ID relation reading unit 113 falls within the code ID range received from the code-specific ID range reading unit 112, and outputs the search result.

FIG. 2C is a diagram illustrating an example hardware configuration in one embodiment of the present invention.
The search processing and index generation processing of the search device of the present invention are carried out by a data processing device 301, which includes at least a central processing unit 302 and a cache memory 303, using a data storage device 308. The data storage device 308 contains a storage area for the index data management table 321 and an index data storage area 324 that holds the code-specific ID range table 309 and the ID relation table 310 for each code string block. It can be implemented with the main storage device 305 or the external storage device 306, or with a remotely located device connected through the communication device 307.

Each functional block described with reference to FIGS. 2A and 2B, such as the code string search unit 116, can be realized by the hardware illustrated in FIG. 2C together with software comprising the steps described later.

In the example of FIG. 2C, the main storage device 305, the external storage device 306, and the communication device 307 are connected to the data processing device 301 by a single bus 304, but the connection method is not limited to this. The main storage device 305 may also be located inside the data processing device 301.
Although not specifically illustrated, temporary storage areas appropriate to each process are naturally used so that values obtained partway through processing can be used in later processing. In the following description, a value stored or set in a temporary storage area is sometimes referred to by the name of that temporary storage area.

Next, an outline of the search method in one embodiment of the present invention is described.
FIG. 3A is a diagram illustrating the structure of the index data in one embodiment of the present invention. Part (a) of FIG. 3A shows an example of a search target code string for which index data is to be created. The example search target code string 10a consists of the character codes of the letters A, B, E, A, B, C, A, ..., C, B. The labels P1 to P8, ..., Pn-1, Pn written below the character codes indicate the positions of the codes in the search target code string 10a. The code position pointer 11 is a pointer that indicates a code position in the search target code string 10a; in the figure it points to code position P1.
In the example shown in the figure, the search target code string 10a is divided into blocks of four codes each. Accordingly, as arrow 12 indicates, the start position of the second code string block is P5, and as arrow 13 indicates, the end position of the second code string block is P8. The code position Pn indicated by arrow 14 is defined as the terminal position. Only the last code string block consists of two codes.
For each code string block, a code-specific ID range table and an ID relation table are generated as index data.
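The division into fixed-size blocks can be sketched as follows. The function name `split_into_blocks` is hypothetical, and the ten-code string "ABEABCACCB" is a shortened stand-in chosen for illustration, since the middle of code string 10a is elided in the figure.

```python
def split_into_blocks(codes, block_size):
    """Split codes into consecutive blocks of block_size codes.

    Returns a list of (start position, end position, block) tuples with
    1-based positions; the last block may be shorter than block_size.
    """
    blocks = []
    for start in range(0, len(codes), block_size):
        block = codes[start:start + block_size]
        blocks.append((start + 1, start + len(block), block))
    return blocks


# Shortened stand-in for code string 10a, divided every four codes.
for first, last, block in split_into_blocks("ABEABCACCB", 4):
    print(f"P{first}-P{last}: {block}")
# P1-P4: ABEA
# P5-P8: BCAC
# P9-P10: CB
```

Because each block is indexed independently, the start and end positions recorded here are the kind of information an index data management table would need to relate a block back to the whole code string.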

Part (b) of FIG. 3A shows an example of the index data structure for code string search. It illustrates the index data management table 321, generated for the search target code string and its code string blocks shown in part (a) of FIG. 3A; the code-specific ID range table 309a and ID relation table 310a stored in the index data storage area 324a for the first code string block; the code-specific ID range table 309b and ID relation table 310b stored in the index data storage area 324b for the second code string block; the index data storage area 324c for the third code string block; and the code-specific ID range table 309d and ID relation table 310d stored in the index data storage area 324d for the last code string block. The index data stored in the index data storage area 324c is not shown. In the following, matters common to these tables are sometimes described using notations such as "code-specific ID range table 309" and "ID relation table 310", and other reference numerals may be written in the same way.
An entry of the code-specific ID range table 309 is created for each distinct code type that appears in the search target code string for which index data is created. As indicated on the left side of the code-specific ID range table 309, in the example shown the search target code string consists of the letter codes A to E, and an entry is created for each of these codes. The code type pointer 311 is a pointer to an entry of the code-specific ID range table 309. In the code-specific ID range table 309a for the first code string block, the code type pointer 311a points to the entry for code A. Likewise, in the code-specific ID range table 309b for the second code string block, the code type pointer 311b points to the entry for code A, and in the code-specific ID range table 309d for the last code string block, the code type pointer 311d points to the entry for code A.
Since each code is a bit string, it has the value represented by the bits of that bit string. The position of the entry for each code in the code-specific ID range table 309 can therefore clearly be associated with the value of that code. In other words, the value of the code type pointer 311 can be the code itself. In the following description, the entry corresponding to a code is therefore sometimes called the entry pointed to by that code.

As indicated below the code-specific ID range table 309a, each entry of the table consists of the following items: set indicator, occurrence count, first code ID, last code ID, and code-specific ID counter.
The set indicator shows, as 1 or 0, whether the code appears in the corresponding code string block. In the code-specific ID range table 309a, codes C and D do not appear in the first code string block, so the entries for codes C and D are 0 and the other entries are 1. In the code-specific ID range table 309b, codes D and E do not appear in the second code string block, so the entries for codes D and E are 0 and the other entries are 1. In the code-specific ID range table 309d, only codes B and C appear in the last code string block, so the entries for codes B and C are 1 and the other entries are 0.
The occurrence count is the number of times the code appears in the corresponding code string block. In the code-specific ID range table 309a, the values 2, 1, 0, 0, 1 are stored for codes A to E. In the code-specific ID range table 309b, the values 1, 1, 2, 0, 0 are stored for codes A to E. In the code-specific ID range table 309d, the values 0, 1, 1, 0, 0 are stored for codes A to E.

The head code ID and the tail code ID indicate the range of code IDs for each code. Code IDs are assigned to each code in order of appearance in the code string block, so that the ranges of different codes do not overlap.
In the example of the code-specific ID range table 309a, code A appears twice, so its code ID range is ID1 to ID2; the next code, code B, appears once, so its head code ID and tail code ID are both ID3. Code C and code D do not appear, so neither their head code ID nor their tail code ID is set. Code E appears once, so its head code ID and tail code ID are both ID4.
Similarly, in the example of the code-specific ID range table 309b, the head code ID and tail code ID of code A are both ID1, those of code B are both ID2, and code C appears twice, so its code ID range is ID3 to ID4.
In the example of the code-specific ID range table 309d, the head code ID and tail code ID of code B are both ID1, and those of code C are both ID2.
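The assignment of code ID ranges from the numbers of appearances can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `assign_id_ranges`, the dictionary representation, and the use of `None` for an unset range are all hypothetical, and code IDs are taken to be integers starting from 1.

```python
from typing import Dict, Optional, Tuple


def assign_id_ranges(counts: Dict[str, int]) -> Dict[str, Optional[Tuple[int, int]]]:
    """Map each code to its (head code ID, tail code ID), or None if the code is absent.

    `counts` maps each code to its number of appearances in one code string
    block; its key order is taken as the code order (A, B, C, ...).
    """
    ranges: Dict[str, Optional[Tuple[int, int]]] = {}
    next_id = 1  # assumption: IDs are integers starting from 1
    for code, count in counts.items():
        if count == 0:
            ranges[code] = None  # head and tail code IDs stay unset
        else:
            ranges[code] = (next_id, next_id + count - 1)
            next_id += count  # ranges of different codes never overlap
    return ranges


# Numbers of appearances taken from tables 309a, 309b and 309d in the text.
table_309a = assign_id_ranges({"A": 2, "B": 1, "C": 0, "D": 0, "E": 1})
table_309b = assign_id_ranges({"A": 1, "B": 1, "C": 2, "D": 0, "E": 0})
table_309d = assign_id_ranges({"A": 0, "B": 1, "C": 1, "D": 0, "E": 0})
```

With these inputs the sketch yields ID1 to ID2 for code A and ID4 for code E in the first block, which agrees with the ranges listed above.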

The values ID1 and so on are preferably integers starting from 1, but are not limited to these; any values that can identify the ID range of each code may be used. In the illustrated example the range of code IDs is expressed by a head code ID and a tail code ID, but it can also be expressed by enumerating all the code IDs, provided that variable-length data are acceptable.

The code-specific ID counter is needed when the ID relationship table is generated after the code-specific ID range table has been generated; it is not needed as index data. It can therefore be provided as a counter for each code type, separate from the code-specific ID range table.

An entry of the ID relationship table 310 is created for each code ID assigned to a code in the code string block. As shown on the left side of the ID relationship table 310, in the illustrated example entries are created for code ID1 to code ID4, except in the ID relationship table 310d of the last code string block. Each entry consists of two items: code position and next code ID. The code ID pointer 312 is a pointer to an entry of the ID relationship table 310; in the illustrated example it points to ID1 in every ID relationship table 310.

The code position in the entry for each code ID is the position, within the search target code string 10a, of the code having that code ID. In the ID relationship table 310a, P1 is stored for ID1, P4 for ID2, P2 for ID3, and P3 for ID4.
As indicated by the dotted arrow 313a (A) in the figure, the first and second entries of the ID relationship table 310a correspond to code A. Similarly, the third entry corresponds to code B, as indicated by the dotted arrow 313a (B), and the fourth entry corresponds to code E, as indicated by the dotted arrow 313a (E).
The next code ID in the entry for each code ID is the code ID of the code that immediately follows, in the code string block, the code having that code ID. For the code at the tail position of the code string block, the code ID of the code at the head position is stored. Therefore, in the ID relationship table 310a, the next code IDs stored are ID3 for ID1, ID1 for ID2, ID4 for ID3, and ID2 for ID4.

In the ID relationship table 310b, P7 is stored for ID1, P5 for ID2, P6 for ID3, and P8 for ID4.
As indicated by the dotted arrow 313b (A) in the figure, the first entry of the ID relationship table 310b corresponds to code A. Similarly, the second entry corresponds to code B, as indicated by the dotted arrow 313b (B), and the third and fourth entries correspond to code C, as indicated by the dotted arrow 313b (C).
The next code IDs stored are ID4 for ID1, ID3 for ID2, ID1 for ID3, and ID2 for ID4.

In the ID relationship table 310d, Pn is stored for ID1 and Pn-1 for ID2.
As indicated by the dotted arrow 313d (B) in the figure, the first entry of the ID relationship table 310d corresponds to code B. Similarly, the second entry corresponds to code C, as indicated by the dotted arrow 313d (C).
The next code IDs stored are ID2 for ID1 and ID1 for ID2.
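The construction of one ID relationship table can be sketched as follows. This is an illustrative Python sketch only: the function name `build_id_relation`, the tuple layout `(code position, next code ID)`, and the default alphabet `"ABCDE"` are assumptions made for the example. The first two blocks are taken to be `ABEA` (positions P1 to P4) and `BCAC` (positions P5 to P8), which is what the code positions listed above imply. The per-code counter plays the role of the code-specific ID counter.

```python
from typing import Dict, Tuple


def build_id_relation(block: str, first_pos: int,
                      alphabet: str = "ABCDE") -> Dict[int, Tuple[int, int]]:
    """Return {code ID: (code position, next code ID)} for one code string block."""
    # Head code ID of each code that appears, assigned in alphabet order.
    head_id: Dict[str, int] = {}
    next_free = 1
    for code in alphabet:
        n = block.count(code)
        if n:
            head_id[code] = next_free
            next_free += n

    # Code-specific ID counter: next unused ID for each code.
    counter = dict(head_id)
    ids = []
    for code in block:  # IDs are handed out in order of appearance
        ids.append(counter[code])
        counter[code] += 1

    relation: Dict[int, Tuple[int, int]] = {}
    for i, code_id in enumerate(ids):
        # The tail code wraps around to the code ID of the head code.
        following = ids[(i + 1) % len(ids)]
        relation[code_id] = (first_pos + i, following)
    return relation


relation_310a = build_id_relation("ABEA", first_pos=1)
relation_310b = build_id_relation("BCAC", first_pos=5)
```

Each value pairs a position Pk (as the integer k) with the next code ID, so `relation_310a[4]` corresponds to the entry for ID4 with position P3 and next code ID ID2.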

The ID relationship table 310 holds, as index data, the fact that two codes, identified by their code IDs, are adjacent in the code string block. The relationship between the code at the tail position of one code string block and the code at the head position of the following code string block is managed by storing the head code of each code string block in the index data management table 321.

As shown in the figure, the index data management table 321 has an entry for each code string block, and each entry consists of the items setting display, head code, and index data pointer. The index data management pointer 322 is a pointer to an entry of the index data management table. In the illustrated example, the index data management pointer 322 points to entry 1, which corresponds to the first code string block.
In the setting display of the index data management table 321, 1 is set for entry 1 through entry m, and 0 for all other entries. Entry m corresponds to the last code string block. As the head code of the index data management table 321, code A is set in entry 1, code B in entry 2, and code C in entry m.
The index data pointers point to the index data storage areas 324a, 324b, 324c, and 324d of the corresponding code string blocks, as indicated by the dotted arrows 344a, 344b, 344c, and 344d.

Comparing the ID relationship table 310 with the conventional compressed suffix array 50 shown in (c) of FIG. 1, the compressed suffix array 50 has the next array numbers sorted for each character, whereas the ID relationship table 310 has the code positions sorted for each code type. Therefore, when the same code is searched for successively, the search can be sped up by the cache effect.

FIG. 3B is a diagram illustrating the concept of code string search in an embodiment of the present invention.
The search target code string is the search target code string 10a illustrated in FIG. 3A, divided into code string blocks as shown in FIG. 3A. The concept of code string search is described using the search code string 40a shown in FIG. 3B as the search code string. It is assumed that the code-specific ID range tables 309 and the ID relationship tables 310 have been generated for the code string blocks of the search target code string 10a, and that the index data management table 321 has also been generated.
Before the search starts, the first entry 321 (1) of the index data management table, indicated by the arrow 348a, is read, and the code-specific ID range table 309a and the ID relationship table 310a stored in the index data storage area 324a are obtained through the index data pointer 342a, as indicated by the dotted arrow 344a. Further, as indicated by the dotted arrow 343a, the entry 309a (A) of the code-specific ID range table 309a, which corresponds to code A stored in the head code 341a, is read, and, as indicated by the dotted arrow 345a, its head code ID, ID1, is read and set in the head code ID 346a, a temporary storage area.

As shown in the figure, the search code string 40a consists of code E, code A, code B, and code C, in that order from the head. First, as indicated by the dotted arrow 331a, code E, the first code 332a, is read. Next, as indicated by the dotted arrow 333a, the entry 309a (E) corresponding to code E is read from the code-specific ID range table 309a, which corresponds to the first code string block. (If the first code of the search code string 40a does not appear in the first code string block, the index data are skipped up to the index data of a code string block in which that code does appear.)
Then, as indicated by the dotted arrow 334a, a code ID contained in the ID range 336a, code ID4 in the illustrated example, is read from the entry 309a (E), and the entry 310a (4) corresponding to code ID4 is read from the ID relationship table 310a.

Meanwhile, ID1, which is the head code ID set in the code-specific ID range table 309a for code A, the head code of the block, has already been set in the head code ID 346a, a temporary storage area.
Then, as indicated by the bidirectional dotted arrow 347a, ID2, the next code ID of the entry 310a (4) corresponding to code ID4, is compared with ID1 set in the head code ID 346a, and the next code ID is determined to differ from the head code ID. In other words, the code E with code ID4 is not at the tail position of the first code string block.

Next, as indicated by the dotted arrow 331b, code A, the second code 332b, is read. Because the comparison indicated by the bidirectional dotted arrow 347a found that the next code ID differs from the head code ID, the entry 309a (A) corresponding to code A is read, as indicated by the dotted arrow 333b, from the same code-specific ID range table 309a that was used for the first code E. Then, as indicated by the bidirectional dotted arrow 335b, it is determined whether ID2, the next code ID 337a of the entry 310a (4) corresponding to code ID4 of the ID relationship table 310a, falls within the code ID range 336b (ID1 to ID2) of the entry 309a (A) corresponding to code A of the code-specific ID range table 309a. In the illustrated example the result is yes. This means that the sequence code E, code A exists in the first code string block of the search target code string 10a. Moreover, since the code position 338a of the entry 310a (4) corresponding to code ID4 read from the ID relationship table 310a is P3, the sequence code E, code A starts at position P3.

Further, as indicated by the dotted arrow 334b, ID1, the next code ID 337b of the entry 310a (2) corresponding to ID2 (the next code ID 337a), is read. Then, as indicated by the bidirectional dotted arrow 347b, this ID1 is compared with ID1 previously set in the head code ID 346a, and the next code ID is determined to be equal to the head code ID. That is, the code A with code ID2 in the first code string block, which matched code A, the second code 332b, is determined to be located at the tail position of the first code string block.
Accordingly, the second entry 321 (2) of the index data management table, indicated by the arrow 348b, is read, and code B stored in its head code 341b is set in the temporary storage area 352b, as indicated by the dotted arrow 351b. When code B is read as the third code 332c, as indicated by the dotted arrow 331c, it is determined whether it matches the code set in the temporary storage area 352b, as indicated by the bidirectional dotted arrow 353b. That is, it is determined whether code B, the third code 332c, is the head code of the second code string block. In the illustrated example the result is affirmative. Therefore, the search code string EAB is found to hit in the search target code string 10a.

Then, as indicated by the dotted arrow 344b, the index data storage area 324b is accessed through the index data pointer 342b, and, as indicated by the dotted arrow 343b, the entry 309b (B) of the code-specific ID range table 309b, which corresponds to code B stored in the head code 341b, is read. As indicated by the dotted arrow 345c, ID2, the head code ID of its code ID range 336f, is read and set in the head code ID 346b, a temporary storage area.
Next, as indicated by the dotted arrow 334c, ID3, the next code ID 337c of the entry 310b (2) corresponding to ID2 (the head code ID 346b), is read. Then, as indicated by the bidirectional dotted arrow 347c, this ID3 is compared with ID2 previously set in the head code ID 346b, and the next code ID is determined to differ from the head code ID.

Next, as indicated by the dotted arrow 331d, code C, the fourth code 332d, is read. Because the comparison indicated by the bidirectional dotted arrow 347c found that the next code ID differs from the head code ID, the entry 309b (C) corresponding to code C is read, as indicated by the dotted arrow 333d, from the same code-specific ID range table 309b that was used for the third code B. Then, as indicated by the bidirectional dotted arrow 335d, it is determined whether ID3, the next code ID 337c of the entry 310b (2) corresponding to code ID2 of the ID relationship table 310b (dotted arrow 334c), falls within the code ID range 336d (ID3 to ID4) of the entry 309b (C) corresponding to code C (dotted arrow 333d). In the illustrated example the result is yes, so the search code string EABC is found to hit in the search target code string 10a.
Following this determination, as indicated by the dotted arrow 334d, ID1, the next code ID 337d of the entry 310b (3) corresponding to ID3 (the next code ID 337c), is read. Then, as indicated by the bidirectional dotted arrow 347d, this ID1 is compared with ID2 previously set in the head code ID 346b, and the next code ID is determined not to be equal to the head code ID.
Since the code position 338b of the entry 310a (2) corresponding to code ID2 read from the ID relationship table 310a is P4, the code position 338c of the entry 310b (2) corresponding to code ID2 read from the ID relationship table 310b is P5, and the code position 338d of the entry 310b (3) corresponding to code ID3 is P6, the hit is found to occupy code positions P3, P4, P5, and P6.

For a fifth code (not shown) of the search code string 40a, the same steps are repeated: as indicated by the dotted arrow 334e, the next code ID of the entry of the ID relationship table 310 corresponding to ID1 (the next code ID 337d) is read, and determinations such as whether a code ID falls within the code ID range of the entry of the code-specific ID range table 309 indicated by the code type of the fifth code are carried out.
The code string search according to an embodiment of the present invention is performed as described above.
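The search walk-through above can be condensed into a small Python sketch. It is an illustration under several assumptions rather than the embodiment itself: the names `build_index`, `matches_from`, and `search`, the dictionary layout of each block, and the 10-code sample string `ABEABCACCB` (blocks `ABEA`, `BCAC`, `CB`) are hypothetical, positions are 1-based integers, and code order within a block is taken to be alphabetical.

```python
from typing import Dict, List


def build_index(text: str, max_codes: int) -> List[dict]:
    """Build per-block ID range tables and ID relationship tables."""
    blocks = []
    for start in range(0, len(text), max_codes):
        chunk = text[start:start + max_codes]
        ranges, next_id = {}, 1
        for code in sorted(set(chunk)):
            n = chunk.count(code)
            ranges[code] = (next_id, next_id + n - 1)  # (head ID, tail ID)
            next_id += n
        counter = {code: lo for code, (lo, _) in ranges.items()}
        ids = []
        for code in chunk:  # assign IDs in order of appearance
            ids.append(counter[code])
            counter[code] += 1
        relation = {cid: (start + i + 1, ids[(i + 1) % len(ids)])
                    for i, cid in enumerate(ids)}  # cid -> (position, next ID)
        blocks.append({"head_code": chunk[0], "ranges": ranges,
                       "relation": relation})
    return blocks


def matches_from(blocks: List[dict], b: int, cid: int, pattern: str) -> bool:
    """Check pattern[1:] starting from code ID `cid` of block `b`."""
    for code in pattern[1:]:
        block = blocks[b]
        head_id = block["ranges"][block["head_code"]][0]
        next_id = block["relation"][cid][1]
        if next_id == head_id:
            # cid is at the tail of its block: continue in the next block,
            # whose head code must equal the next search code.
            b += 1
            if b == len(blocks) or blocks[b]["head_code"] != code:
                return False
            cid = blocks[b]["ranges"][code][0]
        else:
            lo, hi = block["ranges"].get(code, (0, -1))
            if not lo <= next_id <= hi:  # next ID outside the code's range
                return False
            cid = next_id
    return True


def search(blocks: List[dict], pattern: str) -> List[int]:
    """Return the 1-based start positions of all occurrences of `pattern`."""
    hits: List[int] = []
    if not pattern:
        return hits
    for b, block in enumerate(blocks):
        rng = block["ranges"].get(pattern[0])
        if rng is None:
            continue  # first search code absent: skip this block
        for start_id in range(rng[0], rng[1] + 1):
            if matches_from(blocks, b, start_id, pattern):
                hits.append(block["relation"][start_id][0])
    return hits


index = build_index("ABEABCACCB", max_codes=4)
hits_eabc = search(index, "EABC")  # [3]: the hit starting at P3
```

Trying every code ID in the range of the first search code is a design choice of this sketch; it keeps the example short at the cost of re-walking the tables for each candidate.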

Next, index data creation processing in an embodiment of the present invention is described. As illustrated in (b) of FIG. 3A, the index data consist of the index data management table, and the code-specific ID range table and ID relationship table stored in the index data storage area corresponding to each code string block.
FIGS. 4A and 4B are diagrams illustrating the flow of processing for creating index data in an embodiment of the present invention. The index data creation flow shown in FIGS. 4A and 4B consists of initial processing and a flow that sequentially executes the creation of the index data corresponding to each code string block (hereinafter sometimes referred to as block index data corresponding to each code string block, or simply block index data).

FIG. 4A is a diagram illustrating the first stage of the processing for creating index data in an embodiment of the present invention, that is, of the processing that sequentially creates the block index data corresponding to each code string block. This first stage is the initial processing mentioned above.
As shown in FIG. 4A, the search target code string is set in step S401. Setting the search target code string means that the search target code string reading unit 111 illustrated in FIG. 2A reads one code string from the set of code strings to be searched that is stored in the data storage device, and sets it in a search target code string setting area (not shown). This search target code string setting area is one of the "temporary storage devices corresponding to each process, for using various values obtained during processing in later processing" mentioned earlier. In the following description, an expression such as "set in a search target code string setting area (not shown)" may be replaced by "set as the search target code string" or simply "set in the search target code string". The same applies to items other than the search target code string.

Next, in step S402, a storage area for the index data management table is acquired, and the index data management pointer is positioned at the first entry of the index data management table. In step S403, the maximum number of codes in a code string block obtained by dividing the search target code string is set. In the example shown in FIG. 3A, the maximum number of codes is 4. In step S404, the head position of the search target code string is set as the head position of the code string. In step S405, the end position of the search target code string is set as the end position of the code string, and the process proceeds to step S406 shown in FIG. 4B.
This completes the initial processing of index data creation. In the example of FIG. 3A, the search target code string 10a is set, the index data management pointer 322 is positioned at the first entry of the index data management table 321, the maximum number of codes is set to 4, the head position of the code string is set to P1, and the end position of the code string is set to Pn.

FIG. 4B is a diagram illustrating the second stage of the processing that sequentially creates the block index data corresponding to each code string block.
As shown in the figure, in step S406 the remaining code count is set to the value obtained by subtracting the head position of the code string from the end position of the code string, and in step S407 it is determined whether the remaining code count is greater than the maximum number of codes.
If the remaining code count is greater than the maximum number of codes, the process proceeds to step S408, and the tail position of the code string is set to the position reached by moving the maximum number of codes from the head position of the code string. Otherwise, the process proceeds to step S409, and the tail position of the code string is set to the end position of the code string.
Steps S406 to S409 are performed so that the end of the index data creation processing for each code string block, described later, can be determined from the tail position of the code string set in step S408 or step S409.

Next, in step S410, a storage area for the index data of the code string block currently being indexed is secured, a pointer to that storage area is acquired, and the process proceeds to step S411. The code string block to be indexed starts at the code located at the head position of the code string set in step S404 or in step S415 described later.

In step S411, the index data of the code string block currently being indexed are created and stored in the storage area secured in step S410, and the topmost code is acquired. Details of the processing in step S411, and of the topmost code, are described later with reference to FIG. 4C and FIGS. 5A to 5C.

Next, in step S412, in the entry of the index data management table pointed to by the index data management pointer, the setting display is set to "present", the head code is set to the topmost code, and the index data pointer is set to the pointer acquired in step S410. The topmost code is the one set in the processing of step S411.

Next, in step S413, it is determined whether the tail position of the code string is the end position of the code string. If it is, index data creation is complete and the processing ends. If it is not, the process proceeds to step S414, where the index data management pointer is positioned at the next entry of the index data management table; then, in step S415, the head position of the code string is set to the code position following the tail position of the code string, and the process returns to step S406.
The loop of steps S406 to S415 is repeated until step S413 determines that the tail position of the code string is the end position of the code string. Once that determination is obtained, index data have been created for the entire search target code string, and the index data creation processing ends.
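The block-splitting loop of steps S406 to S415 can be sketched in Python as below. The text does not spell out whether positions are counted inclusively, so this sketch makes its own assumptions: positions are 1-based and inclusive, the remaining code count is computed as end position minus head position plus 1, and the tail position is head position plus maximum number of codes minus 1. The function name `split_blocks` is hypothetical, and the per-block work of steps S410 to S412 is reduced to yielding the block boundaries.

```python
from typing import Iterator, Tuple


def split_blocks(first_pos: int, last_pos: int,
                 max_codes: int) -> Iterator[Tuple[int, int]]:
    """Yield (head position, tail position) of each code string block."""
    head = first_pos                      # S404: head position of the code string
    while True:
        remaining = last_pos - head + 1   # S406 (inclusive count, an assumption)
        if remaining > max_codes:         # S407
            tail = head + max_codes - 1   # S408: a full block
        else:
            tail = last_pos               # S409: the last, possibly shorter, block
        yield head, tail                  # S410-S412: index this block
        if tail == last_pos:              # S413: tail reached the end position
            return
        head = tail + 1                   # S415 (S414 would advance the table entry)


blocks_10 = list(split_blocks(1, 10, 4))
```

For a string of 10 codes and a maximum of 4 codes per block, this produces the boundaries (1, 4), (5, 8), and (9, 10).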

Next, as a detailed description of step S411 shown in FIG. 4B, the block index data creation processing in an embodiment of the present invention is described. This processing is the same for every code string block, and a code string block is itself a code string; in the following description, therefore, the code string block currently being indexed is sometimes referred to as the search target code string.

FIG. 4C is a diagram illustrating the outline flow of processing for creating block index data in an embodiment of the present invention.
First, in step S420, an area for the code-specific ID range table is secured based on the number of code types to be searched, and the codes contained in the search target code string are read in order to obtain the number of appearances of each code type and the total number of codes. For the first code string block shown in FIG. 3A, the total number of codes is 4, equal to the maximum number of codes set in step S403 shown in FIG. 4A.
Details of the processing in step S420 are described later with reference to FIG. 5A.

Next, in step S430, the code ID range of each code type is set in the code-specific ID range table based on the number of appearances of each code type. Details of the processing in step S430 are described later with reference to FIG. 5B.

Next, in step S440, an area for the ID relationship table is secured based on the total number of codes, the codes contained in the search target code string are read in order while the code-specific ID range table is referred to, the ID relationship table is completed, and the processing ends. Details of the processing in step S440 are described later with reference to FIG. 5C.

FIG. 5A shows the detailed flow of the processing in step S420 shown in FIG. 4C, and illustrates the processing flow for counting, for each code type, the number of appearances of the codes contained in the search target code string.

As shown in the figure, the search target code string is set in step S501. Setting the search target code string here means setting the code string block currently being indexed in a search target code string setting area (not shown). This search target code string setting area is one of the "temporary storage devices corresponding to each process, for using various values obtained during processing in later processing" mentioned earlier. In the following description, an expression such as "set in a search target code string setting area (not shown)" may be replaced by "set as the search target code string" or simply "set in the search target code string". The same applies to items other than the search target code string.

Next, in step S502, the number of code types is set. The number of code types is determined by the code system and is assumed to be given in advance. The process then proceeds to step S503, where a storage area for the code-specific ID range table is allocated, based on the number of code types set in step S502, within the area allocated in step S410 of FIG. 4B, and the occurrence counts are initialized to 0. Then, in step S504, the code position pointer is set to the head position of the code string set in step S501, and in step S505 the code counter is set to 0. Steps S501 to S505 constitute the initial processing.

Following the initial processing, the process proceeds to step S506, where the code pointed to by the code position pointer is extracted from the code string. Next, in step S507, 1 is added to the occurrence count of the entry in the code-specific ID range table that corresponds to the code type of the extracted code (hereinafter sometimes called the code-specific ID range table pointed to by the code). In step S508, 1 is added to the code counter, and the process proceeds to step S509.

In step S509, it is determined whether the code position pointer is at the tail position of the code string set in step S408 or step S409 of FIG. 4B. If it is not, the code position pointer is advanced to the next code position in step S510 and the process returns to step S506. If it is, the value of the code counter is set as the total code count in step S511 and the process ends.
Through the above processing, the occurrence counts in the code-specific ID range table and the total code count are set.
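For illustration only, the counting pass of FIG. 5A can be sketched in Python as follows. The function name `count_occurrences`, the use of a dictionary for the occurrence counts, and the sample block are assumptions made for this sketch and do not appear in the specification.

```python
def count_occurrences(block, code_types):
    """Count the occurrences of each code type in one code string block.

    block      -- the code string block (any sequence of codes)
    code_types -- all code types of the code system, given in advance (S502)
    """
    counts = {code: 0 for code in code_types}  # S503: counts initialized to 0
    total = 0                                  # S505: code counter
    for code in block:                         # S504, S506, S509, S510
        counts[code] += 1                      # S507
        total += 1                             # S508
    return counts, total                       # S511: total code count


# Hypothetical sample: a block over the four-code system "abcd".
counts, total = count_occurrences("abcab", "abcd")
```

Code types that never occur keep a count of 0, which is what the range-setting step relies on to mark them as absent.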

FIG. 5B shows the detailed flow of step S430 in FIG. 4C: the process of setting the code ID range for each code type based on the occurrence counts set by the process of FIG. 5A.

First, in step S521, the code type pointer is set to the head position of the code-specific ID range table, and in step S522 the code ID counter is set to its initial value. The process then proceeds to step S523, where the occurrence count is extracted from the code-specific ID range table entry pointed to by the code type pointer, and in step S524 it is determined whether the extracted occurrence count is 0.

If the occurrence count is not 0, then in step S525 the setting flag of the code-specific ID range table entry pointed to by the code type pointer is set to "present", and the value of the code ID counter is set in both the head code ID and the per-code ID counter of that entry. Next, in step S526, the occurrence count is added to the code ID counter, and in step S527 the tail code ID of the same entry is set to the value of the code ID counter minus 1. The process then proceeds to step S529.

If, on the other hand, the occurrence count is found to be 0 in step S524, the setting flag of the code-specific ID range table entry pointed to by the code type pointer is set to "absent" in step S528, and in step S528a the unset ID is set in both the head code ID and the tail code ID of that entry; the process then proceeds to step S529. A value such as 0 or -1 can be used as the unset ID.

In step S529, it is determined whether the code type pointer is at the end position of the code-specific ID range table. If it is not, the code type pointer is advanced in step S530 to the position of the next code type in the table, and the process returns to step S523. If it is, the code-specific ID range table is complete and the process ends.
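The range-setting pass of FIG. 5B can be sketched as follows. This is a minimal illustration under stated assumptions: the names `RangeEntry` and `assign_id_ranges` are hypothetical, the initial value of the code ID counter is assumed to be 1 (the text only says "initial value"), and 0 is chosen as the unset ID from the two values the text permits.

```python
from dataclasses import dataclass

UNSET_ID = 0  # assumption: the text allows 0 or -1 as the unset ID


@dataclass
class RangeEntry:
    count: int                          # occurrence count from FIG. 5A
    present: bool = False               # setting flag
    head_id: int = UNSET_ID             # head code ID
    tail_id: int = UNSET_ID             # tail code ID
    per_code_id_counter: int = UNSET_ID


def assign_id_ranges(counts, first_id=1):
    """Give each code type that occurs a contiguous range of code IDs."""
    table = {}
    id_counter = first_id                      # S522 (initial value assumed)
    for code, n in counts.items():             # S521, S529, S530
        entry = RangeEntry(count=n)            # defaults cover S528 and S528a
        if n != 0:                             # S524
            entry.present = True               # S525
            entry.head_id = id_counter
            entry.per_code_id_counter = id_counter
            id_counter += n                    # S526
            entry.tail_id = id_counter - 1     # S527
        table[code] = entry
    return table


# Hypothetical sample: occurrence counts for the block "abcab".
table = assign_id_ranges({"a": 2, "b": 2, "c": 1, "d": 0})
```

With these counts, `a` receives IDs 1 to 2, `b` receives 3 to 4, `c` receives 5, and `d` stays marked as absent. Because the ranges are contiguous and disjoint, a later range check on a code ID is enough to tell which code that ID stands for.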

FIG. 5C shows the detailed flow of step S440 in FIG. 4C: the process of completing the ID relation table from the codes in the search target code string.

First, in step S541, a storage area for the ID relation table is allocated within the storage area allocated in step S410 of FIG. 4B, based on the total code count obtained by the process of FIG. 5A. In step S542, the code position pointer is set to the head position of the search target code string. Next, in step S543, the code pointed to by the code position pointer is extracted from the search target code string and is also set as the topmost code. In step S544, the per-code ID counter is read from the code-specific ID range table entry pointed to by that code and is set in the code ID pointer. Next, in step S545, the value of the code ID pointer is set as the topmost code ID, and the process proceeds to step S546.

In step S546, it is determined whether the code position pointer is at the tail position of the code string set in step S408 or step S409 of FIG. 4B. If it is not, steps S547 to S554 are executed to set the code position and the next code ID in the ID relation table entry pointed to by the relevant code ID, and the process returns to step S546.

First, in step S547, the value of the code position pointer is set in the code position of the ID relation table entry pointed to by the code ID pointer.
Next, in step S550, 1 is added to the per-code ID counter of the code-specific ID range table entry pointed to by the code extracted in step S543 or in step S552 described below, and in step S551 the code position pointer is advanced to the next code position.

Next, in step S552, the code pointed to by the code position pointer is extracted from the search target code string. In step S553, the per-code ID counter is read from the code-specific ID range table entry pointed to by the extracted code and is set in the next code ID of the ID relation table entry pointed to by the code ID pointer.

Next, in step S554, the per-code ID counter value read in step S553 is set in the code ID pointer, and the process returns to step S546. Steps S546 to S554 are repeated until the code position pointer points to the tail position of the search target code string; when it reaches that tail (end) position, the process branches to step S555. In step S555, in the ID relation table entry pointed to by the code ID pointer, the value of the code position pointer is set in the code position and the topmost code ID set in step S545 is set in the next code ID; the process then ends.
The processing described in detail above with reference to FIGS. 4A to 5C creates the index data for code string search in one embodiment of the present invention.
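The construction of the ID relation table in FIG. 5C can be sketched as follows. This is an illustrative reading rather than the patented implementation: the function name `build_id_relation_table`, the dictionary representation, and the 0-based code positions are assumptions, and the sketch treats the pointer updated in steps S553 and S554 as the code ID pointer set in step S544.

```python
def build_id_relation_table(block, head_ids):
    """Build {code ID: (code position, next code ID)} for one block.

    block    -- non-empty code string block
    head_ids -- {code: head code ID of that code's range}, as set in FIG. 5B
    """
    per_code_counter = dict(head_ids)          # per-code ID counters
    relation = {}
    pos = 0                                    # S542 (0-based, an assumption)
    code = block[pos]                          # S543: topmost code
    id_ptr = per_code_counter[code]            # S544
    top_id = id_ptr                            # S545: topmost code ID
    while pos != len(block) - 1:               # S546
        per_code_counter[code] += 1            # S550
        next_pos = pos + 1                     # S551
        code = block[next_pos]                 # S552
        next_id = per_code_counter[code]       # S553
        relation[id_ptr] = (pos, next_id)      # S547 and S553
        pos, id_ptr = next_pos, next_id        # S554
    relation[id_ptr] = (pos, top_id)           # S555: link back to the start
    return relation


# Hypothetical sample: block "abcab" with ranges a: 1-2, b: 3-4, c: 5.
relation = build_id_relation_table("abcab", {"a": 1, "b": 3, "c": 5})
```

Each occurrence of a code receives the next free ID in that code's range, so the k-th occurrence of `a` gets ID `head_ids["a"] + k - 1`. The last entry points back to the topmost code ID, which is what later lets the search detect that it has run off the end of a block.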

Next, the code string search process in one embodiment of the present invention is described with reference to FIGS. 6 to 9B. As outlined earlier with reference to FIG. 3B, the code string search first finds the codes in the search target code string that match the head code of the search code string, together with their code positions. Starting from each code position found, it then collates the codes in the search target code string with the codes in the search code string one code at a time, using the code-specific ID range table and the ID relation table created for each code string block.

Before the specific description with reference to FIGS. 6 to 9B, the following outlines the processing flow of the code string search in one embodiment of the present invention and how it relates to the processing shown in each drawing.
The processing flow of the code string search in one embodiment of the present invention has three nested loops.
The outermost loop runs over the code string blocks. The search with the search code string is repeated from the first code string block of the search target code string to the last. The control flow of this loop is shown in FIG. 6.
The middle loop runs over the code IDs of the head code of the search code string. Within a given code string block, the search with the search code string is repeated over the code ID range of the head code of the search code string. The control flow of this loop is shown in FIGS. 7A and 7B.
The innermost loop collates the search code string with the code string block one code at a time, from the head code of the search code string to its tail code. The control flow of this loop is shown in FIG. 8A for exact match search, in FIG. 8B for prefix match search, and in FIG. 8C for search including arbitrary codes.

According to this processing flow, the middle loop is invoked for each code string block of the outermost loop, and the innermost loop is invoked for each code ID of the head code of the search code string; the innermost loop repeatedly collates the codes in the code string block, one code at a time, with the search code string from its head code to its tail code.
In the present invention, the search target code string is divided into code string blocks. While the innermost loop is repeating the code-by-code collation, the tail position of the current code string block may therefore be reached before the tail code of the search code string has been collated. In that case, the code-by-code collation must continue into the next code string block.
This continuation is realized by the search process for the next code string block, whose flow is shown in FIGS. 9A and 9B. That search process is called from the innermost loop and, in order to repeat the code-by-code collation, recursively calls the innermost loop.

FIG. 6, as stated above, illustrates the overall flow of the code string search process in one embodiment of the present invention. The flow consists of initial processing followed by a loop that searches while switching the code string block at which the search starts, in order from the first block of the search target code string to each subsequent block.

First, in step S601, the search code string is set. This is done by placing the search code string read by the search code string reading means 111 shown in FIG. 2B into a temporary storage area; the head position of the search code string thus set is assumed to be given.
Next, in step S602, the index data management pointer of the search start position, which is a temporary storage area, is set to the position of the first entry of the index data management table.
This completes the initial processing mentioned above.

Next, the process proceeds to step S603, where the index data management table entry pointed to by the index data management pointer of the search start position is extracted. In step S604, it is determined whether the setting flag of the extracted entry is "present". If it is, the process proceeds to step S605; if it is not, every search has been completed and the process ends.

In step S605, the index data pointer is taken from the entry extracted in step S603, and the code-specific ID range table and the ID relation table stored in the index data storage area pointed to by that index data pointer are acquired. This can be realized by setting pointers to the head addresses of the two tables when their storage areas are allocated in step S503 of FIG. 5A and step S541 of FIG. 5C, respectively, and then using those pointers.

Next, in step S606, the head code is taken from the entry extracted in step S603. In step S607, the head code ID is extracted from the code-specific ID range table entry pointed to by that head code and is set as the head code ID of the search start position.
Next, in step S608, the corresponding code string block is searched using the code-specific ID range table and the ID relation table acquired in step S605. Details of step S608 are described later with reference to FIGS. 7A and 7B.
Next, in step S609, the index data management pointer of the search start position is set to the position of the next entry of the index data management table, and the process returns to step S603.

The loop of steps S603 to S609 is repeated, with the index data management pointer of the search start position updated in step S609, until step S604 determines that the setting flag of the index data management table entry is not "present".
The index data management pointer of the search start position set in steps S602 and S609, and the head code ID set in step S607, exist to save the index data management pointer and the head code ID of the code string block at which the search starts, because, as noted above, the code-by-code collation may extend from that block into the next code string block. As described later with reference to FIG. 8A, the head code ID is used to detect that the collation has reached the end of the current code string block and must proceed to the codes of the next code string block.
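The outermost loop of FIG. 6 can be sketched as follows. The sketch assumes that the index data management table is a list that ends with an entry whose setting flag is not "present", as the termination test in step S604 implies. The names `ManagementEntry` and `search_all_blocks` are hypothetical, and the per-block search of step S608 is passed in as a callback rather than implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class ManagementEntry:
    present: bool                     # setting flag
    head_code: Optional[str] = None   # first code of the block (S606)
    index_data: Any = None            # stands in for the index data pointer


def search_all_blocks(
    table: List[ManagementEntry],
    search_block: Callable[[int, ManagementEntry], List[int]],
) -> List[int]:
    """Run the per-block search for each block, in block order."""
    results: List[int] = []
    start_entry = 0                            # S602
    while True:
        entry = table[start_entry]             # S603
        if not entry.present:                  # S604: all blocks searched
            return results
        # S605 to S608: the callback receives the entry and does the search.
        results.extend(search_block(start_entry, entry))
        start_entry += 1                       # S609


# Hypothetical usage with a stub per-block search.
blocks = [ManagementEntry(True, "a"), ManagementEntry(True, "b"),
          ManagementEntry(False)]
found = search_all_blocks(blocks, lambda i, entry: [i * 10])
```

Passing the entry number to the callback mirrors the role of the saved search start position: the per-block search needs it to come back to the starting block after a match has spilled into a later block.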

Next, the search process of step S608 in FIG. 6 is described in detail with reference to FIGS. 7A and 7B.
FIG. 7A illustrates the first half of the processing flow of a code string search performed with a given code string block as the code string block of the search start position, in one embodiment of the present invention.
First, in step S701, the search head position is set to the head position of the search code string, and in step S702 the search tail position is set to the tail position of the search code string.

Next, in step S703, the search code is extracted from the search code string at the search head position and is set as the search code of the search head position. In step S704, the setting flag is extracted from the code-specific ID range table entry pointed to by that search code, and in step S705 it is determined whether the extracted setting flag is "present".
If the setting flag is not "present", the search code of the search head position does not occur in the search target code string, so the search process ends.

If step S705 determines that the setting flag is "present", the process proceeds to step S706, where the head code ID is extracted from the code-specific ID range table entry pointed to by the search code of the search head position and is set as the search start code ID. Next, in step S707, the tail code ID is extracted from the same entry and is set as the search end code ID.
Step S706 initializes the search start code ID, which is the code ID currently being processed in the loop over the code IDs of the head code of the search code string described above, to the head code ID of the code ID range. Step S707 makes it possible to identify the end of the code IDs to be processed.
After step S707, the process proceeds to step S711 shown in FIG. 7B.

FIG. 7B illustrates the second half of the processing flow of a code string search performed with a given code string block as the code string block of the search start position, in one embodiment of the present invention.
In step S711, the search progress position is set to the search head position set in step S701. The search progress position indicates the position of the code being collated in the loop, shown in FIG. 8A and elsewhere, that collates the search code string with the code string block one code at a time; in step S711 it is initialized to the search head position, that is, the head position of the search code string.
Next, in step S712, the index data management pointer is set to the index data management pointer of the search start position set in step S602 of FIG. 6, and in step S713 the head code ID, which is a temporary storage area, is set to the head code ID of the search start position set in step S607 of FIG. 6. Then, in step S714, the search start code ID is saved and the process proceeds to step S715.

The search start code ID is saved here because, as noted above, the processing of step S715 may collate the code string across several code string blocks. In that case the processing of FIG. 8A or the like is called recursively, and the search start code ID may then be changed to the head code ID in the code-specific ID range table entry (in the table for the next code string block) pointed to by the first code of the next code string block.

In step S715, the search described above is performed by collating the codes in the code string block, one code at a time, with the search code string from its head code to its tail code, and the result, success or failure, is returned. Details of step S715 are described later with reference to FIG. 8A for exact match search, FIG. 8B for prefix match search, and FIG. 8C for search including arbitrary codes.

Next, in step S716, the search start code ID saved in step S714 is restored. In step S717, the index data management table entry pointed to by the index data management pointer of the search start position is extracted, and in step S718 the code-specific ID range table and the ID relation table stored in the index data storage area pointed to by the index data pointer of that entry are acquired. Steps S717 and S718 are needed because, as noted above, step S715 may have collated the code string across several code string blocks; in that case tables other than those acquired in step S605 of FIG. 6 are in use, so the tables are acquired again using the index data management pointer of the search start position set in step S602 or step S609 of FIG. 6.

Next, the process proceeds to step S719, where it is determined whether the search in step S715 succeeded or failed. If it failed, the process proceeds to step S721. If it succeeded, then in step S720 the code position is extracted from the ID relation table entry pointed to by the search start code ID and is output as the search result code position, after which the process proceeds to step S721.

In step S721, it is determined whether the search start code ID matches the search end code ID. If it does not, the search start code ID is updated to the next code ID in step S722 and the process returns to step S711.
If the search start code ID matches the search end code ID, the search over the code ID range in the code-specific ID range table entry pointed to by the head code of the search code string has been completed for the code string block currently being processed, so the process returns to the processing shown in FIG. 6.
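The middle loop of FIGS. 7A and 7B can be sketched as follows for a single code string block. Several simplifications are assumptions of this sketch: the name `search_block` is hypothetical, a missing dictionary key stands in for a setting flag that is not "present", "the next code ID" in step S722 is taken to be the current ID plus 1, and the save and restore of steps S714 to S718 are omitted because they only matter when a match continues into another block. The code-by-code collation of step S715 is supplied as a callback.

```python
def search_block(query, ranges, relation, match_from):
    """Try every code ID of the query's head code as a search start.

    ranges     -- {code: (head code ID, tail code ID)} for codes that occur
    relation   -- {code ID: (code position, next code ID)}
    match_from -- callback: start code ID -> True if the collation succeeds
    """
    head_code = query[0]                       # S701, S703
    if head_code not in ranges:                # S704, S705: not "present"
        return []
    start_id, end_id = ranges[head_code]       # S706, S707
    positions = []
    while True:
        if match_from(start_id):               # S715, S719
            positions.append(relation[start_id][0])   # S720
        if start_id == end_id:                 # S721
            return positions
        start_id += 1                          # S722 (assumed to mean +1)


# Hypothetical tables for the block "abcab"; the stub accepts only ID 2.
RANGES = {"a": (1, 2), "b": (3, 4), "c": (5, 5)}
RELATION = {1: (0, 3), 2: (3, 4), 3: (1, 5), 4: (4, 1), 5: (2, 2)}
hits = search_block("a", RANGES, RELATION, lambda code_id: code_id == 2)
```

Only the occurrences of the head code are visited, so the number of collation attempts per block equals the occurrence count of that code rather than the block length.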

Next, the processing of step S715 in FIG. 7B is described in detail with reference to FIGS. 8A, 8B, and 8C. As stated above, the processing of step S715 is as illustrated in FIG. 8A, FIG. 8B, or FIG. 8C, depending on whether the search is an exact match search, a prefix match search, or a search including arbitrary codes.

FIG. 8A illustrates the processing flow of exact match search in one embodiment of the present invention.
As shown in the figure, in step S810 the code ID pointer is set to the search start code ID. This search start code ID is either the one initialized in step S706 of FIG. 7A or the one updated in step S722 of FIG. 7B. Next, in step S811, the next code ID is extracted from the ID relation table entry pointed to by the code ID pointer and is set both as the search code ID and in the code ID pointer.

Next, in step S812, it is determined whether the search progress position is the search tail position. If it is not, the process proceeds to step S813. If it is, the code-by-code collation has succeeded through the end of the search code string, so search success is returned and the process returns to the loop of FIG. 7B.
In step S813, it is determined whether the next code ID extracted in step S811 matches the head code ID, which was set in step S713 of FIG. 7B. If they do not match, the process proceeds to step S814, where the search progress position is advanced to the position of the next search code in the search code string. In step S815, the search code is extracted from the search code string at the search progress position, and in step S816 the head code ID and the tail code ID are extracted from the code-specific ID range table entry pointed to by that search code.

Then, in step S817, it is determined whether the search code ID set in step S811 lies within the range from the head code ID to the tail code ID extracted in step S816. If it does, the process returns to step S811. If it does not, a code failed to collate, so search failure is returned and the process returns to the loop of FIG. 7B.

If, on the other hand, step S813 determines that the next code ID matches the head code ID, the process proceeds to step S818, where the next code string block is searched. Details of step S818 are described later with reference to FIGS. 9A and 9B.

Next, in step S819, it is determined whether the search of the next code string block succeeded. If it did, search success is returned; otherwise search failure is returned. In either case the process returns to the loop of FIG. 7B.
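The exact match collation of FIG. 8A can be sketched as follows for a single code string block. The name `match_exact` and the sample tables are assumptions. As a deliberate simplification, the sketch returns failure where step S818 would continue into the next code string block, because that continuation (FIGS. 9A and 9B) is outside this sketch. A search code with no range is given the pair (0, 0), which assumes code IDs start at 1.

```python
def match_exact(query, start_id, ranges, relation, block_head_id):
    """Collate the query code by code, starting at one code ID of its head code.

    ranges        -- {code: (head code ID, tail code ID)}
    relation      -- {code ID: (code position, next code ID)}
    block_head_id -- code ID of the first code of the block (S713)
    """
    id_ptr = start_id                          # S810
    progress = 0                               # search progress position (S711)
    while True:
        next_id = relation[id_ptr][1]          # S811: follow the next code ID
        id_ptr = next_id
        if progress == len(query) - 1:         # S812: whole query collated
            return True
        if next_id == block_head_id:           # S813: ran past the block end
            return False                       # simplification of S818, S819
        progress += 1                          # S814
        head, tail = ranges.get(query[progress], (0, 0))   # S815, S816
        if not head <= next_id <= tail:        # S817: a code did not collate
            return False


# Hypothetical tables for the block "abcab".
RANGES = {"a": (1, 2), "b": (3, 4), "c": (5, 5)}
RELATION = {1: (0, 3), 2: (3, 4), 3: (1, 5), 4: (4, 1), 5: (2, 2)}
hits = [RELATION[i][0] for i in (1, 2)
        if match_exact("ab", i, RANGES, RELATION, 1)]
```

Note that the codes of the block are never read during collation. Each step only checks whether the next code ID falls inside the ID range of the expected search code, which works because the ranges of different codes do not overlap.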

FIG. 8B illustrates the processing flow of prefix match search in one embodiment of the present invention. Compared with the exact match search flow of FIG. 8A, the processing executed in each of steps S830 to S838 of FIG. 8B is the same as that executed in the step of FIG. 8A whose number is smaller by 20, that is, steps S810 to S818.

However, whereas step S817 of the exact match search of FIG. 8A returns search failure and returns to the processing of FIG. 7B when it determines that the search code ID is not within the range from the head code ID to the tail code ID, step S837 of the prefix match search of FIG. 8B returns search success and returns to the loop of FIG. 7B even when it makes that determination.
In step S831, the code position may be extracted, in addition to the next code ID, from the ID relation table entry pointed to by the code ID pointer on each pass; then, when step S837 determines that the search code does not match the index code, the code position last extracted in step S831 may be returned as a search result together with search success. This last extracted code position is the one contained in the same ID relation table entry as the next code ID that served as the search code ID when the determination in step S837 became negative; in other words, it is the code position stored in the ID relation table entry pointed to by the last search code ID for which the determination in step S837 was affirmative.
Accordingly, the search target code string matches the search codes of the search code string up to the code at this code position. By outputting, as search result code positions, this last extracted code position and the code position extracted in step S720 of FIG. 7B from the ID relation table entry pointed to by the search start code ID, the range of code positions of the search target code string that prefix-matches the search code string can be known.

Furthermore, in the exact match search shown in FIG. 8A, after the next code string block is searched in step S818, step S819 determines whether that search succeeded, and search success or search failure is returned accordingly before the process returns to the loop processing shown in FIG. 7B. In the prefix match search shown in FIG. 8B, by contrast, search success is returned immediately after the next code string block is searched in step S838, and the process returns to the loop processing shown in FIG. 7B.

This is because the determination in step S705 shown in FIG. 7A guarantees that the first search code of the search code string exists in the search target code string. The search target code string therefore contains a code string that matches the search code string at least up to its first code, so search success can be returned.
Apart from the return type after the determination in step S837 and the processing from step S838 onward, everything is the same as shown in FIG. 8A, as stated above, and its description is omitted.
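The behaviour of the prefix match search can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the flow of FIG. 8B itself. The names `build_index` and `prefix_match`, the tuple layout of the tables, the assignment of code IDs in sorted code order, and the target string ABEABCAC are hypothetical. Continuing to track the end position across a block boundary is also an assumption, since the description only states that success is returned after step S838.

```python
def build_index(target, block_size):
    """Hypothetical index: one (head_code, ranges, relation) tuple per block.
    ranges:   code -> (head code ID, tail code ID), IDs are 1-based
    relation: code ID -> (next code ID, 1-based code position)
    """
    index = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        # Assumption: IDs are numbered in sorted code order, then by position.
        order = sorted(range(len(block)), key=lambda p: (block[p], p))
        id_at = {p: i + 1 for i, p in enumerate(order)}
        ranges = {}
        for p in order:
            first, _ = ranges.get(block[p], (id_at[p], id_at[p]))
            ranges[block[p]] = (first, id_at[p])
        # The last code of a block points back to the head code ID of the block.
        relation = {id_at[p]: (id_at[(p + 1) % len(block)], start + p + 1)
                    for p in range(len(block))}
        index.append((block[0], ranges, relation))
    return index


def prefix_match(index, b, code_id, query):
    """Return (start position, end position) of the longest run that matches
    a prefix of `query`, starting at the code `code_id` of block `b`."""
    head_code, ranges, relation = index[b]
    start = end = relation[code_id][1]
    for code in query[1:]:
        next_id = relation[code_id][0]               # cf. step S831
        if next_id == ranges[head_code][0]:          # block end reached
            b += 1
            if b == len(index) or index[b][0] != code:
                break                                # still reported as success
            head_code, ranges, relation = index[b]
            next_id = ranges[head_code][0]
        else:
            lo, hi = ranges.get(code, (0, -1))
            if not lo <= next_id <= hi:              # cf. step S837, negative
                break                                # still reported as success
        code_id = next_id
        end = relation[code_id][1]                   # last matching position
    return start, end
```

With `index = build_index("ABEABCAC", 4)`, the call `prefix_match(index, 0, 1, "ABC")` stops after AB and reports positions 1 to 2, whereas an exact match from the same start would fail.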

FIG. 8C is a diagram explaining the processing flow of a search that includes an arbitrary code in one embodiment of the present invention. Here, an arbitrary code is a code that matches any code of the search target code string. When the search code string includes an arbitrary code and the search target code string contains a code string in which all codes other than the arbitrary code match, that search target code string is hit by the search code string including the arbitrary code.

Comparing the flow shown in FIG. 8C with the processing flow of the exact match search shown in FIG. 8A, the processing performed in each of steps S850 to S859 shown in FIG. 8C is exactly the same as the processing performed in the step of FIG. 8A whose step number is smaller by 40, that is, in steps S810 to S819, except that step S855a is inserted between step S855 and step S856.

In step S855a, it is determined whether the search code extracted in step S855 is an arbitrary code. If it is, the process returns to step S851 without performing the code ID range determination of steps S856 and S857. If it is not, the process proceeds to step S856.
Apart from the determination in step S855a, everything is the same as shown in FIG. 8A, as stated above, and its description is omitted.
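The effect of step S855a can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `build_index` and `matches_with_arbitrary`, the `?` marker for the arbitrary code, the tuple layout of the tables, and the ID numbering in sorted code order are hypothetical. The sketch covers a single code string block and simply fails when the block end is reached, and it assumes that the first search code is not an arbitrary code.

```python
ARBITRARY = "?"  # hypothetical marker for the arbitrary code


def build_index(target, block_size):
    """Hypothetical index: one (head_code, ranges, relation) tuple per block.
    ranges:   code -> (head code ID, tail code ID), IDs are 1-based
    relation: code ID -> (next code ID, 1-based code position)
    """
    index = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        # Assumption: IDs are numbered in sorted code order, then by position.
        order = sorted(range(len(block)), key=lambda p: (block[p], p))
        id_at = {p: i + 1 for i, p in enumerate(order)}
        ranges = {}
        for p in order:
            first, _ = ranges.get(block[p], (id_at[p], id_at[p]))
            ranges[block[p]] = (first, id_at[p])
        relation = {id_at[p]: (id_at[(p + 1) % len(block)], start + p + 1)
                    for p in range(len(block))}
        index.append((block[0], ranges, relation))
    return index


def matches_with_arbitrary(entry, code_id, query):
    """Match `query` inside one block, starting at the code `code_id`."""
    head_code, ranges, relation = entry
    for code in query[1:]:
        code_id = relation[code_id][0]           # cf. step S851
        if code_id == ranges[head_code][0]:      # block end reached
            return False                         # next-block search omitted here
        if code == ARBITRARY:                    # cf. step S855a
            continue                             # skip the range determination
        lo, hi = ranges.get(code, (0, -1))       # cf. step S856
        if not lo <= code_id <= hi:              # cf. step S857
            return False
    return True
```

For the single block built by `build_index("ABEABCAC", 8)`, the search code string `A?C` matches only from the second A, whose code position is 4.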

Next, the search processing of the next code string block in step S818 shown in FIG. 8A, step S838 shown in FIG. 8B, or step S858 shown in FIG. 8C is described in detail.
FIG. 9A is a diagram explaining the first half of the processing flow for searching the next code string block in one embodiment of the present invention.

As shown in the figure, in step S901 the index data management pointer is set to the position of the next entry of the index data management table. At this time, the index data management pointer has already been set, in step S712 shown in FIG. 7B, to the index data management pointer of the search start position. The process then proceeds to step S902, where the entry of the index data management table pointed to by the index data management pointer is extracted, and in step S903 it is determined whether the setting indication of the extracted entry is "present".

If the setting indication is "present", the process proceeds to step S904. If it is not, no further code string block exists and the code-by-code matching is cut off partway, so search failure is returned and the process returns to the processing of FIG. 8A, FIG. 8B, or FIG. 8C.

When, on the other hand, step S903 determines that the setting indication of the entry is "present" and the process proceeds to step S904, the head code of the index data management table entry extracted in step S902 is extracted and set in the head code, which is a temporary storage area. Next, in step S905, the search progress position is advanced to the position of the next search code in the search code string. In step S906, the search code is extracted from the search code string at the search progress position, and the process proceeds to step S907.

In step S907, it is determined whether the head code set in step S904 matches the search code extracted in step S906. This determination matches the code at the head position of the next code string block against the code at the search progress position of the search code string. If the result is negative, search failure is returned and the process returns to the processing shown in FIG. 8A, FIG. 8B, or FIG. 8C.
If the result of step S907 is affirmative, the process proceeds to step S911 and the subsequent steps shown in FIG. 9B, and the code-by-code matching continues.

FIG. 9B is a diagram explaining the second half of the processing flow for searching the next code string block in one embodiment of the present invention.
In step S911, the code-specific ID range table and the ID relation table stored in the index data storage area pointed to by the index data pointer of the entry extracted earlier in step S902 shown in FIG. 9A are acquired.

Next, in step S912, the head code ID is extracted from the entry of the code-specific ID range table pointed to by the head code set in step S904 and is set in the head code ID, which is a temporary storage area. In step S913, this head code ID is set as the search start code ID, and the process proceeds to step S914.

In step S914, the processing shown in FIG. 8A, FIG. 8B, or FIG. 8C is called recursively, and the search continues by matching, code by code, the codes in the code string block against the codes of the search code string from its first code to its last code. The call returns whether the search succeeded or failed.
In step S915, search success is returned if the search in step S914 succeeded and search failure is returned if it failed, and the process returns to the processing shown in FIG. 8A, FIG. 8B, or FIG. 8C.
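The interplay between the exact match search of FIG. 8A and the next code string block search of FIGS. 9A and 9B can be sketched in Python as a pair of mutually recursive functions. This is a minimal sketch under stated assumptions rather than the flowcharts themselves: the names `build_index`, `search_block`, and `search_next_block`, the tuple layout of the tables, the ID numbering in sorted code order, and the target string ABEABCAC are hypothetical, and the step numbers in the comments indicate only approximate correspondence.

```python
def build_index(target, block_size):
    """Hypothetical index: one (head_code, ranges, relation) tuple per block.
    ranges:   code -> (head code ID, tail code ID), IDs are 1-based
    relation: code ID -> (next code ID, 1-based code position)
    """
    index = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        # Assumption: IDs are numbered in sorted code order, then by position.
        order = sorted(range(len(block)), key=lambda p: (block[p], p))
        id_at = {p: i + 1 for i, p in enumerate(order)}
        ranges = {}
        for p in order:
            first, _ = ranges.get(block[p], (id_at[p], id_at[p]))
            ranges[block[p]] = (first, id_at[p])
        # The last code of a block points back to the head code ID of the block.
        relation = {id_at[p]: (id_at[(p + 1) % len(block)], start + p + 1)
                    for p in range(len(block))}
        index.append((block[0], ranges, relation))
    return index


def search_block(index, b, code_id, query, q):
    """Exact match of query[q:] in block b; query[q] already matches code_id."""
    head_code, ranges, relation = index[b]
    head_id = ranges[head_code][0]
    while q + 1 < len(query):
        code_id = relation[code_id][0]               # S811: next code ID
        if code_id == head_id:                       # S813: block end reached
            return search_next_block(index, b + 1, query, q + 1)  # S818, S819
        q += 1                                       # S814, S815
        lo, hi = ranges.get(query[q], (0, -1))       # S816: code ID range
        if not lo <= code_id <= hi:                  # S817: range determination
            return False
    return True


def search_next_block(index, b, query, q):
    """Continue the match at the head of block b with the search code query[q]."""
    if b >= len(index):                              # S903: no further block
        return False
    head_code, ranges, _ = index[b]                  # S902, S904, S911
    if query[q] != head_code:                        # S905 to S907
        return False
    head_id = ranges[head_code][0]                   # S912
    return search_block(index, b, head_id, query, q) # S913 to S915
```

With `index = build_index("ABEABCAC", 4)`, the call `search_block(index, 0, 2, "ABC", 0)` starts at the last code of the first block, crosses into the second block, and succeeds.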

The embodiment of the present invention has been described in detail above. In the following, to make the present invention easier to understand, the flow of the exact match search among the code string searches in one embodiment of the present invention is described with reference to FIGS. 10A to 10C. In the example of FIGS. 10A to 10C, the search target code string consists of the code string blocks shown in FIG. 3A up to and including the second block, and the search code string is ABC. In the following, this search target code string may be written as search target code string 10a, following the notation of FIG. 3A.

FIGS. 10A and 10B are diagrams explaining the flow of processing that starts from the first code string block of the search target code string. With respect to the outermost loop processing shown in FIG. 6, they correspond to the first iteration.
FIG. 10A explains the part of this search flow that covers the first code string block.
In the figure, the block enclosed by the dotted line labeled 701a shows the flow in which the search codes of the search code string ABC are processed from the head. In other words, block 701a shows how the code at the search progress position changes. The block enclosed by the dotted line labeled 702a shows the code ID range in the code-specific ID range table 309a pointed to by the code at the search progress position, together with ID1, the head code ID in the code-specific ID range table 309a pointed to by code A, which is located at the head position of the code string block. The block enclosed by the dotted line labeled 703a shows the flow in which next codes are obtained successively from the ID relation table 310a.
The processing steps of FIGS. 6 to 9B that relate to the processing flow shown in the figure are given in parentheses in the figure.

As processing before the search starts, the first entry 704a of the index data management table is extracted in step S603 of FIG. 6, as indicated by arrow 731a in the figure. (Figure numbers are omitted from step references in the remainder of this description.) Then, as indicated by arrow 734a, in step S605 the code-specific ID range table 309a and the ID relation table 310a stored in the index data storage area 705a are acquired on the basis of the index data pointer 733a of that entry. Then, as indicated by the dotted arrow 735a, in steps S606 and S607 the entry 309a(A) of the code-specific ID range table 309a, which corresponds to code A stored in the head code 732a of entry 704a, is read, and ID1, the head code ID, is read from it and set in the head code ID 742a.
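The index data referred to in this walkthrough can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the index creation processing of FIGS. 4A to 5C: the names `BlockEntry` and `build_index` are hypothetical, code IDs are assumed to be numbered in sorted code order, and the target string ABEABCAC with four codes per block is merely chosen to be consistent with the code IDs and code positions quoted in the walkthrough, since FIG. 3A is not reproduced here.

```python
from typing import NamedTuple


class BlockEntry(NamedTuple):
    """Hypothetical stand-in for one index data management table entry
    together with the index data it points to."""
    head_code: str   # code at the head position of the block
    ranges: dict     # code-specific ID range table: code -> (head ID, tail ID)
    relation: dict   # ID relation table: code ID -> (next code ID, position)


def build_index(target, block_size):
    index = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        # Assumption: IDs are numbered in sorted code order, then by position,
        # so that every code owns one contiguous range of code IDs.
        order = sorted(range(len(block)), key=lambda p: (block[p], p))
        id_at = {p: i + 1 for i, p in enumerate(order)}
        ranges = {}
        for p in order:
            first, _ = ranges.get(block[p], (id_at[p], id_at[p]))
            ranges[block[p]] = (first, id_at[p])
        # The last code of a block points back to the head code ID of the block.
        relation = {id_at[p]: (id_at[(p + 1) % len(block)], start + p + 1)
                    for p in range(len(block))}
        index.append(BlockEntry(block[0], ranges, relation))
    return index


index = build_index("ABEABCAC", 4)
# index[0].ranges   -> {'A': (1, 2), 'B': (3, 3), 'E': (4, 4)}
# index[0].relation -> {1: (3, 1), 3: (4, 2), 4: (2, 3), 2: (1, 4)}
```

In this sketch the first block has head code A with head code ID 1, the entry for ID 1 names ID 3 as its next code ID, and the entry for ID 2 holds code position 4 and points back to ID 1.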

First, code A, located at the head of the search code string, is extracted in step S703, as shown in block 701a. As indicated by arrow 723a to block 702a, ID1, the head code ID in the code-specific ID range table 309a pointed to by code A, is extracted in step S706 and set as the search start code ID. In addition, ID2, the tail code ID, is extracted in step S707 and set as the search end code ID.
Next, as indicated by arrow 724a from ID1 in block 702a to block 703a, ID3, the next code ID in the ID relation table 310a pointed to by ID1, is extracted by steps S810 and S811. Then, as indicated by the bidirectional dotted arrow 736a between ID3 in block 703a and the head code ID 742a in block 702a, step S813 determines that the next code ID, ID3, differs from the head code ID, ID1.

Then, as indicated by arrow 721a from code A to code B in block 701a, the code at the next code position becomes the processing target in step S814, and code B is extracted in step S815. As indicated by arrow 723b to block 702a, in step S816 ID3 is extracted as both the head code ID and the tail code ID in the code-specific ID range table 309a pointed to by code B, giving the code ID range.

Then, as indicated by the bidirectional dotted arrow 725b between ID3, the next code ID in the ID relation table 310a pointed to by ID1, in block 703a and the code ID range in the code-specific ID range table 309a pointed to by code B in block 702a, step S817 determines that ID3 lies within the code ID range for code B.
Next, as indicated by arrow 724b in block 703a, ID4, the next code ID in the ID relation table 310a pointed to by ID3, is extracted in step S811. Then, as indicated by the bidirectional dotted arrow 736b between ID4 in block 703a and the head code ID 742a in block 702a, step S813 determines that the next code ID, ID4, differs from the head code ID, ID1.

Next, as indicated by arrow 721b from code B to code C in block 701a, the code at the next code position becomes the processing target in step S814, and code C is extracted in step S815. As indicated by arrow 723c to block 702a, in step S816 the head code ID and the tail code ID in the code-specific ID range table 309a pointed to by code C are extracted as the code ID range. As shown in the figure, however, code C does not exist in the first code string block, and no valid code ID is stored in its head code ID or tail code ID (an unset ID was stored in step S528a shown in FIG. 5B). The range determination indicated by the bidirectional dotted arrow 725c, performed in step S817, therefore finds that the next code ID is outside the code ID range, and the search fails. Search failure is returned, and the process returns to the loop processing shown in FIG. 7B.
That is, within the first code string block of the search target code string 10a, the code string starting from the code A whose code ID is ID1 does not match the search code string ABC. This agrees with the fact that the three-code string starting from that code A is ABE, not ABC, as shown in (a) of FIG. 3A.

FIG. 10B shows the flow in which the first code string block is searched for the search code string ABC after the search start code ID has been set, in step S722, to ID2, the code ID following ID1 of code A. Within the loop processing shown in FIG. 7B, FIG. 10A shows the first iteration and FIG. 10B shows the second.
In this second iteration, the matching between the search target code string and the search code string extends from the first code string block into the next code string block.

As indicated by the arrow in block 702a of FIG. 10B, the search start code ID is updated from ID1 to ID2 in step S722 of the loop processing shown in FIG. 7B. Then, as indicated by arrow 724c from ID2 in block 702a to block 703a, ID1, the next code ID in the ID relation table 310a pointed to by ID2, is extracted by steps S810 and S811. As indicated by the bidirectional dotted arrow 736c between ID1 in block 703a and the head code ID 742a in block 702a, step S813 determines that the next code ID, ID1, matches the head code ID, ID1.

Then, as indicated by the dotted arrow 737a, the entry 704b that follows the first entry 704a of the index data management table is extracted in step S901. Code B, stored in the head code 732b of entry 704b, is then set in the head code 741b in step S904, as indicated by arrow 738a.

Meanwhile, as indicated by arrow 721a from code A to code B in block 701a, the code at the next code position becomes the processing target in step S905, and code B, which follows the first code A, is extracted from the search code string in step S906. Then, as indicated by the bidirectional dotted arrow 744b, step S907 determines that code B, the code following code A, matches code B set in the head code 741b.
Then, as indicated by arrow 739a, in step S911 the code-specific ID range table 309b and the ID relation table 310b stored in the index data storage area 705b are acquired on the basis of the index data pointer 733b of entry 704b.

Next, as indicated by arrow 745b, in step S912 ID2, the head code ID, is extracted from the code-specific ID range table 309b pointed to by code B set in the head code 741b, and is set in the head code ID 742b.
Subsequently, as indicated by arrow 724d, ID3, the next code ID in the ID relation table 310b pointed to by ID2, is extracted by step S913 and by step S811 of the recursively called processing shown in FIG. 8A. Then, as indicated by the bidirectional dotted arrow 736d between ID3 in block 703b and the head code ID 742b in block 702b, step S813 determines that the next code ID, ID3, differs from the head code ID, ID2.

Then, as indicated by arrow 721b from code B to code C in block 701a, the code at the next code position becomes the processing target in step S814, and code C is extracted in step S815. As indicated by arrow 723d to block 702b, in step S816 ID3, the head code ID, and ID4, the tail code ID, in the code-specific ID range table 309b pointed to by code C are extracted as the code ID range.

Then, as indicated by the bidirectional dotted arrow 725d between ID3, the next code ID in the ID relation table 310b pointed to by ID2, in block 703b and the code ID range in the code-specific ID range table 309b pointed to by code C in block 702b, step S817 determines that ID3 lies within the code ID range for code C.

That is, within the search target code string 10a, the code string starting from the code A whose code ID is ID2 matches the search code string ABC. This agrees with the fact that the code string starting from that code A is ABC, as shown in (a) of FIG. 3A.
Accordingly, in step S720, as indicated by arrow 728a, the code position P4 in the ID relation table 310a pointed to by ID2, the search start code ID, is set in the search result code position labeled 705b.

Since ID2, the search start code ID, is the search end code ID set in step S707, the search that uses the first code string block as the search start position ends. The process returns to the loop processing shown in FIG. 6, advances the search start position by one, and performs the search starting from the second code string block.

FIG. 10C is a diagram explaining the flow of processing that starts from the second code string block of the search target code string. With respect to the outermost loop processing shown in FIG. 6, it corresponds to the second iteration. The processing flow described below is similar to the one described above with reference to FIG. 10A.
As processing before the search starts, as indicated by arrow 731b in the figure, the value of the index data management pointer of the search start position is updated in step S609, and the second entry 704b of the index data management table is extracted in step S603. Then, as indicated by arrow 734b, in step S605 the code-specific ID range table 309b and the ID relation table 310b stored in the index data storage area 705b are acquired on the basis of the index data pointer 733b of that entry. As indicated by the dotted arrow 735b, in steps S606 and S607 the entry 309b(B) of the code-specific ID range table 309b, which corresponds to code B stored in the head code 732b of entry 704b, is read, and ID2, the head code ID, is read from it and set in the head code ID 742b.

At the start of the search from the second code string block, code A, located at the head of the search code string, is extracted again in step S703, as shown in block 701a. As indicated by arrow 723e to block 702b, ID1, the head code ID in the code-specific ID range table 309b pointed to by code A, is extracted in step S706 and set as the search start code ID. In addition, ID1, the tail code ID, is extracted in step S707 and set as the search end code ID.
Next, as indicated by arrow 724e from ID1 in block 702b to block 703b, ID4, the next code ID in the ID relation table 310b pointed to by ID1, is extracted by steps S810 and S811. Then, as indicated by the bidirectional dotted arrow 736e between ID4 in block 703b and the head code ID 742b in block 702b, step S813 determines that the next code ID, ID4, differs from the head code ID, ID2.

Then, as indicated by arrow 721a from code A to code B in block 701a, the code at the next code position becomes the processing target in step S814, and code B is extracted in step S815. As indicated by arrow 723f to block 702b, in step S816 ID2 is extracted as both the head code ID and the tail code ID in the code-specific ID range table 309b pointed to by code B, giving the code ID range.

Then, as indicated by the bidirectional dotted arrow 725e between ID4, the next code ID in the ID relation table 310b pointed to by ID1, in block 703b and the code ID range in the code-specific ID range table 309b pointed to by code B in block 702b, step S817 determines that ID4 is outside the code ID range for code B, and the search fails. Search failure is returned, and the process returns to the loop processing shown in FIG. 7B.

Since ID1, the search start code ID, is the search end code ID, the determination in step S721 shown in FIG. 7B ends this processing, and the process returns further to the loop processing shown in FIG. 6. Because the search target code string in the example of FIGS. 10A to 10C extends only to the second code string block, step S604 determines that the entire search processing has ended.
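The whole walkthrough of FIGS. 10A to 10C can be reproduced with a short Python sketch. It is offered only as an illustration under stated assumptions: the names `build_index`, `matches`, and `find_all`, the tuple layout of the tables, and the ID numbering in sorted code order are hypothetical. The target string ABEABCAC with four codes per block is chosen to be consistent with the code IDs and code positions quoted above, because FIG. 3A is not reproduced here. The matching is written iteratively instead of with the recursive call of step S914, and the search code string is assumed to be non-empty.

```python
def build_index(target, block_size):
    """Hypothetical index: one (head_code, ranges, relation) tuple per block.
    ranges:   code -> (head code ID, tail code ID), IDs are 1-based
    relation: code ID -> (next code ID, 1-based code position)
    """
    index = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        # Assumption: IDs are numbered in sorted code order, then by position.
        order = sorted(range(len(block)), key=lambda p: (block[p], p))
        id_at = {p: i + 1 for i, p in enumerate(order)}
        ranges = {}
        for p in order:
            first, _ = ranges.get(block[p], (id_at[p], id_at[p]))
            ranges[block[p]] = (first, id_at[p])
        # The last code of a block points back to the head code ID of the block.
        relation = {id_at[p]: (id_at[(p + 1) % len(block)], start + p + 1)
                    for p in range(len(block))}
        index.append((block[0], ranges, relation))
    return index


def matches(index, b, code_id, query):
    """Exact match of `query` starting at the code `code_id` of block `b`."""
    head_code, ranges, relation = index[b]
    for code in query[1:]:
        code_id = relation[code_id][0]                  # S811
        if code_id == ranges[head_code][0]:             # S813: block end reached
            b += 1
            if b == len(index) or index[b][0] != code:  # S903, S907
                return False
            head_code, ranges, relation = index[b]      # S911
            code_id = ranges[head_code][0]              # S912, S913
        else:
            lo, hi = ranges.get(code, (0, -1))          # S816
            if not lo <= code_id <= hi:                 # S817
                return False
    return True


def find_all(target, query, block_size):
    """Return the code positions at which `query` occurs in `target`."""
    index = build_index(target, block_size)
    hits = []
    for b, (_, ranges, relation) in enumerate(index):   # loop of FIG. 6
        if query[0] not in ranges:                      # S705
            continue
        first, last = ranges[query[0]]                  # S706, S707
        for start_id in range(first, last + 1):         # loop of FIG. 7B
            if matches(index, b, start_id, query):
                hits.append(relation[start_id][1])      # S720
    return hits
```

Under these assumptions, `find_all("ABEABCAC", "ABC", 4)` returns `[4]`, which corresponds to the code position P4 set as the search result in the walkthrough.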

The mode for carrying out the present invention has been described in detail above. It will be apparent to those skilled in the art, however, that embodiments of the present invention are not limited to it and that various modifications are possible.
It is also apparent that the code string search device of the present invention can be constructed on a computer by storage means that stores the index data management table, the code-specific ID range table, and the ID relation table, together with a program that causes the computer to execute the processing shown in FIGS. 6 to 9B.

Furthermore, it is apparent that the index data creation device and method of the present invention can be realized by a program that causes a computer to execute the processing for creating index data for code string search shown in FIGS. 4A to 5C, or its equivalent. Such programs realize the means for creating the index data of the present invention, and the like, on the computer.
Accordingly, the above programs and computer-readable recording media on which the programs are recorded are included in the embodiments of the present invention. The data structure of the index data for code string search of the present invention, and computer-readable recording media on which index data having that data structure is recorded, are likewise included in the embodiments of the present invention.

By adopting the new index data structures provided by the present invention as described in detail above, namely the code-specific ID range table, the ID relation table, and the index data management table that manages them, the load of creating index data is reduced and code string searches can be performed efficiently.
Further, according to the present invention, index data can be divided and stored in a plurality of storage areas. Even for a large amount of index data, the size of the code string blocks can therefore be chosen to suit the hardware environment in use, which also makes the index data easier to access and maintain.
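
As an illustration only, the following minimal Python sketch shows one way the code-specific ID range table, the ID relation table, and the index data management table could be built for each code string block. It assumes 0-based code IDs, inclusive ID ranges, and code types ordered by sorted code value; the names build_block_index, build_index, head_code, and index are hypothetical and do not appear in the specification.

```python
from collections import Counter


def build_block_index(block, base_pos=0):
    """Build the index data for one code string block (illustrative only)."""
    counts = Counter(block)

    # Code-specific ID range table: code -> (first ID, last ID), inclusive.
    # Assumption: IDs are 0-based and code types are ordered by sorted value.
    ranges = {}
    next_free = 0
    for code in sorted(counts):
        ranges[code] = (next_free, next_free + counts[code] - 1)
        next_free += counts[code]

    # Assign code IDs in position order within each code type.
    cursor = {code: lo for code, (lo, _) in ranges.items()}
    ids = []
    for code in block:
        ids.append(cursor[code])
        cursor[code] += 1

    # ID relation table: for each code ID, the next code ID and the code position.
    n = len(block)
    next_id = [0] * n
    position = [0] * n
    for i, cid in enumerate(ids):
        next_id[cid] = ids[(i + 1) % n]  # the last code wraps to the head code's ID
        position[cid] = base_pos + i
    return {"ranges": ranges, "next_id": next_id, "position": position}


def build_index(target, block_size):
    """Split the target into blocks and build a management table over them."""
    management_table = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        management_table.append({
            "head_code": block[0],  # code located at the head of the block
            "index": build_block_index(block, start),  # stands in for a storage-area pointer
        })
    return management_table
```

For the block "abcab" this sketch assigns the ranges a: (0, 1), b: (2, 3), c: (4, 4) and the next code IDs [2, 3, 4, 0, 1]. Each table is filled in a single pass over the block after one counting pass.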

Explanation of symbols

10 Character string
10a Search target code string
11 Code position pointer
20 Suffixes in character position order
20a Suffixes in dictionary order
30 Suffix array
40 Search character string
40a Search code string
50 Compressed suffix array
101 Search target code string reading means
102 Code-specific ID range table generation means
103 ID relation table generation means
104 Index data creation management means
105 Index data creation means
111 Search code string reading means
112 Code-specific ID range reading means
113 ID relation reading means
114 Code ID collating means
115 Code string search management means
116 Code string search means
301 Data processing device
302 Central processing unit
303 Cache memory
304 Bus
305 Main storage device
306 External storage device
307 Communication device
308 Data storage device
309 Code-specific ID range table
310 ID relation table
311 Code type pointer
312 Code ID pointer
321 Index data management table
322 Index data management pointer
324 Index data storage area

Claims

In a code string search device that searches a search target code string that is a search target using a search code string,
Provided for each code string block that is a partial code string obtained by dividing the search target code string into a plurality of parts,
A code-specific ID range table storing a code ID range, which is a range of code IDs that uniquely identify all codes located in the code string block, for each code of the same type;
Corresponding to the code ID, a next code ID that is a code ID of a code positioned next to the code related to the code ID in the code string block is stored, and the code related to the code ID is stored in the code string block. An ID relationship table that stores the code ID of the code located at the beginning of the code string block as the next code ID when located at the end;
A search execution unit that executes a search by the search code string with reference to an ID range table by code and an ID relation table provided for each code string block;
An index data management table storing a head code located at the head of the code string block for each code string block;
A search management unit that manages execution of search by the search execution unit;
the device comprising the above elements, wherein
The search execution unit includes:
Search code string reading means for reading the search code string;
Code-specific ID range reading means for sequentially reading, from the code-specific ID range table corresponding to the designated code string block, the code ID range of the code type of each code constituting the search code string read by the search code string reading means, starting from the head code;
ID relation reading means for reading, from the ID relation table corresponding to the designated code string block, the next code ID stored corresponding to a code ID included in the code ID range of the type of the head code of the search code string read by the code-specific ID range reading means, sequentially reading from the ID relation table the next code ID stored corresponding to each next code ID so read, and determining whether the read next code ID is equal to the code ID of the head code of the code string block; and
Code ID collating means for checking, when the next code ID read by the ID relation reading means is not equal to the code ID of the head code of the code string block, whether that next code ID is included in the code ID range read by the code-specific ID range reading means,
and the search management unit,
while sequentially designating the code string blocks to the search execution unit starting from the first code string block,
when the ID relation reading means determines that the read next code ID is equal to the code ID of the head code of the code string block, reads from the index data management table the head code of the code string block located next to that code string block, collates that head code with a code in the search code string, and thereby designates the code string block located next to that code string block to the search execution unit.
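
The search described in this claim can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: code IDs are 0-based, ranges are inclusive, code types are ordered by sorted code value, and the names build_blocks, search, head_id, and head_code are hypothetical. A chain of next code IDs is followed from every code ID in the range of the head code of the search code string; when a next code ID equals the code ID of the block's head code, the chain has run past the end of the block, so the head code of the following block is compared with the current search code instead.

```python
from collections import Counter


def build_blocks(target, block_size):
    """Build per-block tables (0-based IDs, sorted code order: assumptions)."""
    blocks = []
    for start in range(0, len(target), block_size):
        block = target[start:start + block_size]
        counts, ranges, free = Counter(block), {}, 0
        for code in sorted(counts):
            ranges[code] = (free, free + counts[code] - 1)  # inclusive range
            free += counts[code]
        cursor = {c: lo for c, (lo, _) in ranges.items()}
        ids = []
        for code in block:
            ids.append(cursor[code])
            cursor[code] += 1
        next_id = [0] * len(block)
        position = [0] * len(block)
        for i, cid in enumerate(ids):
            next_id[cid] = ids[(i + 1) % len(block)]  # tail wraps to head
            position[cid] = start + i
        blocks.append({"head_code": block[0], "head_id": ids[0],
                       "ranges": ranges, "next_id": next_id,
                       "position": position})
    return blocks


def search(blocks, query):
    """Return the start position of every occurrence of query."""
    hits = []
    if not query:
        return hits
    for b, blk in enumerate(blocks):
        if query[0] not in blk["ranges"]:
            continue
        lo, hi = blk["ranges"][query[0]]
        for start_id in range(lo, hi + 1):
            cur_b, cur_blk, cur = b, blk, start_id
            matched = True
            for code in query[1:]:
                nid = cur_blk["next_id"][cur]
                if nid == cur_blk["head_id"]:
                    # Ran past the block's tail: compare with the next block's head code.
                    cur_b += 1
                    if cur_b == len(blocks) or blocks[cur_b]["head_code"] != code:
                        matched = False
                        break
                    cur_blk = blocks[cur_b]
                    cur = cur_blk["head_id"]
                else:
                    # Within the block: the next ID must fall in the code's ID range.
                    rng = cur_blk["ranges"].get(code)
                    if rng is None or not rng[0] <= nid <= rng[1]:
                        matched = False
                        break
                    cur = nid
            if matched:
                hits.append(blk["position"][start_id])
    return hits
```

With the target "abcabcab" split into blocks of three codes, search(build_blocks("abcabcab", 3), "ca") returns [2, 5]; both matches cross a block boundary and are found through the head code comparison.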

The code string search device according to claim 1,
The code ID collating means takes, as a first code ID, a code ID included in the code ID range of the type of a first code, the first code being the head code of the search code string, and checks whether the next code ID read by the ID relation reading means and stored corresponding to the first code ID is included in the code ID range of the type of a second code, the second code being the code positioned next to the first code in the search target code string; thereafter, when the positions of the first code and the second code in the search code string are updated by the reading operations of the code-specific ID range reading means and the ID relation reading means, it checks whether the next code ID stored corresponding to the code ID of the position-updated first code is included in the code ID range of the type of the position-updated second code,
When designating the next code string block to the search execution unit, the search management unit, if the ID relation reading means has determined that the read next code ID is equal to the code ID of the head code of the code string block, reads the head code of the next code string block from the index data management table, collates that head code with the code positioned next to the first code for which the read next code ID is stored, and designates the next code string block to the search execution unit when the two match.
A code string search device characterized by that.

In the code string search device according to claim 2,
The ID relation table stores a code position indicating a position of a code related to the code ID in the search target code string, corresponding to the code ID,
The code ID collating means, when the check of whether the next code ID read by the ID relation reading means is included in the code ID range read by the code-specific ID range reading means succeeds for every code from the head code to the last code of the search code string, outputs the code position stored in the ID relation table corresponding to the code ID of the head code as a search result code position.
A code string search device characterized by that.

In the code string search device according to claim 3,
The next code ID and code position stored in the ID relation table corresponding to the code ID are stored in order of code position for each code ID of the same type of code.
A code string search device characterized by that.

The code string search device according to claim 4, wherein
The code ID collating unit uses all code IDs included in the code ID range of the type of the first code of the search code string as the first code ID, and the next code ID read by the ID relation reading unit is Check whether it is included in the code ID range read by the code-specific ID range reading means,
A code string search device characterized by that.

The code string search device according to claim 5, wherein
The code ID collating means, when the check of whether the next code ID read by the ID relation reading means is included in the code ID range read by the code-specific ID range reading means fails, outputs, as search result code positions, the code position stored in the same entry of the ID relation table as that next code ID and the code position stored in the ID relation table corresponding to the code ID of the head code;
A code string search device characterized by that.

The code string search device according to claim 5, wherein
The search code string includes an arbitrary code that matches an arbitrary code,
The code ID collating means, in place of the check, with the arbitrary code as the second code, of whether the next code ID read by the ID relation reading means is included in the code ID range read by the code-specific ID range reading means, performs the check with the code positioned next to the arbitrary code in the search code string as the second code.
A code string search device characterized by that.
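
The handling of an arbitrary code can be illustrated with the following minimal Python sketch, restricted to a single non-empty code string block for brevity. It assumes 0-based code IDs, inclusive ranges, sorted code order, and a search code string whose head code is not the arbitrary code; the ANY marker and the function names are hypothetical. For an arbitrary code the range check is simply skipped and the chain of next code IDs is followed one step further, so the check resumes with the code after the arbitrary code.

```python
from collections import Counter

ANY = object()  # hypothetical marker standing for the "arbitrary code"


def build_block(block):
    """Build the tables for one non-empty block (0-based IDs: an assumption)."""
    counts, ranges, free = Counter(block), {}, 0
    for code in sorted(counts):
        ranges[code] = (free, free + counts[code] - 1)  # inclusive range
        free += counts[code]
    cursor = {c: lo for c, (lo, _) in ranges.items()}
    ids = []
    for code in block:
        ids.append(cursor[code])
        cursor[code] += 1
    next_id = [0] * len(block)
    position = [0] * len(block)
    for i, cid in enumerate(ids):
        next_id[cid] = ids[(i + 1) % len(block)]  # tail wraps to head
        position[cid] = i
    return ranges, next_id, position, ids[0]


def search_with_arbitrary(block, query):
    """Match query inside one block; ANY entries match any single code."""
    ranges, next_id, position, head_id = build_block(block)
    hits = []
    if not query or query[0] is ANY or query[0] not in ranges:
        return hits
    lo, hi = ranges[query[0]]
    for start_id in range(lo, hi + 1):
        cur = start_id
        for code in query[1:]:
            nid = next_id[cur]
            if nid == head_id:
                break  # ran past the end of the block (no next block in this sketch)
            if code is not ANY:
                rng = ranges.get(code)
                if rng is None or not rng[0] <= nid <= rng[1]:
                    break  # next code is not of the required type
            cur = nid  # for ANY, advance without a range check
        else:
            hits.append(position[start_id])
    return hits
```

For example, search_with_arbitrary("abcabdab", ["a", ANY, "d"]) returns [3]. Continuing a match into the following block would additionally need the head code comparison against the index data management table, which is omitted here.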

In a code string search method performed by a code string search device that searches a search target code string using a search code string, the device having: a code-specific ID range table that is provided for each code string block, a code string block being a partial code string obtained by dividing the search target code string into a plurality of parts, and that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block; an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block, and that stores, when the code related to the code ID is located at the end of the code string block, the code ID of the code located at the head of the code string block as the next code ID; an index data management table that stores, for each code string block, the head code located at the head of that code string block; a search execution unit; and a search management unit,
The search execution unit executes:
A search code string reading step of reading the search code string;
A code-specific ID range reading step of sequentially reading, from the code-specific ID range table corresponding to the designated code string block, the code ID range of the code type of each code constituting the search code string read in the search code string reading step, starting from the head code;
An ID relation reading step of reading, from the ID relation table corresponding to the designated code string block, the next code ID stored corresponding to a code ID included in the code ID range of the type of the head code of the search code string read in the code-specific ID range reading step, sequentially reading from the ID relation table the next code ID stored corresponding to each next code ID so read, and determining whether the read next code ID is equal to the code ID of the head code of the code string block; and
A code ID verification step of checking, when the next code ID read in the ID relation reading step is not equal to the code ID of the head code of the code string block, whether that next code ID is included in the code ID range read in the code-specific ID range reading step,
and the search management unit executes:
A search start position specifying step of sequentially designating the code string blocks to the search execution unit, starting from the first code string block; and
A next code string designating step of, when it is determined in the ID relation reading step that the read next code ID is equal to the code ID of the head code of the code string block, reading from the index data management table the head code of the code string block located next to that code string block, collating that head code with a code in the search code string, and thereby designating the next code string block to the search execution unit.
A code string search method characterized by the above.

The code string search method according to claim 8, wherein
In the code ID verification step, a code ID included in the code ID range of the type of a first code, the first code being the head code of the search code string, is taken as a first code ID, and it is checked whether the next code ID read in the ID relation reading step and stored corresponding to the first code ID is included in the code ID range of the type of a second code, the second code being the code positioned next to the first code in the search target code string; thereafter, when the positions of the first code and the second code in the search code string are updated by the reading operations of the code-specific ID range reading step and the ID relation reading step, it is checked whether the next code ID stored corresponding to the code ID of the position-updated first code is included in the code ID range of the type of the position-updated second code,
In the next code string designating step, when it is determined in the ID relation reading step that the read next code ID is equal to the code ID of the head code of the code string block, the head code of the next code string block is read from the index data management table, that head code is collated with the code positioned next to the first code for which the read next code ID is stored, and the next code string block is designated to the search execution unit when the two match,
A code string search method characterized by the above.

The code string search method according to claim 9,
The ID relation table stores a code position indicating a position of a code related to the code ID in the search target code string, corresponding to the code ID,
In the code ID verification step, when the check of whether the next code ID read in the ID relation reading step is included in the code ID range read in the code-specific ID range reading step succeeds for every code from the head code to the last code of the search code string, the code position stored in the ID relation table corresponding to the code ID of the head code is output as a search result code position.
A code string search method characterized by the above.

The code string search method according to claim 10,
In the code ID collating step, all the code IDs included in the code ID range of the type of the first code in the search code string are set as the first code ID, and the next code ID read in the ID relation reading step is Check whether it is included in the range of the code ID read in the code-specific ID range reading step,
A code string search method characterized by the above.

In a code string search program for causing a computer to realize the functions of a code string search device that searches a search target code string using a search code string, the device having: a code-specific ID range table that is provided for each code string block, a code string block being a partial code string obtained by dividing the search target code string into a plurality of parts, and that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block; an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block, and that stores, when the code related to the code ID is located at the end of the code string block, the code ID of the code located at the head of the code string block as the next code ID; an index data management table that stores, for each code string block, the head code located at the head of that code string block; a search execution unit; and a search management unit,
The program causes the computer to realize,
as functions of the search execution unit:
A search code string reading function of reading the search code string;
A code-specific ID range reading function of sequentially reading, from the code-specific ID range table corresponding to the designated code string block, the code ID range of the code type of each code constituting the search code string read by the search code string reading function, starting from the head code;
An ID relation reading function of reading, from the ID relation table corresponding to the designated code string block, the next code ID stored corresponding to a code ID included in the code ID range of the type of the head code of the search code string read by the code-specific ID range reading function, sequentially reading from the ID relation table the next code ID stored corresponding to each next code ID so read, and determining whether the read next code ID is equal to the code ID of the head code of the code string block; and
A code ID verification function of checking, when the next code ID read by the ID relation reading function is not equal to the code ID of the head code of the code string block, whether that next code ID is included in the code ID range read by the code-specific ID range reading function,
and, as functions of the search management unit:
A search start position specifying function of sequentially designating the code string blocks to the search execution unit, starting from the first code string block; and
A next code string designating function of, when the ID relation reading function determines that the read next code ID is equal to the code ID of the head code of the code string block, reading from the index data management table the head code of the code string block located next to that code string block, collating that head code with a code in the search code string, and thereby designating the next code string block to the search execution unit.
A code string search program characterized by realizing the above.

In the code string search program according to claim 12,
The code ID verification function includes a function of taking, as a first code ID, a code ID included in the code ID range of the type of a first code, the first code being the head code of the search code string, and checking whether the next code ID read by the ID relation reading function and stored corresponding to the first code ID is included in the code ID range of the type of a second code, the second code being the code positioned next to the first code in the search target code string, and thereafter, when the positions of the first code and the second code in the search code string are updated by the reading operations of the code-specific ID range reading function and the ID relation reading function, checking whether the next code ID stored corresponding to the code ID of the position-updated first code is included in the code ID range of the type of the position-updated second code,
The next code string designating function includes a function of, when the ID relation reading function determines that the read next code ID is equal to the code ID of the head code of the code string block, reading the head code of the next code string block from the index data management table, collating that head code with the code positioned next to the first code for which the read next code ID is stored, and designating the next code string block to the search execution unit when the two match.
A code string search program characterized by that.

In the code string search program according to claim 13,
The ID relation table stores a code position indicating a position of a code related to the code ID in the search target code string, corresponding to the code ID,
The code ID verification function includes a function of outputting, when the check of whether the next code ID read by the ID relation reading function is included in the code ID range read by the code-specific ID range reading function succeeds for every code from the head code to the last code of the search code string, the code position stored in the ID relation table corresponding to the code ID of the head code as a search result code position,
A code string search program characterized by that.

In the code string search program according to claim 14,
The code ID verification function includes a function of taking every code ID included in the code ID range of the type of the head code of the search code string as the first code ID and checking whether the next code ID read by the ID relation reading function is included in the code ID range read by the code-specific ID range reading function.
A code string search program characterized by that.

The computer-readable recording medium which recorded the code sequence search program of any one of Claims 12-15.

In a data structure for code string search, in which a search target code string is searched using a search code string,
the data structure comprises: a code-specific ID range table that is provided for each code string block, a code string block being a partial code string obtained by dividing the search target code string into a plurality of parts, and that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block; an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block, and that stores, when the code related to the code ID is located at the end of the code string block, the code ID of the code located at the head of the code string block as the next code ID; and an index data management table that stores, for each code string block, the head code located at the head of that code string block,
and, in a code string search device that has a search execution unit, a search management unit, and a storage unit storing the code-specific ID range table, the ID relation table, and the index data management table,
the search execution unit executes:
A search code string reading step of reading the search code string;
A code-specific ID range reading step of sequentially reading, from the code-specific ID range table corresponding to the designated code string block, the code ID range of the code type of each code constituting the search code string read in the search code string reading step, starting from the head code;
An ID relation reading step of reading, from the ID relation table corresponding to the designated code string block, the next code ID stored corresponding to a code ID included in the code ID range of the type of the head code of the search code string read in the code-specific ID range reading step, sequentially reading from the ID relation table the next code ID stored corresponding to each next code ID so read, and determining whether the read next code ID is equal to the code ID of the head code of the code string block; and
A code ID verification step of checking, when the next code ID read in the ID relation reading step is not equal to the code ID of the head code of the code string block, whether that next code ID is included in the code ID range read in the code-specific ID range reading step,
and the search management unit executes:
A search start position specifying step of sequentially designating the code string blocks to the search execution unit, starting from the first code string block; and
A next code string designating step of, when it is determined in the ID relation reading step that the read next code ID is equal to the code ID of the head code of the code string block, reading from the index data management table the head code of the code string block located next to that code string block, collating that head code with a code in the search code string, and thereby designating the next code string block to the search execution unit,
A data structure for code string search, which thereby enables the search target code string to be searched using the search code string.

The data structure for code string search according to claim 17,
The ID relation table stores a code position indicating a position of a code related to the code ID in the search target code string, corresponding to the code ID,
When the check of whether the next code ID read from the ID relation table is included in the code ID range read from the code-specific ID range table succeeds for every code from the head code to the last code of the search code string, the code position stored in the ID relation table corresponding to the code ID of the head code can be output as a search result code position.
A data structure for code string search characterized by that.

The data structure for code string search according to claim 18,
The next code ID and code position stored in the ID relation table corresponding to the code ID are stored in order of code position for each code ID of the same type of code.
A data structure for code string search characterized by that.

A computer-readable recording medium on which data having the data structure according to any one of claims 17 to 19 is recorded.

In an index data creation device for code string search, in which a search target code string is searched using a search code string,
Search target code string reading means for sequentially reading code string blocks, each being a partial code string obtained by dividing the search target code string into a plurality of parts, and obtaining the number of appearances of each type of code in the read code string block;
Code-specific ID range table generating means for generating, based on the number of appearances of each type of code obtained by the search target code string reading means, a code-specific ID range table that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block;
ID relation table generating means for generating, for each code string block, based on the code string block read by the search target code string reading means and the code-specific ID range table, an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block; and
Index data creation management means for securing, for each code string block, an index data storage area that stores the code-specific ID range table and the ID relation table corresponding to that code string block, and for generating an index data management table that stores, for each code string block, the code located at the head of the code string block and a pointer to the index data storage area.
An index data creation device comprising the above.

In the index data creation device according to claim 21,
The next code ID and code position stored in the ID relation table corresponding to the code ID are stored in order of code position for each code ID of the same type of code.
An index data creation device characterized by that.

In an index data creation method performed by an index data creation device for code string search, in which a search target code string is searched using a search code string,
A search target code string reading step of reading a code string block, which is a partial code string obtained by dividing the search target code string into a plurality of parts, and obtaining the number of appearances of each type of code in the read code string block;
A code-specific ID range table generating step of generating, based on the number of appearances of each type of code obtained in the search target code string reading step, a code-specific ID range table that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block; and
An ID relation table generating step of generating, based on the code string block read in the search target code string reading step and the code-specific ID range table generated in the code-specific ID range table generating step, an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block,
are provided, and
An index data creation method characterized by repeating the search target code string reading step, the code-specific ID range table generating step, and the ID relation table generating step for all the code string blocks.

In an index data creation program for causing a computer to execute an index data creation method for code string search, in which a search target code string is searched using a search code string,
The program causes the computer to execute, as the index data creation method for code string search:
A search target code string reading step of reading a code string block, which is a partial code string obtained by dividing the search target code string into a plurality of parts, and obtaining the number of appearances of each type of code in the read code string block;
A code-specific ID range table generating step of generating, based on the number of appearances of each type of code obtained in the search target code string reading step, a code-specific ID range table that stores, for each code of the same type, a code ID range, which is a range of code IDs uniquely identifying all codes located in the code string block; and
An ID relation table generating step of generating, based on the code string block read in the search target code string reading step and the code-specific ID range table generated in the code-specific ID range table generating step, an ID relation table that stores, corresponding to each code ID, a next code ID, which is the code ID of the code positioned next to the code related to that code ID in the code string block,
and to repeat the search target code string reading step, the code-specific ID range table generating step, and the ID relation table generating step for all the code string blocks.
An index data creation program characterized by the above.

A computer-readable recording medium on which the index data creating program according to claim 24 is recorded.