JP2000029884A

JP2000029884A - Character code registration search device and character code registration search method

Info

Publication number: JP2000029884A
Application number: JP10193833A
Authority: JP
Inventors: Hironori Yahagi; 裕紀矢作
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-07-09
Filing date: 1998-07-09
Publication date: 2000-01-28

Abstract

PROBLEM TO BE SOLVED: To reduce the storage capacity of a trie array structure as far as possible by introducing a new data structure that further develops the double-array structure (a one-dimensional array), giving each character code a free registration slot in such a way that the slots overlap one another on the CHECK array. SOLUTION: The BASE array holds two kinds of values. One is the conventional translation amount and is applied to kanji codes that are not used often (low-frequency codes) (a). The other corresponds to the horizontal subscript I1 of a second BASE array. Then, for example, the address area that expresses the first and second bytes of a 7-bit code is divided into three, and the codes are classified into three groups with translation amounts d1, d2, and d3 (b). By assigning the translation amounts d1, d2, and d3 even to character codes (child nodes) that follow the same character (parent node), each kanji code is given a free registration slot so that the slots overlap one another on the CHECK array (c).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to key search techniques, and in particular to a character code registration search device and a character code registration search method that register character strings to be searched by key, such as kanji codes, in a double-array structure, which is a one-dimensional array used as a data structure, and that search for those character strings.

[0002]

[Prior Art] In recent years, with the spread of e-mail and similar services, the volume of electronic documents held by individuals has grown dramatically. For example, many people handle several hundred to nearly 1,000 e-mails a day. It is not unusual for this to amount to several MB of document data per day, and several hundred MB to GB per year.

[0003] When large amounts of data are handled, compressing the data by removing its redundant parts reduces the required storage capacity and speeds up transmission. Universal coding has been proposed as a method that can compress various kinds of data with a single scheme. Given these trends, compression has become an indispensable technology.

[0004] However, when electronic documents in Japanese, Chinese, or similar languages are compressed word by word, it is necessary to determine at high speed whether a character string input from the document is a word registered in the dictionary in advance. Moreover, because there are many words to be checked, the dictionary must be organized so that it contains as few wasted (so-called sparse) regions as possible.

[0005] The following description adopts the terminology used in information theory: a one-word unit of data is called a symbol or a character, and a sequence of any number of such words is called a symbol string or a character string (string).

[0006] In the compression of language codes, an important problem is to store character strings such as words (the key set) in a double-array structure (that is, a trie array structure), which is a one-dimensional array used as a data structure with as small a storage capacity as possible, and to develop a search technique that searches this trie array structure at high speed.

[0007] In particular, the trie array structure that makes up a dictionary storing the words to be searched (the key set) is considered to hold a quasi-static key set, that is, a key set known in advance, and it is often extended later by registering additional keys as needed. Such a data structure is called a quasi-static data structure.

[0008] Against this technical background, Aoe et al. have proposed a character string registration and search technique that uses a double-array structure, a one-dimensional array used as a data structure, for fast pattern matching of multiple keys (see Junichi Aoe: "Fast digital search algorithm using a double array," IEICE Transactions (D), Vol. J71-D, No. 9, pp. 1592-1600 (1988)).

[0009] FIG. 8 illustrates the prior art. FIG. 8(a) shows the two one-dimensional arrays of the double-array structure, the BASE array and the CHECK array, together with their contents. FIG. 8(b) shows the trie structure (state transition diagram) that stores baby#, bachelor#, badger#, badge#, and jar#. FIG. 8(c) illustrates the parent-child node relationship in the trie structure and the search operation using the double array.

[0010] In this fast digital search technique using a double array, two one-dimensional arrays, the BASE array and the CHECK array of the double-array structure, are prepared as shown in FIG. 8(a) for the double array corresponding to the trie structure that stores baby#, bachelor#, badger#, badge#, and jar#.

[0011] The contents of the BASE array and the CHECK array correspond to the trie structure (state transition diagram) of FIG. 8(b), which stores baby#, bachelor#, badger#, badge#, and jar#.

[0012] Here, the subscripts of the BASE array and of the CHECK array both correspond to the node indices in the trie structure shown in FIG. 8(b) (the nodes marked with the circled numbers 1 to 27).

[0013] In the parent-child node relationship of the trie structure shown on the left of FIG. 8(c), the index n of the parent node (the node marked with a circled n) corresponds to a subscript of the BASE array.

[0014] The node index m of the child node (the node marked with a circled m), on the other hand, corresponds to a subscript of the CHECK array.

[0015] To find the node index m = g(n, a) reached from parent node n along branch a (the character "a"), first look at the element of the BASE array whose subscript is n and obtain its content BASE[n]. Here the function g is the state transition function that defines the state transitions for keys on the trie structure, and it is called the goto function. If the content BASE[n] designated by subscript n is the value d (BASE[n] = d), then d represents a kind of translation of the origin for the subscript m on the CHECK array.

[0016] Let m be the subscript on the CHECK array reached by shifting the origin by the value d (the origin translation value) and then moving further by the code value of the character "a". If the value held at subscript m (= d + the code value of "a") of the CHECK array, CHECK[m], equals the node index n of the parent node (the n in BASE[n]), then the character "a" is registered under node n.
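The lookup described in paragraphs [0015] and [0016] can be sketched as follows. This is a minimal illustration rather than the implementation of the cited work: the function name `goto`, the use of 0 as the "empty" marker, and the toy array contents are assumptions made for the example.

```python
def goto(base, check, n, code):
    """Return the child index m = g(n, a), or None if "a" is not registered under n.

    base, check: the BASE and CHECK arrays (Python lists; 0 is assumed to mean "empty").
    n: index of the parent node; code: code value of the character "a".
    """
    m = base[n] + code              # d = BASE[n], then move by the code value
    if 0 <= m < len(check) and check[m] == n:
        return m                    # CHECK[m] names n as the parent
    return None

# Toy data (hypothetical): node 1 has translation amount d = 2,
# and the character with code value 3 is registered under it.
base = [0] * 8
check = [0] * 8
base[1] = 2
check[2 + 3] = 1

child = goto(base, check, 1, 3)     # registered
missing = goto(base, check, 1, 4)   # not registered
```

Each transition costs one addition and one comparison, whatever the number of siblings under node n.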

[0017] In an ordinary trie structure, the time needed to find a child node (the matching time) generally grows with the number of sibling nodes under the same parent (that is, the number of branches leaving the node). The cited work discloses that, in the character string registration and search technique using a trie structure with such a double array, the time needed to find a child node does not depend on the number of siblings, so that a fast search whose cost depends only on the key length is possible.

[0018] FIG. 9(a) illustrates the operation, in the prior art of FIG. 8, of additionally registering in the double-array structure compound words made up of character codes that follow the character 電, such as 電圧 (voltage), 電気 (electricity), 電車 (train), 電脳 (electronic brain), and 電話 (telephone). FIG. 9(b) illustrates the expansion of the double-array structure, and FIG. 9(c) illustrates the expansion of the trie corresponding to the double-array structure of FIG. 9(b).

[0019] In this conventional character string registration and search technique, when character codes that follow 電 are additionally registered, the character codes following 電 (in 電圧, 電気, 電車, 電脳, 電話, ...) keep the relative positions given by their code values.

[0020] In the CHECK array, however, the positions marked with white circles are already registered and occupied, so the characters that follow 電 do not all fall on free positions of the CHECK array at the same time.

[0021] Therefore, to register the characters of 電圧, 電気, 電車, 電脳, 電話, ... simultaneously in free positions of the CHECK array, both the BASE array and the CHECK array are extended.

[0022] That is, as shown in FIG. 9(b), the smallest translation amount d for which all of these characters (in 電圧, 電気, 電車, 電脳, 電話, ...) fit into the CHECK array is calculated, and this value is written at the position of the character 電 in the BASE array (d = BASE[n], where n = g(root, 電)).

[0023] The newly obtained subscript values p, q, r, s, t are obtained by adding the code values of the characters 圧, 気, 車, 脳, 話, ... to the translation amount d. That is, p = g(n, 圧), q = g(n, 気), r = g(n, 車), s = g(n, 脳), and t = g(n, 話).

[0024] Then n, the node index of the parent node 電, is written at the positions p, q, r, s, t of the CHECK array. That is, CHECK[p] = CHECK[q] = CHECK[r] = CHECK[s] = CHECK[t] = n. FIG. 9(c) shows the trie structure formed by nodes n, p, q, r, s, t and the corresponding branches 電, 圧, 気, 車, 脳, 話.
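The conventional registration step of paragraphs [0021] to [0024] can be sketched as a search for the smallest translation amount d followed by the BASE and CHECK writes. The function name `register_children`, the starting value d = 1, the empty marker 0, and the small code values are assumptions for illustration; real kanji code values are far larger.

```python
def register_children(base, check, n, codes):
    """Register the characters in `codes` under node n.

    Finds the smallest d >= 1 such that CHECK[d + c] is empty (0) for every
    code value c, growing both arrays when needed, then writes BASE[n] = d
    and CHECK[d + c] = n. Returns d.
    """
    d = 1
    while True:
        needed = d + max(codes) + 1
        if needed > len(check):                 # extend both arrays together
            grow = needed - len(check)
            check.extend([0] * grow)
            base.extend([0] * grow)
        if all(check[d + c] == 0 for c in codes):
            break                               # every sibling lands on a free slot
        d += 1
    base[n] = d
    for c in codes:
        check[d + c] = n                        # each child records its parent
    return d

# Toy data (hypothetical): slots 3 and 5 are already owned by node 9.
base = [0] * 10
check = [0] * 10
check[3] = 9
check[5] = 9

d = register_children(base, check, 1, [2, 4])   # siblings with code values 2 and 4
```

With d = 1 the first sibling would collide with slot 3, so the search settles on d = 2 and the two siblings occupy slots 4 and 6 while keeping their relative distance of 2.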

[0025]

[Problems to Be Solved by the Invention] The character codes registered in the character code set number just under 7,000 for Level 1 and Level 2 combined, in both Japanese and Chinese. Of these, only a limited number are used to form compound words. In Chinese, for example, no more than about 500 character codes each give rise to ten or more compound words.

[0026] In the trie structure, the more character codes follow a given character code, the larger the CHECK array must be made in order to register all of those character codes as keys simultaneously in free positions of the CHECK array, as shown in FIG. 9.

[0027] Consequently, in the conventional character string registration and search technique, the CHECK array has to be enlarged as the number of character codes following a given character code on the trie grows, because all of them must be registered as keys simultaneously in free positions of the CHECK array, as shown in FIG. 9.

[0028] That is, unlike the alphabet (a to z), character codes such as kanji comprise many characters, so registering the character codes that follow a given character code in the CHECK array while preserving their relative positions forces the CHECK array to be enlarged. In addition, many gaps (sparse regions) that cannot be used for registration may be left.
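The sparsity problem can be illustrated with a small calculation. The sketch below registers three siblings whose code values lie far apart with a single translation amount and reports how much of the resulting CHECK array is actually used. The function name, the code values 1, 40, and 90, and the initial array contents are hypothetical and chosen only to make the effect visible.

```python
def fill_ratio_after_insert(check, codes, parent):
    """Insert siblings with one shared offset and measure CHECK density.

    Returns (d, usable_slots, fill_ratio). Works on a copy of `check`;
    0 means "empty" and index 0 is treated as unused.
    """
    check = list(check)
    d = 1
    # Slots beyond the current end of the array count as free.
    while any(d + c < len(check) and check[d + c] != 0 for c in codes):
        d += 1
    top = d + max(codes)
    check.extend([0] * max(0, top + 1 - len(check)))
    for c in codes:
        check[d + c] = parent
    used = sum(1 for v in check[1:] if v != 0)
    slots = len(check) - 1
    return d, slots, used / slots

# Slots 1 to 4 are already owned by node 9; node 2 receives three children.
d, slots, ratio = fill_ratio_after_insert([0, 9, 9, 9, 9, 0, 0, 0], [1, 40, 90], parent=2)
```

Because the three siblings must keep their relative positions, the array grows from 7 usable slots to 94 while only 7 of them hold entries, which is the kind of gap the paragraph above describes.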

[0029] The object of the present invention is to solve these conventional problems. In particular, it proposes a new data structure that further develops the double-array structure, a one-dimensional array used as a conventional fast, low-capacity dictionary data structure, and it introduces for frequent character codes a data structure different from the conventional double array. Two kinds of values are registered in the BASE array. One is the conventional translation amount and is applied to character codes that are not used often (low-frequency codes). The other is one of the subscripts of a second BASE array and is applied to frequent character codes. The subscripts of the second BASE array are divided into a predetermined number of kinds (specifically, three) according to the code value of the character that follows the frequent character code, and each kind is given its own translation amount, so that each character code is given a free registration slot in such a way that the slots overlap one another on the CHECK array. The object is thereby to realize a character code registration search device and a character code registration search method that register all character codes serving as keys simultaneously in free positions of the CHECK array while extending the CHECK array as little as possible, that register the character codes following a given character code in the CHECK array with their relative positions preserved, again while extending the CHECK array as little as possible, and that avoid as far as possible leaving many gaps (sparse regions) that cannot be used for registration.
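One possible reading of the two kinds of BASE values is sketched below. It is an illustration under stated assumptions, not the patent's implementation. The threshold `MAX_NODE` that separates translation amounts from serial numbers, the classifier `group_of` (three equal bands of a toy code range), the array `base2`, and all array contents are hypothetical. The sketch also assumes that registration has already chosen the per-group translation amounts so that no unregistered code lands on a slot owned by the same parent; how that is guaranteed is not described in this part of the text.

```python
MAX_NODE = 1000   # hypothetical upper bound on trie node indices
NUM_GROUPS = 3    # the text divides the following characters into three kinds
CODE_SPACE = 33   # hypothetical size of the toy code range

def group_of(code):
    """Hypothetical classifier: three equal bands of the code range."""
    return code * NUM_GROUPS // CODE_SPACE

def resolve_offset(base, base2, n, code):
    """Return the translation amount that applies to `code` under node n."""
    value = base[n]
    if value > MAX_NODE:                      # out of trie range: a serial number
        serial = value - MAX_NODE - 1
        return base2[serial][group_of(code)]  # per-group translation amount
    return value                              # conventional translation amount

def goto(base, base2, check, n, code):
    d = resolve_offset(base, base2, n, code)
    if d is None:
        return None
    m = d + code
    return m if 0 < m < len(check) and check[m] == n else None

# Toy data: node 1 is a frequent character (serial number 0) with
# translation amounts 5, 9, 10 for its three groups; node 15 is low-frequency.
base = [0] * 41
check = [0] * 41
base2 = [[5, 9, 10]]
base[1] = MAX_NODE + 1 + 0
check[15] = check[30] = check[40] = 1         # children for codes 10, 21, 30
base[15] = 3
check[5] = 15                                 # child of node 15 for code 2

hits = [goto(base, base2, check, 1, c) for c in (10, 21, 30)]
```

In this reading a frequent character costs one extra array access (the second BASE array) per transition, in exchange for letting each group of following characters find its own free region of the CHECK array.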

[0030]

[Means for Solving the Problems] To solve the above problems, the invention according to claim 1 is a character code registration search device that registers character code strings to be searched by key in a double-array structure, which is a one-dimensional array used as a data structure, and searches for the character strings, the device comprising: translation amount calculating means for calculating the translation amount d needed to register the characters of each character string to be searched; a first array subscripted by the index 106c of the prefix of each character string to be searched; identifying means for identifying the value registered in the first array; a second array in which, among the character strings indicated by the first array, information on specific characters following the prefix of the character string is registered; key candidate point calculating means for calculating the sum of the translation amount d registered in the first array and the second array and the value corresponding to the character following the end of the character string; and a second array in which the index 106c of the prefix of the character string is registered, using the sum obtained by the key candidate point calculating means as the subscript.

[0031] According to the invention of claim 1, a new data structure that further develops the double-array structure, a one-dimensional array used as a conventional fast, low-capacity dictionary data structure, is introduced. It comprises a first array subscripted by the index 106c of the prefix of each character string to be searched; a second array in which, among the character strings indicated by the first array, information on specific characters following the prefix of the character string is registered; and a second array in which the index 106c of the prefix is registered, using as the subscript the sum of the translation amount d, which the translation amount calculating means calculates as the amount needed to register the characters of each character string in the first and second arrays, and the value corresponding to the character following the end of the character string. This makes it possible to give each character code a free registration slot in such a way that the slots overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free positions of the CHECK array while extending the CHECK array as little as possible; the character codes following a given character code can be registered in the CHECK array with their relative positions preserved, again while extending the CHECK array as little as possible; and leaving many gaps (sparse regions) that cannot be used for registration can be avoided as far as possible. Consequently, the storage capacity of a trie array structure that forms a dictionary holding a quasi-static key set (a key set known in advance) as the search target, and that is later extended by registering additional keys as needed, can be made as small as possible.

[0032] To solve the above problems, the invention according to claim 2 is the character code registration search device according to claim 1, comprising: list means 101 that creates a list of the character codes often used in compound words and outputs a selected character code 103c selected from that list; frequent character code selecting means 104 that outputs a frequency threshold specifying down to which frequency rank character codes are selected; frequent character code storage means 103 that stores the frequent character codes selected from the list means 101 and outputs the selected frequent character code 103a and the index 106c of the selected frequent character code; dictionary means 106, which is a character code dictionary in which compound words made up of character codes are registered, which divides the processing according to whether the character of interest is the prefix of a compound word based on a selected character code, and which outputs each class 106a obtained by classifying the character codes that follow the frequent character code at the prefix; class storage means 108 that stores each class 106a, obtained from the dictionary means 106 by classifying the character codes that follow the frequent character code at the prefix; BASE array means 102, serving as the second array, that calculates the serial number 103b of the selected character code 101c and stores it at the index 106c of that character code on the BASE array; code classifying means 107 that, in order to classify the characters following the frequent character code at the prefix, classifies the second character code of a compound word by several bits of that character code; translation amount calculating means 109 that calculates, for each class 106a, the smallest translation amount 109a such that the values obtained by adding it to the code value of each character code in the class 106a all fall on free positions of the CHECK array; translation amount storage means 110 that stores the translation amount 109a from the translation amount calculating means 109 and stores the translation amount 110a at the subscript position of the BASE array means 102 corresponding to the index of the prefix; key candidate point calculating means 111 that, for each class, takes as a subscript on the CHECK array the sum of the translation amount input from the translation amount calculating means and the code value of each character code in the class, registers at that subscript position the index of the prefix that is the parent of the character codes in the class, and uses the sum as the index of the next prefix, which consists of (previous prefix) + (character of interest); a second BASE array 105, serving as the second array, for storing the translation amount 110a for each class output by the translation amount storage means 110, based on the code value 107a from the code classifying means 107 and the serial number 103b from the list means 101; and CHECK array means 112, serving as the first array, that registers the translation amount 111a from the translation amount storage means 110 and the internal setting value 109a for each character code of the class from the translation amount calculating means 109 at the position of the prefix index 106c of the CHECK array.

[0033] According to the invention of claim 2, in addition to the effects of claim 1, a new data structure that further develops the double-array structure, a one-dimensional array used as a conventional fast, low-capacity dictionary data structure, is introduced. It comprises the CHECK array means 112, serving as the first array, that registers the translation amount 111a from the translation amount storage means 110 and the internal setting value 109a for each character code of the class from the translation amount calculating means 109 at the position of the prefix index 106c of the CHECK array; the BASE array means 102, serving as the second array, that calculates the serial number 103b of the selected character code 101c and stores it at the index 106c of that character code on the BASE array; and the second BASE array 105, serving as the second array, for storing the translation amount 110a for each class output by the translation amount storage means 110, based on the code value 107a from the code classifying means 107 and the serial number 103b from the list means 101. Two kinds of values are registered in the BASE array. One is the conventional translation amount and is applied to character codes that are not used often (low-frequency codes). The other is one of the subscripts of the second BASE array and is applied to frequent character codes. The subscripts of the second BASE array are divided into three kinds according to the code value of the character that follows the frequent character code, and each kind is given its own translation amount, which makes it possible to give each character code a free registration slot in such a way that the slots overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free positions of the CHECK array while extending the CHECK array as little as possible; the character codes following a given character code can be registered in the CHECK array with their relative positions preserved, again while extending the CHECK array as little as possible; and leaving many gaps (sparse regions) that cannot be used for registration can be avoided as far as possible. Consequently, the storage capacity of a trie array structure that forms a dictionary holding a quasi-static key set (a key set known in advance) as the search target, and that is later extended by registering additional keys as needed, can be made as small as possible.

[0034] To solve the above problem, the invention according to claim 3 is the character code registration and search device according to claim 1, comprising:
document input means 201 that first designates the root of the trie structure as the prefix and sets the end mark #, serving as the terminal symbol, in the prefix W, then requests input of a character code b as the character to be searched for and detects the prefix W of the input character code b;
BASE array means 102 that reads a numerical value 102a from the location corresponding to the prefix W or to the index 106c of the character code;
registered value determination means 202 that determines whether the numerical value 102a read from the BASE array means 102 is the serial number of a frequently occurring head character code, the index of a head character code that is not frequent, or the index 106c of a prefix W in the middle of the character string, that outputs the value as the serial number 202a of a frequent character code when the value exceeds the range of indices constituting the trie, and that outputs a translation amount 202b when the numerical value 102a is neither the index 106c of the prefix W nor a frequent character code;
code classification means 107 that, in order to classify the characters following a frequent head character code, classifies the character code following the head kanji by several bits of that character code when the value read from the BASE array means is the serial number of a frequent head character code;
a second BASE array 105 that stores translation amounts 105a and is read at the location given by the serial number 202a of the frequent character code output from the registered value determination means 202 and by the class of the code value of the character code b;
translation amount storage means 110 that stores the numerical value 102a read from the BASE array means 102 as the translation amount when that value is the index of a non-frequent head character code or the index 106c of the prefix W;
key candidate point calculation means 111 that, for each class, adds the translation amount received from the translation amount calculation means to the code value of each character code in the class, uses the sum as a subscript on the CHECK array, registers at that subscript position the index of the prefix that is the parent of the character codes in the class, and takes the sum as the index of the next prefix formed by ((previous prefix) + (character of interest));
CHECK array means 112 that reads a key from the location corresponding to the sum 111a from the key candidate point calculation means 111; and
key/prefix matching means 203 that determines whether the key read by the CHECK array means 112 equals the index of the head character code or the index 106c of the prefix W and, if it does, determines that the idiom is registered in the dictionary.
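The lookup flow described in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `next_index`, `contains`, `classify`, and `build_toy`, the constants `TRIE_SIZE`, `ROOT`, and `END`, the toy code values, and the three-way split by `code >> 4` are all hypothetical. The only points taken from the text are that a BASE value beyond the trie's index range is treated as the serial number of a frequent head character, that the second BASE array then supplies a per-class translation amount, and that a match requires the CHECK entry at (translation amount + code value) to equal the index of the current prefix.

```python
EMPTY = -1
TRIE_SIZE = 64   # assumed number of usable trie indices
ROOT = 1         # assumed index of the trie root
END = 0          # assumed code value of the end mark '#'


def classify(code):
    """Illustrative three-way split of toy codes 0..47 by their top bits."""
    return code >> 4


def next_index(base, base2, check, w, code):
    """Follow one character from prefix index w; return the next index or None."""
    v = base[w]
    if v >= TRIE_SIZE:
        # Value beyond the trie range: serial number of a frequent head
        # character, so the shift comes from the second BASE array.
        d = base2[v - TRIE_SIZE][classify(code)]
    else:
        # Ordinary case: the BASE value is itself the translation amount.
        d = v
    t = d + code
    if 0 <= t < len(check) and check[t] == w:
        return t
    return None


def contains(base, base2, check, codes):
    """Return True if the code sequence, closed by END, is registered."""
    w = ROOT
    for code in list(codes) + [END]:
        w = next_index(base, base2, check, w, code)
        if w is None:
            return False
    return True


def build_toy():
    """Hand-built tables holding the keys [5, 20], [5, 37] and [7]."""
    base = [0] * TRIE_SIZE
    check = [EMPTY] * TRIE_SIZE
    base2 = [[0, 2, 3]]        # per-class shifts for serial number 0
    base[ROOT] = 10
    check[15] = ROOT           # child reached by code 5
    check[17] = ROOT           # child reached by code 7
    base[15] = TRIE_SIZE + 0   # node 15 is a frequent head: serial number 0
    check[22] = 15             # code 20 (class 1): 2 + 20
    check[40] = 15             # code 37 (class 2): 3 + 37
    base[22] = 30
    check[30] = 22             # end mark after [5, 20]
    base[40] = 31
    check[31] = 40             # end mark after [5, 37]
    base[17] = 32
    check[32] = 17             # end mark after [7]
    return base, base2, check
```

With the tables from `build_toy()`, `contains` finds `[5, 20]`, `[5, 37]`, and `[7]` and rejects `[5]`, because the end mark after code 5 falls on an empty CHECK slot. The frequent head costs one extra array reference (the second BASE array) compared with the ordinary path.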

[0035] According to the invention of claim 3, in addition to the effect of claim 1, a new data structure is introduced that further develops the double-array structure, the conventional one-dimensional-array dictionary data structure offering high speed and low capacity. The new structure comprises the CHECK array means 112, which reads a key from the location corresponding to the sum 111a from the key candidate point calculation means 111; the BASE array means 102, serving as the second array, which reads the numerical value 102a from the location corresponding to the prefix W or to the index 106c of the character code; and the second BASE array 105, also serving as the second array, which stores the translation amounts 105a and is read at the location given by the serial number 202a of the frequent character code output from the registered value determination means 202 and by the class of the code value of the character code b. Consequently, for a dictionary that stores a quasi-static key set known in advance as the search target and that is later extended by registering additional keys as needed, the storage capacity of the trie array structure can be made as small as possible. As a result, the keys are stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

[0036] To solve the above problem, the invention according to claim 4 is a character code registration and search method for registering character code strings subject to key search in a double-array structure, which is a one-dimensional array serving as the data structure, and for searching them, the method comprising:
a translation amount d calculation step of calculating the translation amount d needed to register the characters of each character string subject to key search;
a first array step whose subscript is the index 106c of the prefix of each character string subject to key search;
an identification step of identifying the value registered in the first array step;
a second array step in which information on specific characters following the prefix of the character string, among the character strings indicated in the first array step, is registered;
a key candidate point calculation step of calculating the sum of the translation amount d registered in the first array step and the second array step and the value corresponding to the character following the end of the character string; and
a second array step of registering the index 106c of the prefix of the character string using the sum obtained in the key candidate point calculation step as the subscript.

[0037] According to the invention of claim 4, a new data structure is introduced that further develops the double-array structure, the conventional one-dimensional-array dictionary data structure offering high speed and low capacity. The new structure is formed using the first array step, whose subscript is the index 106c of the prefix of each character string subject to key search; the second array step, in which information on specific characters following the prefix of the character string, among the character strings indicated in the first array step, is registered; and the second array step in which the index 106c of the prefix of the character string is registered using as the subscript the sum of the translation amount d, calculated by the translation amount calculation means as needed to register the characters of each character string subject to key search in the first and second array steps, and the value corresponding to the character following the end of the character string. This makes it possible to give each character code a free registration location such that the locations overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free locations of the CHECK array while extending the CHECK array as little as possible, and the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again while extending the array as little as possible. In addition, gaps (sparseness) left open because nothing can be registered in them are avoided as far as possible. Consequently, for a dictionary that stores a quasi-static key set known in advance as the search target and that is later extended by registering additional keys as needed, the storage capacity of the trie array structure can be made as small as possible.

[0038] To solve the above problem, the invention according to claim 5 is the character code registration and search method according to claim 4, wherein the identification step comprises: a step of identifying the content registered in the first array step as one of the serial number I of a specific character code located at the head of an idiom, another head character code, or the translation amount d of the prefix of a character string; and a step of, when the content registered in the first array step is identified as the serial number I of the specific head character code, obtaining the translation amount d by referring to the array location indicated by the serial number I in the second array step.

[0039] According to the invention of claim 5, in addition to the effect of claim 4, a new data structure is introduced that further develops the double-array structure, the conventional one-dimensional-array dictionary data structure offering high speed and low capacity. The new structure is formed using the first array step, whose subscript is the index 106c of the prefix of each character string subject to key search; the second array step, in which information on specific characters following the prefix of the character string, among the character strings indicated in the first array step, is registered; and the second array step in which the index 106c of the prefix of the character string is registered using as the subscript the sum of the translation amount d, calculated by the translation amount calculation means as needed to register the characters of each character string subject to key search in the first and second array steps, and the value corresponding to the character following the end of the character string. An identification step is then executed that identifies the content registered in the first array step as one of the serial number I of a specific character code located at the head of an idiom, another head character code, or the translation amount d of the prefix of a character string, and that, when the registered content is identified as the serial number I of the specific head character code, obtains the translation amount d by referring to the array location indicated by the serial number I in the second array step. This makes it possible to give each character code a free registration location such that the locations overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free locations of the CHECK array while extending the CHECK array as little as possible, and the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again while extending the array as little as possible. In addition, gaps (sparseness) left open because nothing can be registered in them are avoided as far as possible. Consequently, for a dictionary that stores a quasi-static key set known in advance as the search target and that is later extended by registering additional keys as needed, the storage capacity of the trie array structure can be made as small as possible. Compared with the conventional case in which a translation amount on the CHECK array is given uniformly for each character code, growth of the CHECK array can be suppressed and the space efficiency of the double array can be improved.

[0040] To solve the above problem, the invention according to claim 6 is the character code registration and search method according to claim 5, comprising a step of referring to the second array step on the basis of the serial number I of the specific head character code of the character string and the classification of the character code following the specific head character code.

[0041] According to the invention of claim 6, the same effect as that of claim 5 is obtained.

[0042] To solve the above problem, the invention according to claim 7 is the character code registration and search method according to claim 6, wherein the step of classifying the character codes following the specific head character code in the second array step uses the code value of the character code.
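As a rough illustration of classifying by code value, the sketch below splits two-byte 7-bit codes into three classes using the top two bits of the first byte. The text only says that the code value (several bits of the character code) is used and that the 7-bit code area is divided into three groups; the particular boundaries chosen here (first byte 0x21 to 0x3F, 0x40 to 0x5F, 0x60 to 0x7E) and the function name `classify_jis7` are assumptions made for this example.

```python
def classify_jis7(code: int) -> int:
    """Return class 0, 1 or 2 for a two-byte 7-bit code such as 0x3021.

    Assumed rule: the top two bits of the 7-bit first byte select the class,
    which cuts the range 0x21..0x7E into three bands.
    """
    first = (code >> 8) & 0xFF
    if not 0x21 <= first <= 0x7E:
        raise ValueError(f"first byte {first:#04x} is outside the 7-bit area")
    # (first >> 5) is 1 for 0x21..0x3F, 2 for 0x40..0x5F, 3 for 0x60..0x7E.
    return (first >> 5) - 1
```

Because the class is read directly from bits of the code, no extra table is needed to decide which of the three translation amounts applies to a following character.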

[0043] According to the invention of claim 7, the same effect as that of claim 5 is obtained.

[0044] To solve the above problem, the invention according to claim 8 is the character code registration and search method according to claim 4, wherein the second array step includes a step of selecting characters that are frequently used in forming idioms as the specific characters following the prefix of the character string.
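One way to picture this selection is to count how many dictionary entries start with each character and give serial numbers to the most frequent ones. The sketch below does this under assumptions of its own: the function name `select_frequent_heads`, the `top_n` cut-off, and the use of "number of idioms starting with the character" as the frequency measure are illustrative choices, not details confirmed by the text.

```python
from collections import Counter


def select_frequent_heads(words, top_n):
    """Map the top_n most frequent head characters to serial numbers 0, 1, ...

    words: iterable of idioms (strings); empty strings are ignored.
    Ties are broken by character order so the result is deterministic.
    """
    counts = Counter(w[0] for w in words if w)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ch: serial for serial, (ch, _) in enumerate(ranked[:top_n])}
```

For a word list such as `["大学", "大会", "大人", "小学", "中学", "中心"]` with `top_n=2`, the head characters 大 (three entries) and 中 (two entries) receive serial numbers 0 and 1, and 小 is left to the conventional single translation amount.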

[0045] According to the invention of claim 8, the same effect as that of claim 5 is obtained.

[0046] To solve the above problem, the invention according to claim 9 is the character code registration and search method according to claim 4, wherein the second array step includes a step of selecting, as the specific characters following the prefix of the character string, those characters for which, when the character forms idioms, the range of code values of the following character codes is equal to or greater than a predetermined threshold.
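The width criterion can be sketched as follows. This is only an illustration: words are modelled as tuples of integer code values, the width is taken as the difference between the largest and smallest code that follows a given head character, and the name `select_wide_heads` is hypothetical.

```python
from collections import defaultdict


def select_wide_heads(words, threshold):
    """Return head codes whose following codes span at least `threshold`.

    words: iterable of sequences of integer code values.
    Entries with fewer than two codes have no following character and are skipped.
    """
    followers = defaultdict(set)
    for w in words:
        if len(w) >= 2:
            followers[w[0]].add(w[1])
    return sorted(
        head
        for head, codes in followers.items()
        if max(codes) - min(codes) >= threshold
    )
```

A head whose followers are spread widely over the code space is the kind that forces a single translation amount to reserve a long, mostly empty stretch of the CHECK array, which is why such heads are candidates for per-class shifts.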

[0047] According to the invention of claim 9, the same effect as that of claim 5 is obtained.

[0048] To solve the above problem, the invention according to claim 10 is the character code registration and search method according to claim 4, comprising:
a list step (step S2) of creating a list of character codes often used in idioms and outputting a selected character code 103c chosen from that list, and of selecting prefixes whose parent-node character code has at least a predetermined number of kinds of character codes connected as child nodes, assigning each a new serial number 103b, and outputting it;
a frequent character code selection step (step S2) of outputting a frequency threshold that specifies down to which frequency rank character codes are selected;
a frequent character code storage step (step S2) of storing the frequent character codes selected in the list step (step S2) and outputting the selected frequent character code 103a and the index 106c of the selected frequent character code;
a dictionary step (step S3) that uses a character code dictionary in which idioms composed of character codes are registered, divides the work according to whether the character of interest is the head of an idiom based on the selected character code, and outputs the classes 106a obtained by classifying the character codes following the frequent head character code;
a classification result storage step (step S9) of storing the classes 106a generated by the dictionary step (step S3);
a BASE array step (steps S6 and S11), serving as the second array step, of calculating the serial number 103b of the selected character code 101c and storing it at the index 106c of that character code on the BASE array;
a code classification step (step S7) of classifying the second character code of an idiom by several bits of that character code, in order to classify the characters following the frequent head character code;
a translation amount calculation step (steps S8 and S10) of calculating, for each class 106a, the minimum translation amount 109a such that every value obtained by adding the translation amount to the code value of each character code in the class 106a falls on a free location of the CHECK array;
a translation amount storage step (steps S8, S10, and S11) of storing the translation amount 109a generated by the translation amount calculation step (steps S8 and S10) and storing the translation amount 110a at the subscript position of the BASE array means 102 corresponding to the index of the prefix;
a key candidate point calculation step (steps S9 and S12) of, for each class, adding the translation amount received from the translation amount calculation means to the code value of each character code in the class, using the sum as a subscript on the CHECK array, registering at that subscript position the index of the prefix that is the parent of the character codes in the class, and taking the sum as the index of the next prefix formed by ((previous prefix) + (character of interest));
a second BASE array step (step S9), serving as the second array step, of storing the translation amount 110a for each class output by the translation amount storage step (steps S8, S10, and S11), on the basis of the code value 107a generated by the code classification step (step S7) and the serial number 103b generated by the list step (step S2); and
a CHECK array step (step S12), serving as the first array step, of registering, at the location of the prefix index 106c in the CHECK array, the translation amount 111a generated by the translation amount storage step (steps S8, S10, and S11) and the internal set value 109a for each character code of the class generated by the translation amount calculation step (steps S8 and S10).
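The translation amount calculation and the per-class registration in this paragraph can be sketched as below. The sketch assumes non-negative translation amounts, a fixed-length CHECK list with `-1` marking free slots, and a caller-supplied `classify` function; `min_shift` and `register_by_class` are hypothetical names, and the toy codes are not real kanji codes.

```python
EMPTY = -1


def min_shift(check, codes):
    """Smallest d >= 0 such that check[d + c] is free for every c in codes.

    Returns None when no such d fits in the current array, i.e. the array
    would have to be extended.
    """
    d = 0
    while d + max(codes) < len(check):
        if all(check[d + c] == EMPTY for c in codes):
            return d
        d += 1
    return None


def register_by_class(check, base2_row, parent, codes, classify):
    """Register the child codes of `parent`, one translation amount per class.

    base2_row receives the shift chosen for each class; check[d + c] is set
    to the parent's index for every child code c in that class.
    """
    classes = {}
    for c in codes:
        classes.setdefault(classify(c), []).append(c)
    for cls, members in sorted(classes.items()):
        d = min_shift(check, members)
        if d is None:
            raise ValueError("CHECK array too small for this class")
        base2_row[cls] = d
        for c in members:
            check[d + c] = parent
```

As a usage example, take a 12-slot CHECK array in which slots 1, 6, and 11 are already occupied, and child codes `[1, 5, 6, 9]` classified by `c // 4`. No single shift places all four codes on free slots, so the array would have to grow, whereas the per-class shifts 1, 2, and 0 place them on slots 2, 7, 8, and 9 without extending it. Codes inside one class keep their relative positions.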

[0049] According to the invention of claim 10, in addition to the effect of claim 4, a new data structure is introduced that further develops the double-array structure, the conventional one-dimensional-array dictionary data structure offering high speed and low capacity. The new structure is created by executing the CHECK array step (step S12), serving as the first array step, which registers the translation amount 111a generated by the translation amount storage step (steps S8, S10, and S11) and the minimum translation amount 109a generated by the translation amount calculation step (steps S8 and S10) at the location of the prefix index 106c in the CHECK array; the BASE array step (steps S6 and S11), serving as the second array step, which calculates the serial number 103b of the selected character code 101c and stores it at the index 106c of that character code on the BASE array; and the second BASE array step (step S9), serving as the second array step, which stores the translation amount 110a for each class output by the translation amount storage step (steps S8, S10, and S11). Two kinds of values are registered in the BASE array. One is the conventional translation amount, applied to character codes that are not used very often (low frequency). The other is one of the subscripts of the second BASE array, applied to frequent character codes. The subscripts of the second BASE array are divided into three kinds according to the code value of the character following the frequent character code, and each kind is given its own translation amount. This makes it possible to give each character code a free registration location such that the locations overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free locations of the CHECK array while extending the CHECK array as little as possible, and the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again while extending the array as little as possible. In addition, gaps (sparseness) left open because nothing can be registered in them are avoided as far as possible. Consequently, for a dictionary that stores a quasi-static key set known in advance as the search target and that is later extended by registering additional keys as needed, the storage capacity of the trie array structure can be made as small as possible.

[0050] To solve the above problem, the invention according to claim 11 is the character code registration and search method according to claim 4, comprising:
a document input step (steps P2 and P3) of first designating the root of the trie structure as the prefix and setting the end mark #, serving as the terminal symbol, in the prefix W, then requesting input of a character code b as the character to be searched for and detecting the prefix W of the input character code b;
a BASE array step (step P4) of reading a numerical value 102a from the location corresponding to the prefix W or to the index 106c of the character code;
a registered value determination step (step P5) of determining whether the numerical value 102a generated by the BASE array step (step P4), serving as the second array step, is the serial number of a frequently occurring head character code, the index of a head character code that is not frequent, or the index 106c of a prefix W in the middle of the character string, of outputting the value as the serial number 202a of a frequent character code when the value exceeds the range of indices constituting the trie, and of outputting a translation amount 202b when the numerical value 102a generated by the BASE array step (step P4) is neither the index 106c of the prefix W nor a frequent character code;
a code classification step (step P6) of, in order to classify the characters following a frequent head character code, classifying the character code following the head kanji by several bits of that character code when the value read in the BASE array step 102 (step P4) is the serial number of a frequent head character code;
a second BASE array step (step P7), serving as the second array step, in which the translation amounts 105a are stored and which is read at the location given by the serial number 202a of the frequent character code generated by the registered value determination step (step P5) and by the class of the code value of the character code b;
a translation amount storage step (steps P7 and P8) of storing the numerical value 102a generated by the BASE array step (step P4) as the translation amount when that value is the index of a non-frequent head character code or the index 106c of the prefix W;
a key candidate point calculation step (step P9) of, for each class, adding the translation amount received from the translation amount calculation means to the code value of each character code in the class, using the sum as a subscript on the CHECK array, registering at that subscript position the index of the prefix that is the parent of the character codes in the class, and taking the sum as the index of the next prefix formed by ((previous prefix) + (character of interest));
a CHECK array step (step P9), serving as the first array step, of reading a key from the location corresponding to the sum 111a from the key candidate point calculation step (step P9); and
a key/prefix matching step (steps P10, P11, and P12) of determining whether the key generated by the CHECK array step (step P9), serving as the first array step, equals the index of the head character code or the index 106c of the prefix W and, if it does, determining that the idiom is registered in the dictionary.

[0051] According to the invention of claim 11, in addition to the effect of claim 4, a new data structure is introduced that further develops the double-array structure, the conventional one-dimensional-array dictionary data structure offering high speed and low capacity. The new structure comprises the CHECK array step (step P9), serving as the first array step, which reads a key from the location corresponding to the sum 111a from the key candidate point calculation step (step P9); the BASE array step (step P4), serving as the second array step, which reads the numerical value 102a from the location corresponding to the prefix W or to the index 106c of the character code; and the second BASE array step (step P7), serving as the second array step, in which the translation amounts 105a are stored and which is read at the location given by the serial number 202a of the frequent character code generated by the registered value determination step (step P5) and by the class of the code value of the character code b. Consequently, for a dictionary that stores a quasi-static key set known in advance as the search target and that is later extended by registering additional keys as needed, the storage capacity of the trie array structure can be made as small as possible. As a result, the keys are stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

[0052] To solve the above problem, the invention of claim 12 is the character code registration and search method of claim 10 or 11, further comprising a step of selecting, when the specific character that follows the head of a character string forms an idiom, a character for which the width of the code values of the character codes that follow it is at or above a predetermined threshold as that specific character.

[0053] According to the invention of claim 12, in addition to the effects of claim 10 or 11, more of the vacant locations that the conventional method could not fill are filled, and the growth of both the BASE array created in the BASE array step (step P4) and the CHECK array created in the CHECK array step (step P9) is kept moderate. Moreover, the number of processing steps increases by only one, for the reference to the second BASE array step (step P7), so the processing cost is almost unchanged.

【００５４】[0054]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The following embodiments are described using kanji codes as the character codes, but the invention is not limited to them. The character code registration and search device and the character code registration and search method of the present invention can be applied to 2-byte character codes used for languages, such as Chinese and Korean, that express words and idioms by combining several of a few thousand distinct characters. Specifically, candidates include codes that can be expressed in the 7-bit code area (FIG. 7(a)), such as the Japanese JIS code and the 7-bit code of Chinese GB2312-80, and codes that can be expressed in the 8-bit code area (FIG. 7(b)), such as the Japanese EUC code and the 8-bit code of Chinese GB2312-80.

[0055] The number of kanji registered in the kanji code sets is a little under 7,000 for both Japanese and Chinese, counting level 1 and level 2 together. Of these, only a limited number are used to form idioms. In Chinese, for example, about 500 or fewer kanji codes give rise to ten or more idioms.

[0056] On the trie structure, the more kanji codes follow a given kanji code, the more the CHECK array has to be enlarged so that all of those character codes can be registered as keys in vacant slots of the CHECK array at the same time, as shown in FIG. 9.

[0057] The present invention was therefore devised as a further development of the conventional double-array structure, a one-dimensional array used as a fast, compact dictionary data structure.

[0058] For these frequent kanji codes, a new data structure different from the conventional double array is introduced.

[0059] FIG. 1 shows the new data structure of the present invention. In the new data structure used by the character code registration and search device and method of the present invention, FIG. 1(a) shows the basic structure of the BASE array, in which the translation amount applied to kanji codes that are used less often (low frequency) and the frequency rank or serial number are registered. FIG. 1(b) shows the basic structure of the second BASE array, in which the translation amounts used for frequent kanji codes are registered. FIG. 1(c) shows the basic structure of the CHECK array, in which the frequent kanji codes are registered according to the subscripts of FIG. 1(b).

[0060] As the BASE array structure of FIG. 1(a) shows, the BASE array holds two kinds of values. One is the conventional translation amount d, which is applied to kanji codes that are used less often (low frequency). For frequent kanji codes, a different value (the other of the two kinds of BASE array values) is registered instead. This other value corresponds to the horizontal subscript I1 of the second BASE array in FIG. 1(b).

[0061] FIG. 7 illustrates the code areas of 2-byte character codes. FIG. 7(a) shows the 7-bit code area (the area of the Japanese JIS code and of the 7-bit code of Chinese GB2312-80), and FIG. 7(b) shows the 8-bit code area (the area of the Japanese EUC code and of the 8-bit code of Chinese GB2312-80).

[0062] In a 7-bit code area, such as that of the Japanese JIS code or the 7-bit code of Chinese GB2312-80, both the first byte (ku) and the second byte (ten) that make up the code are expressed by one of the 94 addresses 33 to 126 (positions 1 to 94), as shown in FIG. 7(a). Likewise, in an 8-bit code area, such as that of the Japanese EUC code or the 8-bit code of Chinese GB2312-80, both the first byte (ku) and the second byte (ten) are expressed by one of the 94 addresses 161 to 254 (positions 1 to 94), as shown in FIG. 7(b).
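The address ranges above can be made concrete with a short sketch. It maps a byte pair to its 1-based (ku, ten) position in either code area; the function name `ku_ten` and the rule that both bytes must come from the same area are assumptions made for illustration, not part of the patent text.

```python
from typing import Tuple


def ku_ten(first: int, second: int) -> Tuple[int, int]:
    """Return the 1-based (ku, ten) position of a 2-byte character code.

    7-bit area: both bytes lie in 33..126 (0x21..0x7E).
    8-bit area: both bytes lie in 161..254 (0xA1..0xFE).
    """
    # Pick the area from the first byte (illustrative assumption).
    low = 0x21 if first < 0x80 else 0xA1
    high = low + 93  # 94 addresses per byte
    if not (low <= first <= high and low <= second <= high):
        raise ValueError("bytes are outside the 94 x 94 code area")
    return first - low + 1, second - low + 1


print(ku_ten(0x30, 0x21))  # 7-bit form
print(ku_ten(0xB0, 0xA1))  # 8-bit form of the same position
```

Both calls address the same cell of the 94 x 94 table, which is why the description can treat the 7-bit and 8-bit areas in the same way.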

[0063] In this embodiment, the addresses 33 to 126 that represent the first byte (ku) and the second byte (ten) of a 7-bit code are divided into three, giving three classes d1, d2, and d3. In the same way, the addresses 161 to 254 that represent the first byte (ku) and the second byte (ten) of an 8-bit code are divided into three, giving three classes d1, d2, and d3.

[0064] Accordingly, in this embodiment, when the subscripts of the second BASE array shown in FIG. 1(b) are determined, the characters that follow a frequent kanji code are divided into three classes according to their code values, and each class is given its own translation amount.

[0065] That is, as the CHECK array structure of FIG. 1(c) shows, even character codes (child nodes) that follow the same character (parent node) are given separate translation amounts d1, d2, and d3 according to their code values (ranges within the 7-bit or 8-bit code). The classes can thus be placed so that they overlap one another on the CHECK array, and each kanji code is given a vacant registration slot.
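A lookup over the three arrays might be sketched as follows. This is only an illustration under stated assumptions: the patent text here does not say how a BASE entry marks which of its two kinds of value it holds, so the sketch uses an explicit `frequent` flag, and the names `BaseEntry` and `child_index` as well as the sample array contents are hypothetical.

```python
from typing import List, NamedTuple, Optional

EMPTY = -1  # marker for a vacant CHECK slot (illustrative)


class BaseEntry(NamedTuple):
    frequent: bool  # assumption: an explicit tag for the kind of value
    value: int      # translation amount d, or row subscript of the second BASE array


def child_index(base: List[BaseEntry], second_base: List[List[int]],
                check: List[int], parent: int, code: int, cls: int) -> Optional[int]:
    """Return the index of the child reached from `parent` via `code`, or None."""
    entry = base[parent]
    if entry.frequent:
        # Frequent character: one translation amount per class (d1, d2, d3).
        d = second_base[entry.value][cls]
    else:
        # Infrequent character: the conventional single translation amount.
        d = entry.value
    idx = code + d
    # The slot belongs to this parent only if CHECK records the parent's index.
    if 0 <= idx < len(check) and check[idx] == parent:
        return idx
    return None


# Toy data: node 0 is infrequent (d = 2); node 1 is frequent (row 0 of the second BASE array).
base = [BaseEntry(False, 2), BaseEntry(True, 0)]
second_base = [[-1, -37, -75]]
check = [1, EMPTY, EMPTY, 1, EMPTY, 0, EMPTY, EMPTY]

print(child_index(base, second_base, check, parent=1, code=40, cls=1))
print(child_index(base, second_base, check, parent=1, code=41, cls=1))
```

The extra indirection through the second BASE array is the single additional reference that the description counts as the cost of the scheme.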

[0066] In other words, even the kanji codes "円", "王", "家", "火", "会", "河", "概", "学", and "器" (all child nodes) that follow the same character "大" (parent node) are given separate translation amounts d1, d2, and d3 on the CHECK array according to their code values.

[0067] Specifically, the child nodes for the kanji codes "円", "王", "家", "火", and "河" that follow the parent node "大" have code values in the range 8K to 16K, so they are given the translation amount d1 on the CHECK array. Likewise, the child nodes for the kanji codes "会", "概", "学", and "器" that follow the parent node "大" have code values in the range 16K to 24K, so they are given the translation amount d2 on the CHECK array. In this way, the kanji codes "円", "王", "家", "火", "会", "河", "概", "学", and "器" (child nodes) that follow the same character "大" (parent node) overlap one another on the CHECK array, and each kanji code is given a vacant registration slot.
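Read as hexadecimal, 8K to 16K is 0x2000 to 0x3FFF and 16K to 24K is 0x4000 to 0x5FFF, so the class can be taken from two bits of the first byte. The sketch below does this under two assumptions that the text does not state: the third class d3 covers 24K to 32K, and an 8-bit code is handled by masking off the high bit of its first byte. The name `code_class` is hypothetical.

```python
def code_class(code: int) -> int:
    """Return 0, 1, or 2 (standing for d1, d2, d3) for a 16-bit character code."""
    first = (code >> 8) & 0x7F  # first byte with the 8-bit-area flag removed
    if not 0x21 <= first <= 0x7E:
        raise ValueError("first byte is outside the 94-address area")
    # 0x21..0x3F -> 0 (8K-16K), 0x40..0x5F -> 1 (16K-24K), 0x60..0x7E -> 2 (assumed 24K-32K)
    return (first >> 5) - 1


print(code_class(0x3021), code_class(0x4677), code_class(0x6021))
```

Because only a few bits are inspected, the classification costs one shift and one mask per character.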

[0068] Compared with the conventional approach of assigning one uniform translation amount on the CHECK array for each kanji code, this suppresses the growth of the CHECK array and makes the double array more space-efficient.
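The space saving can be illustrated with a toy first-fit registration. This is a sketch, not the patented procedure: child codes are small made-up integers, classes are taken as equal-width ranges, negative translation amounts are allowed so that the lowest child can land on slot 0, and all function names are hypothetical.

```python
from typing import Dict, List

EMPTY = -1  # vacant CHECK slot


def min_offset(check: List[int], codes: List[int]) -> int:
    """Smallest translation amount d that puts every code + d on a vacant slot."""
    d = -min(codes)
    while any(c + d < len(check) and check[c + d] != EMPTY for c in codes):
        d += 1
    return d


def register(check: List[int], parent: int, codes: List[int]) -> int:
    """Register all child codes under one translation amount; return it."""
    d = min_offset(check, codes)
    top = max(codes) + d
    if top >= len(check):
        check.extend([EMPTY] * (top + 1 - len(check)))
    for c in codes:
        check[c + d] = parent  # CHECK stores the parent's index
    return d


def register_by_class(check: List[int], parent: int, codes: List[int],
                      width: int = 32) -> Dict[int, int]:
    """Register child codes with one translation amount per code-value class."""
    groups: Dict[int, List[int]] = {}
    for c in codes:
        groups.setdefault(c // width, []).append(c)
    return {k: register(check, parent, g) for k, g in sorted(groups.items())}


children = [1, 2, 3, 40, 41, 80, 82]

single: List[int] = []
register(single, 7, children)

split: List[int] = []
offsets = register_by_class(split, 7, children)

print(len(single), len(split), offsets)
```

With one translation amount the children keep their full spread of code values, whereas per-class amounts let the three groups slide together, which is the effect the paragraph above describes.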

[0069] As described above, the present invention introduces a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a fast, compact dictionary data structure. The structure has a first array whose subscript is the index 106c of the head of each character string to be searched; a second array in which information on the specific characters that follow the head of a character string, among the strings indicated by the first array, is registered; and a second array in which the word-head index 106c is registered, using as its subscript the sum of the translation amount d, which the translation amount calculation means computes as the amount needed to register the characters of each character string in the first and second arrays, and the value corresponding to the character that follows at the tail of the string. With this structure, the kanji codes can be given vacant registration slots that overlap one another on the CHECK array. Consequently, all kanji codes serving as keys can be registered simultaneously in vacant slots of the CHECK array while the CHECK array is extended as little as possible; the kanji codes that follow a given kanji code can be registered in the CHECK array with their relative positions preserved, again with as little extension as possible; and large gaps (sparseness) left by failed registrations are avoided as far as possible. This makes it possible to build a dictionary that stores, as the search target, a quasi-static key set (one whose keys are known in advance) and to keep the storage capacity of the trie array structure as small as possible even when keys are later added to extend it.

(First Embodiment) FIG. 2 is a functional block diagram for explaining the first embodiment of the character code registration and search device of the present invention.

[0070] The character code registration and search device 10 shown in FIG. 2 registers character strings, such as kanji codes, to be searched by key in a double-array structure, a one-dimensional array used as the data structure, and searches for those strings. It is built around list means 101, frequent character code storage means 103, frequent character code selection means 104, dictionary means 106, class storage means 108, BASE array means 102, code classification means 107, translation amount calculation means 109, translation amount storage means 110, key candidate point calculation means 111, second BASE array 105, and CHECK array means 112. Each of these components is implemented by programming a microcomputer.

[0071] The list means 101 creates a list of the kanji codes frequently used in idioms and outputs the selected character code 103c chosen from that list.

[0072] The frequent character code selection means 104 outputs a frequency threshold that specifies down to which frequency rank kanji codes are selected.

[0073] The frequent character code storage means 103 stores the frequent character codes selected from the list means 101, and outputs the selected frequent character code 103a and the index 106c of the selected frequent character code.
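The selection of frequent head characters and the assignment of serial numbers could look roughly like the sketch below. It assumes that "frequency" means the number of registered idioms that begin with a character and that the threshold is a rank cut-off; the function name, the tie-breaking rule, and the sample idiom list are all made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def select_frequent_heads(idioms: List[str], top_n: int) -> Dict[str, int]:
    """Map each of the top_n most frequent head characters to a serial number."""
    counts = Counter(word[0] for word in idioms if len(word) >= 2)
    # Rank by descending count; ties are broken by character order (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ch: serial for serial, (ch, _) in enumerate(ranked[:top_n])}


idioms = ["大学", "大会", "大河", "小川", "小学", "中国"]
print(select_frequent_heads(idioms, top_n=2))
```

The serial number returned for a character is the kind of value that would then be stored in the BASE array in place of a translation amount.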

[0074] The dictionary means 106 is a character code dictionary in which idioms composed of kanji codes are registered. It branches the processing on whether the character of interest is the head of an idiom based on a selected kanji code, and outputs the classes 106a obtained by classifying the kanji codes that follow a frequent kanji code at the head of a word.

[0075] The class storage means 108 stores the classes 106a, supplied by the dictionary means 106, that were obtained by classifying the kanji codes that follow a frequent kanji code at the head of a word.

[0076] The BASE array means 102 functions as the second array: it calculates the serial number 103b of the selected character code 101c and at the same time stores it at the index 106c of that kanji code on the BASE array.

[0077] To classify the characters that follow a frequent kanji code at the head of a word, the code classification means 107 classifies the second kanji code of an idiom by several bits of that kanji code.

[0078] For each class 106a, the translation amount calculation means 109 calculates the minimum translation amount 109a such that the values obtained by adding it to the code value of every kanji code in that class 106a all fall on vacant slots of the CHECK array.

[0079] The translation amount storage means 110 stores the translation amount 109a from the translation amount calculation means 109, and stores the translation amount 110a at the subscript position of the BASE array means 102 that corresponds to the index of the word head.

[0080] For each class, the key candidate point calculation means 111 adds the translation amount received from the translation amount calculation means to the code value of each kanji code in the class and uses the sum as a subscript on the CHECK array. At that subscript position it registers the index of the word head that is the parent of the kanji codes in the class, and it takes the sum as the index of the next word head, formed by ((previous word head) + (character of interest)).

[0081] The second BASE array 105 functions as a second array that stores the per-class translation amounts 110a output by the translation amount storage means 110, based on the code value 107a from the code classification means 107 and the serial number 103b from the list means 101.

[0082] The CHECK array means 112 functions as the first array: it registers the translation amount 111a from the translation amount storage means 110 and the internal set value 109a for each kanji code of the class from the translation amount calculation means 109 at the location of the word-head index 106c of the CHECK array.

[0083] As described above, the character code registration and search device 10 of the first embodiment introduces a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a fast, compact dictionary data structure. The structure has the CHECK array means 112 as the first array, which registers the translation amount 111a from the translation amount storage means 110 and the internal set value 109a for each kanji code of the class from the translation amount calculation means 109 at the location of the word-head index 106c of the CHECK array; the BASE array means 102 as the second array, which calculates the serial number 103b of the selected kanji code 101c and at the same time stores it at the index 106c of that kanji code on the BASE array; and the second BASE array 105 as a second array, which stores the per-class translation amounts 110a output by the translation amount storage means 110, based on the code value 107a from the code classification means 107 and the serial number 103b from the list means 101. The BASE array holds two kinds of values. One is the conventional translation amount, applied to kanji codes that are used less often (low frequency); the other is one of the subscripts of the second BASE array, applied to frequent kanji codes. The subscripts of the second BASE array are divided into three kinds according to the code values of the characters that follow the frequent kanji code, and each kind is given its own translation amount. The kanji codes can thus be given vacant registration slots that overlap one another on the CHECK array. Consequently, all kanji codes serving as keys can be registered simultaneously in vacant slots of the CHECK array while the CHECK array is extended as little as possible; the kanji codes that follow a given kanji code can be registered in the CHECK array with their relative positions preserved, again with as little extension as possible; and large gaps (sparseness) left by failed registrations are avoided as far as possible. This makes it possible to build a dictionary that stores, as the search target, a quasi-static key set (one whose keys are known in advance) and to keep the storage capacity of the trie array structure as small as possible even when keys are later added to extend it.

[0084] FIG. 3 is a flowchart for explaining an embodiment in which kanji codes are registered using the character code registration and search method executed by the character code registration and search device of FIG. 2.

[0085] The embodiment of the character code registration and search method shown in FIG. 3 is executed by the character code registration and search device 10 of the first embodiment. Its logical structure is built around a list step (step S2), a frequent character code selection step (step S2), a frequent character code storage step (step S2), a dictionary step (step S3), a classification result storage step (step S9), a BASE array step (steps S6, S11), a code classification step (step S7), a translation amount calculation step (steps S8, S10), a translation amount storage step (steps S8, S10, S11), a key candidate point calculation step (steps S9, S12), a second BASE array step (step S9), and a CHECK array step (step S12). It is written as program code executable by the character code registration and search device 10 of the first embodiment.

[0086] The list step (step S2) creates a list of the kanji codes frequently used in idioms and outputs the selected kanji code 103c chosen from that list. It is a processing step executed mainly by the list means 101.

[0087] The frequent character code selection step (step S2) outputs a frequency threshold that specifies down to which frequency rank kanji codes are selected. It is a processing step executed mainly by the frequent character code storage means 103.

[0088] The frequent character code storage step (step S2) stores the frequent kanji codes selected in the list step (step S2), and outputs the selected frequent character code 103a and the index 106c of the selected frequent character code. It is a processing step executed mainly by the class storage means 108.

[0089] The dictionary step (step S3) works on a character code dictionary in which idioms composed of kanji codes are registered. It branches the processing on whether the character of interest is the head of an idiom based on a selected kanji code, and outputs the classes 106a obtained by classifying the kanji codes that follow a frequent kanji code at the head of a word. It is a processing step executed mainly by the dictionary means 106.

[0090] The classification result storage step (step S9) stores the classes 106a, generated by the dictionary step (step S3), that were obtained by classifying the kanji codes that follow a frequent kanji code at the head of a word. It is a processing step executed mainly by the class storage means 108.

[0091] The BASE array step (steps S6, S11), serving as the second array step, calculates the serial number 103b of the selected character code 101c and at the same time stores it at the index 106c of that kanji code on the BASE array. It is a processing step executed mainly by the BASE array means 102.

[0092] To classify the characters that follow a frequent kanji code at the head of a word, the code classification step (step S7) classifies the second kanji code of an idiom by several bits of that kanji code. It is a processing step executed mainly by the code classification means 107.

[0093] For each class 106a, the translation amount calculation step (steps S8, S10) calculates the minimum translation amount 109a such that the values obtained by adding it to the code value of every kanji code in that class 106a all fall on vacant slots of the CHECK array. It is a processing step executed mainly by the translation amount calculation means 109.

[0094] The translation amount storage step (steps S8, S10, S11) stores the translation amount 109a generated by the translation amount calculation step (steps S8, S10), and stores the translation amount 110a at the subscript position of the BASE array means 102 that corresponds to the index of the word head. It is a processing step executed mainly by the translation amount storage means 110.

[0095] For each class, the key candidate point calculation step (steps S9, S12) adds the translation amount received from steps S8, S10, and S11 to the code value of each kanji code in the class and uses the sum as a subscript on the CHECK array. At that subscript position it registers the index of the word head that is the parent of the kanji codes in the class, and it takes the sum as the index of the next word head, formed by ((previous word head) + (character of interest)). It is a processing step executed mainly by the key candidate point calculation means 111.

[0096] The second BASE array step (step S9), serving as a second array step, stores the per-class translation amounts 110a output by the translation amount storage step (steps S8, S10, S11), based on the code value 107a generated by the code classification step (step S7) and the serial number 103b generated by the list step (step S2). It is a processing step executed mainly by the second BASE array 105.

[0097] The CHECK array step (step S12), serving as the first array step, registers the translation amount 111a generated by the translation amount storage step (steps S8, S10, S11) and the minimum translation amount 109a generated by the translation amount calculation step (steps S8, S10) at the location of the word-head index 106c of the CHECK array. It is a processing step executed mainly by the CHECK array means 112.

【００９８】As described above, the embodiment of the character code registration search method shown in FIG. 3 introduces a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure. The new data structure is created by executing three steps: the CHECK array step (step S12), serving as the first array step, which registers the parallel translation amount 111a generated by the parallel translation amount storing step (steps S8, S10, S11) and the minimum parallel translation amount 109a generated by the parallel translation amount calculation step (steps S8, S10) at the position of the word-head index 106c in the CHECK array; the BASE array step (steps S6, S11), serving as the second array step, which calculates the serial number 103b of the selected kanji code 101c and at the same time stores it at the index 106c of that kanji code in the BASE array; and the second BASE array step (step S9), also serving as the second array step, which stores the parallel translation amount 110a for each class output by the parallel translation amount storing step (steps S8, S10, S11). Two kinds of values are registered in the BASE array. One is a parallel translation amount, as in the conventional structure, and is applied to kanji codes that are used infrequently. The other is one of the subscripts of the second BASE array and is applied to frequently occurring kanji codes. The subscripts of the second BASE array are divided into three kinds according to the code value of the character that follows the frequent kanji code, and each kind is given its own parallel translation amount, so that free registration locations can be assigned to the kanji codes in such a way that they overlap one another on the CHECK array. As a result, all kanji codes serving as keys can be registered simultaneously in free slots of the CHECK array with as little expansion of the CHECK array as possible; the kanji codes that follow a given kanji code can be registered in the CHECK array with their relative positions preserved, again with as little expansion as possible; and the creation of many unregistrable gaps (sparseness) can be avoided as far as possible. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed.
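
The effect of giving each class its own parallel translation amount can be illustrated with a toy example. The following is a minimal sketch, not the registration procedure of FIG. 3: the helper names (`first_fit`, `place`, `classify`), the small code values, the three-way split on the higher bits, and the use of negative offsets are all assumptions made for illustration.

```python
# Toy sketch: one shared offset versus three per-class offsets for the
# followers of a frequent kanji code. Names and numbers are illustrative.

def first_fit(check, codes):
    """Smallest offset d such that every slot d + c is free (or past the end)."""
    d = -min(codes)  # keeps every slot index d + c non-negative
    while True:
        if all(d + c >= len(check) or check[d + c] is None for c in codes):
            return d
        d += 1


def place(check, codes, owner, d):
    """Write `owner` into the slots d + c, growing CHECK only when needed."""
    needed = d + max(codes) + 1
    if needed > len(check):
        check.extend([None] * (needed - len(check)))
    for c in codes:
        check[d + c] = owner


def classify(code):
    """Assumed three-way split on the bits above the low three bits."""
    return code // 8


# CHECK before registration: slots 3, 4, 7, 8 and 11 are still free.
occupied = {0, 1, 2, 5, 6, 9, 10}
initial = ["other" if i in occupied else None for i in range(12)]
followers = [1, 2, 9, 10, 17, 18]  # code values that follow one frequent kanji

# Conventional double array: a single offset shared by all followers.
check_single = list(initial)
d_single = first_fit(check_single, followers)
place(check_single, followers, "head", d_single)

# Three classes, each with its own offset (one row of the second BASE array).
check_split = list(initial)
second_base_row = []
for cls in range(3):
    group = [c for c in followers if classify(c) == cls]
    d = first_fit(check_split, group)
    place(check_split, group, "head", d)
    second_base_row.append(d)

print(d_single, len(check_single))        # shared offset and resulting CHECK length
print(second_base_row, len(check_split))  # per-class offsets and resulting CHECK length
```

With this toy input the shared offset forces CHECK to grow to 21 slots, whereas the three per-class offsets fit the same followers into the existing gaps and CHECK ends at 13 slots. Relative positions are preserved within each class, not across classes.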

【００９９】(Second Embodiment) FIG. 4 is a functional block diagram for explaining a second embodiment of the character code registration search device of the present invention.

【０１００】The character code registration search device 10 shown in FIG. 4 has a function of registering character strings, such as kanji codes, to be searched by key in a double-array structure, which is a one-dimensional array used as a data structure, and of searching for those character strings. It is built around document input means 201, BASE array means 102, registered value judgment means 202, code classification means 107, a second BASE array 105, parallel translation amount storage means 110, key candidate point calculation means 111, CHECK array means 112, and key/word-head collation means 203. Each of these components is implemented by programming on a microcomputer.

【０１０１】The document input means 201 first designates the root of the trie structure as the word head and at the same time sets the end mark #, a terminal symbol, as the word head W. It then directs the input of a kanji code b as the character to be searched and detects the word head W of the input kanji code b.

【０１０２】The BASE array means 102 has a function of reading in the numerical value 102a from the location corresponding to the word head W or the kanji code index 106c.

【０１０３】The registered value judgment means 202 determines whether the numerical value 102a input from the BASE array means 102 is the serial number of a frequently occurring word-head character code, the index of a word-head character code that is not frequent, or the index 106c of a word head W in the middle of a character string. When given an index that exceeds the range of indices making up the trie, it outputs that index as the serial number 202a of a frequent character code. When the numerical value 102a input from the BASE array means 102 is not the index 106c of the word head W and is not a frequent character code, it outputs the parallel translation amount 202b.
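
The judgment described above can be sketched as a range test on the value read from the BASE array. This is a minimal sketch under stated assumptions: `TRIE_LIMIT`, the function name `judge`, and the subtraction that turns an out-of-range value into a zero-based serial number are hypothetical and are not given in the text.

```python
TRIE_LIMIT = 1 << 16  # assumed size of the trie index range (64K entries)


def judge(value, trie_limit=TRIE_LIMIT):
    """Classify a BASE entry as a serial number or a parallel translation amount.

    Values at or beyond the trie index range are treated as serial numbers
    of frequent character codes; anything else is an ordinary offset.
    """
    if value >= trie_limit:
        # Assumption: the serial number is the distance past the trie range.
        return ("serial", value - trie_limit)
    return ("offset", value)


print(judge(5))               # an ordinary parallel translation amount
print(judge(TRIE_LIMIT + 3))  # a serial number into the second BASE array
```

Encoding both kinds of value in one array keeps the BASE array the same width as in the conventional structure; the cost is one comparison per lookup.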

【０１０４】In order to classify the characters that follow a frequent word-head kanji code, the code classification means 107 has a function of classifying the kanji code that follows the leading kanji by several bits of that kanji code when the value input from the BASE array means is the serial number of a frequently occurring word-head character code.
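
One way to classify by "several bits" of the following kanji code is sketched below. It assumes a 7-bit two-byte code in which each byte lies in 0x21 to 0x7E (as in JIS X 0208), and it assumes that the two high bits of the first byte are the bits used; the text does not say which bits are chosen, so the function `classify` is purely illustrative.

```python
def classify(kanji_code):
    """Return class 0, 1 or 2 from the two high bits of the 7-bit first byte.

    Assumption: bytes lie in 0x21..0x7E, so (byte >> 5) is 1, 2 or 3.
    """
    first_byte = (kanji_code >> 8) & 0x7F
    if not 0x21 <= first_byte <= 0x7E:
        raise ValueError("first byte outside the assumed 7-bit range")
    return (first_byte >> 5) - 1


for code in (0x3021, 0x4E5C, 0x7421):
    print(hex(code), classify(code))
```

Because the byte range 0x21 to 0x7E spans exactly three values of the top two bits, this split yields three classes without any table lookup.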

【０１０５】The second BASE array 105 has a function of holding the parallel translation amount 105a at the location given by the serial number 202a of the frequent character code output from the registered value judgment means 202 and by the class of the code value of the kanji code b.

【０１０６】The parallel translation amount storage means 110 has a function of storing the numerical value 102a as the parallel translation amount when the numerical value 102a input from the BASE array means 102 is the index of a word-head character code that is not frequent or the index 106c of the word head W.

【０１０７】The key candidate point calculation means 111 has a function of calculating and outputting the sum of the parallel translation amount on the CHECK array and the code value of the following character b.

【０１０８】The CHECK array means 112 has a function of reading in the key at the location corresponding to the sum 111a from the key candidate point calculation means 111.

【０１０９】The key/word-head collation means 203 judges whether the key read in by the CHECK array means 112 is equal to the index of the word-head character code or the index 106c of the word head W. When it judges that they are equal, it determines that the compound word is registered in the dictionary.
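
The last three means together perform the usual double-array transition test: add the parallel translation amount to the code value, look at that CHECK slot, and compare the stored key with the index of the current word head. The sketch below shows a single such step; the function name `step` and the small numbers are illustrative assumptions.

```python
def step(check, parent, offset, code):
    """Follow one character; return the child slot, or None if not registered."""
    candidate = offset + code  # key candidate point (the sum 111a)
    if 0 <= candidate < len(check) and check[candidate] == parent:
        return candidate       # the key stored in CHECK matches the word head
    return None


check = [None] * 16
check[4 + 7] = 3  # child of node 3, reached through code 7 with offset 4

print(step(check, parent=3, offset=4, code=7))  # slot owned by node 3
print(step(check, parent=3, offset=4, code=8))  # empty slot
print(step(check, parent=2, offset=4, code=7))  # slot owned by another node
```

The comparison against `parent` is what allows slots belonging to different word heads to interleave in one CHECK array without being confused with one another.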

【０１１０】As described above, the character code registration search device 10 of the second embodiment introduces a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure. The new data structure comprises the CHECK array means 112, which reads in a key at the location corresponding to the sum 111a from the key candidate point calculation means 111; the BASE array means 102, serving as the second array, which reads in the numerical value 102a from the location corresponding to the word head W or the kanji code index 106c; and the second BASE array 105, also serving as the second array, which holds the parallel translation amount 105a at the location given by the serial number 202a of the frequent character code output from the registered value judgment means 202 and by the class of the code value of the kanji code b. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed. As a result, keys can be stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

【０１１１】FIG. 5 is a flowchart for explaining an embodiment in which kanji codes are registered using the character code registration search method executed by the character code registration search device of FIG. 4.

【０１１２】The embodiment of the character code registration search method shown in FIG. 5 is executed by the character code registration search device 10 of the second embodiment. It is built around a document input step (steps P2, P3), a BASE array step (step P4), a registered value judgment step (step P5), a code classification step (step P6), a second BASE array step (step P7), a parallel translation amount storing step (steps P7, P8), a key candidate point calculation step (step P9), a CHECK array step (step P9), and a key/word-head collation step (steps P10, P11, P12). Each of these steps is implemented by programming on a microcomputer.

【０１１３】The document input step (steps P2, P3) first designates the root of the trie structure as the word head and at the same time sets the end mark #, a terminal symbol, as the word head W. It then directs the input of a kanji code b as the character to be searched and detects the word head W of the input kanji code b. This processing step is executed mainly by the document input means 201.

【０１１４】The BASE array step (step P4) reads in the numerical value 102a from the location corresponding to the word head W or the kanji code index 106c. This processing step is executed mainly by the BASE array means 102.

【０１１５】The registered value judgment step (step P5) determines whether the numerical value 102a generated by the BASE array step 102 (step P4), which serves as the second array step, is the serial number of a frequently occurring word-head character code, the index of a word-head character code that is not frequent, or the index 106c of a word head W in the middle of a character string. When given an index that exceeds the range of indices making up the trie, it outputs that index as the serial number 202a of a frequent character code. When the numerical value 102a generated by the BASE array step 102 (step P4) is not the index 106c of the word head W and is not a frequent character code, it outputs the parallel translation amount 202b. This processing step is executed mainly by the registered value judgment means 202.

【０１１６】In order to classify the characters that follow a frequent word-head kanji code, the code classification step (step P6) classifies the kanji code that follows the leading kanji by several bits of that kanji code when the value received from the BASE array step 102 (step P4) is the serial number of a frequently occurring word-head character code. This processing step is executed mainly by the code classification means 107.

【０１１７】The second BASE array step (step P7), which serves as the second array step, holds the parallel translation amount 105a at the location given by the serial number 202a of the frequent character code generated by the registered value judgment step 202 (step P5) and by the class of the code value of the kanji code b. This processing step is executed mainly by the second BASE array 105.

【０１１８】The parallel translation amount storing step (steps P7, P8) stores the numerical value 102a as the parallel translation amount when the numerical value 102a generated by the BASE array step 102 (step P4) is the index of a word-head character code that is not frequent or the index 106c of the word head W. This processing step is executed mainly by the parallel translation amount storage means 110.

【０１１９】The key candidate point calculation step 111 (step P9) calculates and outputs the sum of the parallel translation amount on the CHECK array and the code value of the following character b. This processing step is executed mainly by the key candidate point calculation means 111.

【０１２０】The CHECK array step (step P9), which serves as the first array step, reads in the key at the location corresponding to the sum 111a from the key candidate point calculation step 111 (step P9). This processing step is executed mainly by the CHECK array means 112.

【０１２１】The key/word-head collation step (steps P10, P11, P12) judges whether the key generated by the CHECK array step 112 (step P9), which serves as the first array step, is equal to the index of the word-head character code or the index 106c of the word head W. When it judges that they are equal, it determines that the compound word is registered in the dictionary. This processing step is executed mainly by the key/word-head collation means 203.
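
The sequence of steps P4 to P12 can be strung together as a single search loop. The following is a hedged sketch rather than the flowchart of FIG. 5: dictionaries stand in for the flat BASE and CHECK arrays, end-mark handling is omitted, and `TRIE_LIMIT`, `classify`, `offset_for`, `is_registered`, and all sample codes and offsets are assumptions introduced for illustration.

```python
TRIE_LIMIT = 1 << 16  # assumed: BASE values at or above this are serial numbers


def classify(kanji_code):
    """Assumed three-way split on the two high bits of the 7-bit first byte."""
    first_byte = (kanji_code >> 8) & 0x7F
    if not 0x21 <= first_byte <= 0x7E:
        raise ValueError("first byte outside the assumed 7-bit range")
    return (first_byte >> 5) - 1


def offset_for(base, second_base, node, code):
    """Steps P4 to P8: obtain the parallel translation amount for this node."""
    value = base.get(node)
    if value is None:
        return None                      # node has no registered followers
    if value >= TRIE_LIMIT:              # serial number of a frequent code
        return second_base[value - TRIE_LIMIT][classify(code)]
    return value                         # ordinary offset


def is_registered(base, second_base, check, codes, root=0):
    """Steps P9 to P12: walk the trie, confirming each move against CHECK."""
    node = root
    for code in codes:
        offset = offset_for(base, second_base, node, code)
        if offset is None:
            return False
        candidate = offset + code        # key candidate point
        if check.get(candidate) != node:
            return False                 # key does not match the word head
        node = candidate
    return True


# Hand-built toy dictionary: one frequent head kanji with three followers.
HEAD = 0x3021
head_node = 5 + HEAD
base = {0: 5, head_node: TRIE_LIMIT + 0}
second_base = [[10, 20, 30]]             # one row: offsets for classes 0, 1, 2
check = {head_node: 0}
for follower in (0x3A2C, 0x4E5C, 0x7421):
    check[second_base[0][classify(follower)] + follower] = head_node

print(is_registered(base, second_base, check, [HEAD, 0x4E5C]))
print(is_registered(base, second_base, check, [HEAD, 0x4E5D]))
```

Compared with a conventional double-array walk, the only extra work per character is the range test on the BASE value and, for frequent codes, one read from the second BASE array.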

【０１２２】As described above, the embodiment of the character code registration search method shown in FIG. 5 introduces a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure. The new data structure comprises the CHECK array step 112 (step P9), serving as the first array step, which reads in a key at the location corresponding to the sum 111a from the key candidate point calculation step 111 (step P9); the BASE array step 102 (step P4), serving as the second array step, which reads in the numerical value 102a from the location corresponding to the word head W or the kanji code index 106c; and the second BASE array step (step P7), also serving as the second array step, which holds the parallel translation amount 105a at the location given by the serial number 202a of the frequent character code generated by the registered value judgment step 202 (step P5) and by the class of the code value of the kanji code b. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed. As a result, keys can be stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

(Third Embodiment) FIG. 6(a) is a diagram for explaining the operation of selecting a specific character, executed by the character code registration search device of FIG. 1 or FIG. 2, on the basis of frequent kanji codes (those followed by a large number of kanji codes). The figure also explains the operation of selecting a specific character, executed by the same device, on the basis of the width of the code values of the kanji codes that follow.

【０１２３】In addition to the character code registration search method of the first or second embodiment, the character code registration search method of the third embodiment is characterized by executing a step of selecting, as the specific character following the word head of a character string when that specific character belongs to a compound word, a character for which the width of the code values of the kanji codes that follow it is equal to or greater than a predetermined threshold.
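
The selection criterion of the third embodiment can be sketched as a filter on the spread of follower code values. This is an illustrative sketch only: the function name `select_by_width`, the sample codes, and the threshold value are assumptions, and "width" is taken here to mean the difference between the largest and smallest follower code value.

```python
def select_by_width(followers_by_head, threshold):
    """Return the head codes whose follower code values span at least `threshold`."""
    return [
        head
        for head, codes in followers_by_head.items()
        if codes and max(codes) - min(codes) >= threshold
    ]


followers_by_head = {
    0x3021: [0x3A2C, 0x7421],  # followers far apart in code value
    0x3022: [0x3A2C, 0x3A30],  # followers close together
}

print([hex(h) for h in select_by_width(followers_by_head, threshold=0x1000)])
```

A head whose followers span a wide range is the case in which a single shared offset is hardest to place, so it is a natural candidate for the per-class offsets of the second BASE array.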

【０１２４】As a result, more of the free slots that the conventional method could not fill are filled, and growth of both the BASE array created in the BASE array step 102 (step P4) and the CHECK array created in the CHECK array step 112 (step P9) is kept moderate. Moreover, the number of processing operations increases by only one, for the reference to the second BASE array step (step P7), so the processing count remains almost the same.

【０１２５】As described above, according to the present invention, suppose that the original BASE array and CHECK array are each 64 KW in size. If the selected kanji codes amount to 0.5 KW and there are three classes, the second BASE array is 1.5 KW (= 0.5 × 3). This is only 1/64 of the size of the original BASE and CHECK arrays. How much the BASE and CHECK arrays grow under the conventional method, on the other hand, is unknown. With the present invention, however, more of the free slots that the conventional method could not fill are filled, and growth of both the BASE array and the CHECK array is kept moderate. Moreover, the number of processing operations increases by only one, for the reference to the second BASE array, and so remains almost the same.
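
The size estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and reading KW as a unit of array entries is an assumption, since the text does not expand the abbreviation.

```python
BASE_KW = 64.0      # original BASE array size quoted in the text
CHECK_KW = 64.0     # original CHECK array size quoted in the text
SELECTED_KW = 0.5   # selected (frequent) kanji codes
CLASSES = 3         # number of classes per selected code

second_base_kw = SELECTED_KW * CLASSES
ratio_to_one_array = second_base_kw / BASE_KW
ratio_to_both_arrays = second_base_kw / (BASE_KW + CHECK_KW)

print(second_base_kw)        # size of the second BASE array in KW
print(ratio_to_one_array)    # relative to a single 64 KW array
print(ratio_to_both_arrays)  # relative to BASE and CHECK combined
```

Under these figures the second BASE array is 3/128 (about 2.3%) of one 64 KW array and 3/256 (about 1.2%) of the two arrays combined. The 1/64 (about 1.6%) cited in the text lies between these two values and is best read as an order-of-magnitude figure.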

【０１２６】[0126]

EFFECTS OF THE INVENTION

According to the invention described in claim 1, introducing a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure, makes it possible to assign free registration locations to the character codes in such a way that they overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free slots of the CHECK array with as little expansion of the CHECK array as possible; the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again with as little expansion as possible; and the creation of many unregistrable sparse gaps can be avoided as far as possible. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed.

【０１２７】According to the invention described in claim 2, in addition to the effects described for claim 1, a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure, is introduced. This data structure comprises CHECK array means, serving as the first array, which registers the parallel translation amount from the parallel translation amount storage means and the internal set value for each character code of a class from the parallel translation amount calculation means at the position of the word-head index in the CHECK array; BASE array means, serving as the second array, which calculates the serial number of the selected character code and stores it at the index of that character code in the BASE array; and a second BASE array, also serving as the second array, which stores the parallel translation amount for each class output by the parallel translation amount storage means on the basis of the code value from the code classification means and the serial number from the list means. Two kinds of values are registered in the BASE array. One is a parallel translation amount, as in the conventional structure, and is applied to character codes that are used infrequently. The other is one of the subscripts of the second BASE array and is applied to frequently occurring character codes. The subscripts of the second BASE array are divided into three kinds according to the code value of the character that follows the frequent character code, and each kind is given its own parallel translation amount, so that free registration locations can be assigned to the character codes in such a way that they overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free slots of the CHECK array with as little expansion of the CHECK array as possible; the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again with as little expansion as possible; and the creation of many unregistrable sparse gaps can be avoided as far as possible. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed.

【０１２８】According to the invention described in claim 3, in addition to the effects described for claim 1, a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure, is introduced. This data structure comprises CHECK array means, which reads in a key at the location corresponding to the sum from the key candidate point calculation means; BASE array means, serving as the second array, which reads in a numerical value from the location corresponding to the word head or the character code index; and a second BASE array, also serving as the second array, which holds the parallel translation amount at the location given by the serial number of the frequent character code output from the registered value judgment means 202 and by the class of the code value of the character code. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed. As a result, keys can be stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

【０１２９】According to the invention described in claim 4, a new data structure that further develops the conventional double-array structure, a one-dimensional array used as a high-speed, low-capacity dictionary data structure, is introduced. It is formed using a first array step whose subscripts are the word-head indices of the character strings to be searched by key; a second array step in which information on specific characters that follow the word head of a character string, among the character strings indicated in the first array step, is registered; and a further second array step in which the word-head index of a character string is registered, using as the subscript the sum of the parallel translation amount, calculated by the parallel translation amount calculation means and needed to register the characters of each character string to be searched in the first array step and the second array step, and the value corresponding to the character that follows the end of the character string. This makes it possible to assign free registration locations to the character codes in such a way that they overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free slots of the CHECK array with as little expansion of the CHECK array as possible; the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, again with as little expansion as possible; and the creation of many unregistrable sparse gaps can be avoided as far as possible. This makes it possible to minimize the storage capacity of a trie array structure that forms a dictionary holding, as the search target, a quasi-static key set whose keys are known in advance and that is later extended by registering additional keys as needed.

【０１３０】請求項５に記載の発明によれば、請求項４
に記載の効果に加えて、従来の高速、低容量の辞書デー
タ構造としての一次元配列であるダブル配列構造を更に
発展させた新たなデータ構造として、キー検索対象とな
る各文字列の語頭の指標を添字とする第１配列工程と、
第１配列工程で示された文字列の内で文字列の語頭に連
なる特定の文字に関する情報を登録した第２配列工程
と、平行移動量計算手段が計算したキー検索対象となる
各文字列の文字を第１配列工程と第２配列工程とに登録
するのに必要な平行移動量と文字列の語尾に連なる文字
に相当する値との和を添字として用いて文字列の語頭の
指標を登録した第２配列工程とを用いて形成する新たな
データ構造を導入し、第１配列工程において登録されて
いる登録内容を、熟語の先頭に位置する特定の文字コー
ドの追番、他の先頭の文字コード、または文字列の語頭
の平行移動量のいずれかに識別し、第１配列工程に登録
されている登録内容が先頭特定文字コードの追番である
と識別された場合に第２配列工程における追番の指示す
る配列箇所を参照して平行移動量を得る識別工程を実行
することにより、ＣＨＥＣＫ配列上で互いに重なるよう
にして各文字コードに空いた登録箇所を与えることがで
きるようになり、その結果、ＣＨＥＣＫ配列を極力拡張
することなく、キーとしての全ての文字コードをＣＨＥ
ＣＫ配列上の空きに同時に登録でき、またＣＨＥＣＫ配
列を極力拡張することなく、ある文字コードに連なる各
文字コードの相対位置関係を維持したままでＣＨＥＣＫ
配列に登録できるようになり、更に加えて、登録できず
にスパースが多く空くことをできるだけ回避することが
できるようになる。これにより、キー集合が予め分かっ
ているような準静的キー集合を検索対象として格納した
辞書を構成し、後で適宜キーを追加登録して拡張するよ
うなトライ配列構造の記憶容量を極力小さくすることが
できるようになり、従来の各文字コード毎に一律にＣＨ
ＥＣＫ配列上の平行移動量を与える場合に比べ、ＣＨＥ
ＣＫ配列の増大を抑えることができ、ダブル配列の空間
的効率化を図ることができる。According to the invention of claim 5, in addition to the effects of claim 4, a new data structure is introduced that further develops the conventional double-array structure, a one-dimensional array serving as a fast, compact dictionary data structure. The new structure is formed using a first array step whose subscript is the word-head index of each character string to be searched; a second array step in which information on a specific character following the word head of the character strings indicated in the first array step is registered; and a second array step in which the word-head index of the character string is registered, using as its subscript the sum of the translation amount (calculated by the translation amount calculating means as the amount needed to register the characters of each character string in the first and second array steps) and the value corresponding to the character following the word end of the character string. An identification step identifies the content registered in the first array step as one of the serial number of a specific character code located at the head of an idiom, another leading character code, or the translation amount of a word head. When the registered content is identified as the serial number of a leading specific character code, the identification step obtains the translation amount by referring to the array location indicated by that serial number in the second array step. This gives each character code free registration slots that overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free slots of the CHECK array while expanding the CHECK array as little as possible, the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, and sparse unused gaps caused by failed registrations can be avoided as far as possible. Consequently, the storage capacity of a trie array structure, which constitutes a dictionary storing a quasi-static key set (one whose keys are known in advance) as the search target and is later extended by registering additional keys as needed, can be made as small as possible. Compared with the conventional approach of assigning one translation amount on the CHECK array uniformly to each character code, growth of the CHECK array is suppressed and the space efficiency of the double array is improved.
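The identification step described above can be sketched as a small lookup routine. This is only an illustration under stated assumptions, not the patented implementation: the marker value `SERIAL_MARK`, the function names, the toy code range 0..11, and the equal-width three-way classification are all hypothetical. The only point taken from the text is that a BASE entry is either a translation amount or a serial number, and that a serial number is resolved through the second BASE array before the CHECK array is consulted.

```python
# Sketch only: names, the marker value, and the toy code range are assumptions.
SERIAL_MARK = 1000   # assumed: a BASE value at or above this is a serial number
NUM_CLASSES = 3      # following characters are split into three classes
CODE_RANGE = 12      # toy code values 0..11 (real codes are two-byte values)


def class_of(code):
    """Assumed classification: three equal-width bands of the code value."""
    return code * NUM_CLASSES // CODE_RANGE


def child_slot(base, base2, check, state, code):
    """Return the CHECK slot of the child reached from `state` by `code`,
    or None when that transition is not registered."""
    value = base[state]
    if value >= SERIAL_MARK:
        # Frequent head character: one extra lookup in the second BASE array,
        # indexed by serial number and by the class of the following character.
        offset = base2[value - SERIAL_MARK][class_of(code)]
    else:
        # Infrequent character: the BASE value is the translation amount itself.
        offset = value
    slot = offset + code
    return slot if check.get(slot) == state else None


# State 0 is a frequent head (serial number 0); state 3 is an ordinary node.
base = {0: SERIAL_MARK + 0, 3: 4}
base2 = [[1, 2, 0]]                 # per-class translation amounts for serial 0
check = {2: 0, 7: 0, 9: 0, 5: 3}    # slot -> parent state

found = [child_slot(base, base2, check, 0, c) for c in (1, 5, 9)]
missing = child_slot(base, base2, check, 0, 6)
ordinary = child_slot(base, base2, check, 3, 1)
```

The extra indirection costs one additional array reference for frequent head characters, while ordinary nodes follow the conventional `BASE + code` rule unchanged.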

【０１３１】請求項６乃至８に記載の発明によれば、請
求項５に記載の効果と同様の効果を奏する。According to the inventions set forth in claims 6 to 8, the same effects as the effects set forth in claim 5 can be obtained.

【０１３２】請求項１０に記載の発明によれば、請求項
４に記載の効果に加えて、従来の高速、低容量の辞書デ
ータ構造としての一次元配列であるダブル配列構造を更
に発展させた新たなデータ構造として、平行移動量格納
工程が生成した平行移動量と平行移動量計算工程が生成
した部類の各文字コードにおける内部設定値をＣＨＥＣ
Ｋ配列の語頭の指標の箇所に登録する第１配列工程とし
てのＣＨＥＣＫ配列工程と、選出されたた文字コードの
追番を算出すると共に、ＢＡＳＥ配列上の同文字コード
の指標に格納する第２配列工程としてのＢＡＳＥ配列工
程と、平行移動量格納工程が出力する各部類毎の平行移
動量を格納するための第２配列工程としての第２ＢＡＳ
Ｅ配列工程と実行して作成した新たなデータ構造を導入
し、ＢＡＳＥ配列に登録される値を２種類とし、一方に
値を従来通りの平行移動量として余り多く用いられない
頻度の低い文字コードに適用し、他方の値を第２ＢＡＳ
Ｅ配列の添字のいずれか一つの添字として頻出文字コー
ドに適用し、第２ＢＡＳＥ配列の添字を頻出文字コード
に連なる文字のコード値に応じて３種類に分け、それぞ
れ独自の平行移動量を与えることにより、ＣＨＥＣＫ配
列上で互いに重なるようにして各文字コードに空いた登
録箇所を与えることできるようになり、その結果、ＣＨ
ＥＣＫ配列を極力拡張することなく、キーとしての全て
の文字コードをＣＨＥＣＫ配列上の空きに同時に登録で
きるようになり、またＣＨＥＣＫ配列を極力拡張するこ
となく、ある文字コードに連なる各文字コードの相対位
置関係を維持したままでＣＨＥＣＫ配列に登録できるよ
うになり、更に加えて、登録できずにスパースが多く空
くことをできるだけ回避することができるようになる。
これにより、キー集合が予め分かっているような準静的
キー集合を検索対象として格納した辞書を構成し、後で
適宜キーを追加登録して拡張するようなトライ配列構造
の記憶容量を極力小さくすることができるようになる。According to the invention of claim 10, in addition to the effects of claim 4, a new data structure is introduced that further develops the conventional double-array structure, a one-dimensional array serving as a fast, compact dictionary data structure. It is created by executing: a CHECK array step, as the first array step, which registers the translation amount generated by the translation amount storing step and the internal setting value for each character code of the class generated by the translation amount calculating step at the word-head index position of the CHECK array; a BASE array step, as the second array step, which calculates the serial number of each selected character code and stores it at the index of that character code in the BASE array; and a second BASE array step, as the second array step, which stores the translation amount for each class output by the translation amount storing step. Two kinds of values are registered in the BASE array. One is a conventional translation amount, applied to infrequently used character codes. The other is one of the subscripts of the second BASE array, applied to frequent character codes. The subscripts of the second BASE array are divided into three kinds according to the code value of the character following the frequent character code, and each kind is given its own translation amount. This gives each character code free registration slots that overlap one another on the CHECK array. As a result, all character codes serving as keys can be registered simultaneously in free slots of the CHECK array while expanding the CHECK array as little as possible, the character codes that follow a given character code can be registered in the CHECK array with their relative positions preserved, and sparse unused gaps caused by failed registrations can be avoided as far as possible. Consequently, the storage capacity of a trie array structure, which constitutes a dictionary storing a quasi-static key set (one whose keys are known in advance) as the search target and is later extended by registering additional keys as needed, can be made as small as possible.
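The space saving described above can be illustrated with a toy registration routine. This is a hedged sketch, not the method as claimed: the first-fit search, the function names, the 12-value code range, and the classification `code // 4` are assumptions chosen to keep the example small. It compares one translation amount shared by all children of a node with a separate translation amount per class.

```python
def first_fit(check, codes):
    """Smallest translation amount d >= 0 such that every slot d + c is free."""
    d = 0
    while any((d + c) in check for c in codes):
        d += 1
    return d


def register_single(check, parent, codes):
    """Conventional rule: one translation amount for all children of `parent`."""
    check = dict(check)
    d = first_fit(check, codes)
    for c in codes:
        check[d + c] = parent
    return d, check


def register_by_class(check, parent, codes, class_of):
    """Per-class rule: children are grouped by class and each group gets its
    own translation amount, so groups can drop into separate gaps."""
    check = dict(check)
    groups = {}
    for c in codes:
        groups.setdefault(class_of(c), []).append(c)
    offsets = {}
    for k in sorted(groups):
        d = first_fit(check, groups[k])
        for c in groups[k]:
            check[d + c] = parent
        offsets[k] = d
    return offsets, check


# A CHECK array of 16 slots with only slots 2, 7 and 9 still free.
occupied = {i: "other" for i in range(16) if i not in (2, 7, 9)}
children = [1, 5, 9]                       # code values of the following characters

single_d, single_check = register_single(occupied, "P", children)
class_d, class_check = register_by_class(occupied, "P", children, lambda c: c // 4)

single_size = max(single_check) + 1        # array length needed afterwards
class_size = max(class_check) + 1
```

In this toy case the shared translation amount has to move past the occupied region and lengthens the array, whereas the per-class amounts place each child in an existing gap and leave the array length unchanged.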

【０１３３】請求項１１に記載の発明によれば、請求項
４に記載の効果に加えて、従来の高速、低容量の辞書デ
ータ構造としての一次元配列であるダブル配列構造を更
に発展させた新たなデータ構造として、鍵候補地点算出
工程からの和に相当する箇所にキーを入力する第１配列
工程としてのＣＨＥＣＫ配列工程と、語頭または文字コ
ードの指標に相当する箇所から数値を入力する第２配列
工程としてのＢＡＳＥ配列工程と、登録値判断工程が生
成した頻出文字コードの追番とし、文字コードのコード
値の分類に相当する箇所から、平行移動量を格納するた
めの第２配列工程としての第２ＢＡＳＥ配列工程とを有
する新たなデータ構造を導入することにより、キー集合
が予め分かっているような準静的キー集合を検索対象と
して格納した辞書を構成し、後で適宜キーを追加登録し
て拡張するようなトライ配列構造の記憶容量を極力小さ
くすることができるようになる。その結果、なるべく記
憶容量の小さいデータ構造としての一次元配列であるダ
ブル配列構造（すなわち、トライ配列構造）に格納し、
このトライ配列構造を検索キーを用いて高速にパターン
マッチングすることができるようになる。According to the invention of claim 11, in addition to the effects of claim 4, a new data structure is introduced that further develops the conventional double-array structure, a one-dimensional array serving as a fast, compact dictionary data structure. It has: a CHECK array step, as the first array step, which inputs a key at the location corresponding to the sum from the key candidate point calculation step; a BASE array step, as the second array step, which inputs a numerical value from the location corresponding to the word head or the character code index; and a second BASE array step, as the second array step, which stores the translation amount at the location corresponding to the serial number of the frequent character code generated by the registered value determination step and to the classification of the code value of the character code. With this structure, the storage capacity of a trie array structure, which constitutes a dictionary storing a quasi-static key set (one whose keys are known in advance) as the search target and is later extended by registering additional keys as needed, can be made as small as possible. As a result, keys can be stored in a double-array structure (that is, a trie array structure), a one-dimensional array with as small a storage capacity as possible, and this trie array structure can be pattern-matched at high speed using a search key.

【０１３４】請求項１２に記載の発明によれば、請求項
１０または１１に記載の効果に加えて、従来方式で埋め
られなかった空きの箇所もより多く埋められることにな
り、ＢＡＳＥ配列工程で作成されるＢＡＳＥ配列及びＣ
ＨＥＣＫ配列工程で作成されるＣＨＥＣＫ配列の両配列
の増大も適度に抑えられる。しかも、処理回数は第２Ｂ
ＡＳＥ配列工程を参照するため、１回増えるだけである
ので、ほぼ同じ処理回数で済む。According to the invention of claim 12, in addition to the effects of claim 10 or 11, more of the free slots that the conventional method could not fill are filled, so growth of both the BASE array created in the BASE array step and the CHECK array created in the CHECK array step is moderately suppressed. Moreover, the only extra work is one reference to the second BASE array step, so the processing count increases by just one and remains almost the same.
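Claim 12 selects head characters by the width of the code values of the characters that follow them. A minimal sketch of such a selection is given below. It is illustrative only: the function name, the use of `ord()` as the code value, the ASCII toy words, and the threshold are assumptions, whereas the patent works on two-byte character codes.

```python
def select_heads(idioms, min_width):
    """Return head characters whose following characters span a code-value
    range (max - min) of at least `min_width`."""
    followers = {}
    for word in idioms:
        if len(word) >= 2:
            # Collect the code value of the second character under its head.
            followers.setdefault(word[0], set()).add(ord(word[1]))
    return sorted(
        head
        for head, codes in followers.items()
        if max(codes) - min(codes) >= min_width
    )


# Toy data: 'b' is followed by a, m, z (wide spread); 'j' only by a, b.
words = ["ba", "bm", "bz", "ja", "jb"]
selected = select_heads(words, min_width=10)
```

A head character with a wide spread of following code values is the case where one shared translation amount is hardest to place, which is why it is a natural candidate for per-class translation amounts.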

[Brief description of the drawings]

【図１】図１（ａ）は、本発明の文字コード登録探索装
置、及び文字コード登録探索方法で適用される新たなデ
ータ構造において、余り多く用いられない（頻度の低
い）文字コードに適用する平行移動量並びに頻出度また
は追番が登録されたＢＡＳＥ配列の基本構造を示し、図
１（ｂ）は、本発明の新たなデータ構造において、頻出
文字コードについて用いるの平行移動量が登録される第
２ＢＡＳＥ配列の基本構造を示し、図１（ｃ）は、本発
明の新たなデータ構造において、図１（ｂ）の添字に対
応して頻出文字コードが登録されるＣＨＥＣＫ配列の基
本構造を説明している。FIG. 1(a) shows, for the new data structure applied in the character code registration search device and character code registration search method of the present invention, the basic structure of the BASE array in which the translation amount applied to infrequently used character codes and the frequency or serial number are registered. FIG. 1(b) shows the basic structure of the second BASE array of the new data structure, in which the translation amounts used for frequent character codes are registered. FIG. 1(c) illustrates the basic structure of the CHECK array of the new data structure, in which frequent character codes are registered in correspondence with the subscripts of FIG. 1(b).

【図２】本発明の文字コード登録探索装置の第１実施形
態を説明するための機能ブロック図である。FIG. 2 is a functional block diagram illustrating a first embodiment of a character code registration and search device according to the present invention.

【図３】図２の文字コード登録探索装置で実行される文
字コード登録探索方法を用いて文字コードの登録を行う
場合の一実施形態を説明するためのフローチャートであ
る。FIG. 3 is a flowchart for explaining an embodiment in which character codes are registered using the character code registration search method executed by the character code registration search device of FIG. 2.

【図４】本発明の文字コード登録探索装置の第２実施形
態を説明するための機能ブロック図である。FIG. 4 is a functional block diagram for explaining a second embodiment of the character code registration and search device of the present invention.

【図５】図４の文字コード登録探索装置で実行される文
字コード登録探索方法を用いて文字コードの登録を行う
場合の一実施形態を説明するためのフローチャートであ
る。FIG. 5 is a flowchart for explaining an embodiment in which character codes are registered using the character code registration search method executed by the character code registration search device of FIG. 4.

【図６】図６（ａ）は、頻出文字コード（連なる文字コ
ード数が多いもの）に基づいて、図１または図２の文字
コード登録探索装置で実行される特定の文字の選択動作
を説明するための図であり、連なる文字コードのコード
値の幅に基づいて、図１または図２の文字コード登録探
索装置で実行される特定の文字の選択動作を説明するた
めの図である。FIG. 6(a) is a diagram for explaining the specific-character selection operation performed by the character code registration search device of FIG. 1 or FIG. 2 on the basis of frequent character codes (those followed by a large number of character codes); the figure also explains the specific-character selection operation performed by the same device on the basis of the width of the code values of the following character codes.

【図７】２バイト文字符号の領域を説明するための図で
あって、図７（ａ）は、７ビットコードの領域（日本語
のＪＩＳコードの領域、中国語のＧＢ２３１２−８０の
７ビットコード領域）であり、図７（ｂ）は、８ビット
コードの領域（日本語のＥＵＣコードの領域、中国語の
ＧＢ２３１２−８０の８ビットコード領域）である。FIG. 7 is a diagram for explaining the areas of two-byte character codes: FIG. 7(a) shows the 7-bit code area (the Japanese JIS code area and the 7-bit code area of Chinese GB2312-80), and FIG. 7(b) shows the 8-bit code area (the Japanese EUC code area and the 8-bit code area of Chinese GB2312-80).

【図８】従来技術を説明するための図であって、図８
（ａ）は、ダブル配列構造のＢＡＳＥ配列及びＣＨＥＣ
Ｋ配列の二つの一次元配列、及びこれらのＢＡＳＥ配列
内容及びＣＨＥＣＫ配列内容を示し、図８（ｂ）は、ｂ
ａｂｙ＃，ｂａｃｈｅｌｏｒ＃，ｂａｄｇｅｒ＃，ｂａ
ｄｇｅ＃，ｊａｒ＃が格納されているトライ（ｔｒｉ
ｅ）構造（状態遷移図）を示し、図８（ｃ）は、トライ
構造中の親子のノード関係、ダブル配列による探索の動
作を説明するための図である。FIG. 8 is a diagram for explaining the prior art: FIG. 8(a) shows the two one-dimensional arrays of the double-array structure, the BASE array and the CHECK array, together with their contents; FIG. 8(b) shows the trie structure (state transition diagram) in which baby#, bachelor#, badger#, badge#, and jar# are stored; and FIG. 8(c) is a diagram for explaining the parent-child node relationships in the trie structure and the search operation using the double array.

【図９】図９（ａ）は、図８の従来技術において、文字
「電圧」、「電気」、「電車」、「電脳」、「電話」
等、文字「電」に連なる文字コードで構成された熟語
を、ダブル配列構造に追加登録する動作を説明するため
の図であり、図９（ｂ）は、ダブル配列構造の拡大の動
作を説明するための図であり、図９（ｃ）は、図９
（ｂ）のダブル配列構造に対応するトライの拡大の動作
を説明するための図である。FIG. 9(a) is a diagram for explaining, for the prior art of FIG. 8, the operation of additionally registering in the double-array structure idioms composed of character codes that follow the character 「電」, such as 「電圧」 (voltage), 「電気」 (electricity), 「電車」 (train), 「電脳」 (computer), and 「電話」 (telephone); FIG. 9(b) is a diagram for explaining how the double-array structure is expanded; and FIG. 9(c) is a diagram for explaining how the trie corresponding to the double-array structure of FIG. 9(b) is expanded.


[Explanation of symbols]

１０…文字コード登録探索装置１０１…一覧表手段１０１ｂ…追番１０２…ＢＡＳＥ配列手段（第２配列）１０２ａ…数値１０３…頻出文字コード格納手段１０３ａ…頻出文字コード１０３ｂ…追番１０３ｃ…選択文字コード１０４…頻出文字コード選択手段１０４ａ…頻出文字コード１０５…第２ＢＡＳＥ配列（第２配列）１０６…辞書手段１０６ａ，１０８ａ…語頭の頻出文字コードに連なる文字コードを分類して得た各部類１０６ｂ…非頻出の語頭文字コードまたは２文字以降の語頭に連なる文字コードの部類１０６ｃ…頻度文字コードの指標１０７…コード分類手段１０７ａ…コード値１０８…部類格納手段１０９…平行移動量計算手段１０９ａ…平行移動量１１０…平行移動量格納手段１１０ａ…平行移動量１１１…鍵候補地点算出手段１１１ａ…平行移動量の最小値または和１１２…ＣＨＥＣＫ配列手段（第１配列）２０１…文書入力手段２０２…登録値判断手段２０２ａ…頻出文字コードの追番２０２ｂ…平行移動量２０３…鍵・語頭照合手段２０４…単語単位圧縮手段
10 ... Character code registration search device
101 ... List means
101b ... Serial number
102 ... BASE array means (second array)
102a ... Numerical value
103 ... Frequent character code storage means
103a ... Frequent character code
103b ... Serial number
103c ... Selected character code
104 ... Frequent character code selecting means
104a ... Frequent character code
105 ... Second BASE array (second array)
106 ... Dictionary means
106a, 108a ... Classes obtained by classifying the character codes that follow a frequent word-head character code
106b ... Class of character codes that follow a non-frequent word-head character code or a word head from the second character onward
106c ... Index of frequent character code
107 ... Code classification means
107a ... Code value
108 ... Class storage means
109 ... Translation amount calculating means
109a ... Translation amount
110 ... Translation amount storing means
110a ... Translation amount
111 ... Key candidate point calculating means
111a ... Minimum translation amount or sum
112 ... CHECK array means (first array)
201 ... Document input means
202 ... Registered value determination means
202a ... Serial number of frequent character code
202b ... Translation amount
203 ... Key/word-head matching means
204 ... Word-unit compression means

Claims

[Claims]

1. A character code registration search device that registers character strings of character codes to be searched by key in a double-array structure, which is a one-dimensional array serving as a data structure, and searches for the character strings, the device comprising: translation amount calculating means for calculating the translation amount needed to register the characters of each character string in the arrays; a first array whose subscript is the word-head index of each character string to be searched; identification means for identifying the value registered in the first array; a second array in which information on a specific character following the word head of the character strings indicated in the first array is registered; key candidate point calculating means for calculating the sum of the translation amount registered in the first array and the second array and the value corresponding to the character following the word end of the character string; and a second array in which the word-head index of the character string is registered, using the sum obtained by the key candidate point calculating means as its subscript.

2. The character code registration search device according to claim 1, further comprising:
list means for creating a list of character codes frequently used in idioms and outputting a selected character code selected from the list;
frequent character code selecting means for outputting a frequency threshold that determines down to what frequency character codes are selected;
frequent character code storage means for storing the frequent character codes selected from the list means and outputting the selected frequent character codes and their indexes;
dictionary means for, in a character code dictionary in which idioms composed of character codes are registered, dividing the processing according to whether or not the character of interest is the head of an idiom based on the selected character code, and outputting the classes obtained by classifying the character codes that follow a frequent word-head character code;
class storage means for storing each class, received from the dictionary means, obtained by classifying the character codes that follow a frequent word-head character code;
BASE array means for inputting the serial number of the selected character code;
code classification means for classifying, in order to classify the characters that follow a frequent word-head character code, the second character code of an idiom by some bits of that character code;
translation amount calculating means for calculating, for each class, the minimum translation amount such that the values obtained by adding it to the code values of the character codes of the class fall on free slots of the CHECK array;
translation amount storing means for storing the translation amount from the translation amount calculating means at the subscript position corresponding to the word-head index;
key candidate point calculating means for, for each class, using the sum of the translation amount input from the translation amount calculating means and the code value of each character code of the class as a subscript value on the CHECK array, registering at that subscript position the word-head index that is the parent of the character codes of the class, and taking the value of the sum as the index of the next word head consisting of ((previous word head) + (character of interest));
a second BASE array, as the second array, for storing the translation amount for each class from the translation amount storing means on the basis of the code value from the code classification means and the serial number from the list means; and
CHECK array means, as the first array, for registering the minimum translation amount from the translation amount calculating means and the internal setting value for each character code of each class from the translation amount calculating means at the word-head index position of the CHECK array.

3. The character code registration search device according to claim 1, further comprising:
document input means for first designating the root of the trie structure as the word head and setting an end mark # as a terminal symbol at the word head, then instructing input of a character code as the character to be searched, and detecting the word head of the given character code;
BASE array means for inputting a numerical value from the location corresponding to the word head or the character code index;
registered value determination means for determining whether the numerical value input from the BASE array means is the serial number of a frequent word-head character code, the index of a word-head character code that is not frequent, or the index of a word head in the middle of a character string, for outputting, when an index exceeding the range of indexes constituting the trie is given, that index as the serial number of a frequent character code, and for outputting a translation amount when the numerical value input from the BASE array means is a word-head index rather than a frequent character code;
code classification means for classifying, in order to classify the characters that follow a frequent word-head character code, the character code following the leading kanji by some bits of that character code when the numerical value input from the BASE array means is the serial number of a frequent word-head character code;
a second BASE array for storing the translation amount at the location corresponding to the serial number of the frequent character code output from the registered value determination means and to the classification of the code value of the character code;
translation amount storing means for storing the numerical value as a translation amount when the numerical value input from the BASE array means is the index of a non-frequent word-head character code or the index of a word head;
key candidate point calculating means for, for each class, using the sum of the translation amount input from the translation amount calculating means and the code value of each character code of the class as a subscript value on the CHECK array, registering at that subscript position the word-head index that is the parent of the character codes of the class, and taking the value of the sum as the index of the next word head consisting of ((previous word head) + (character of interest));
CHECK array means for inputting a key at the location corresponding to the sum from the key candidate point calculating means; and
key/word-head matching means for determining whether the key input by the CHECK array means is equal to the word-head character code or the word-head index, and for determining, when they are equal, that the idiom is registered in the dictionary.

4. A character code registration search method that registers character strings of character codes to be searched by key in a double-array structure, which is a one-dimensional array serving as a data structure, and searches for the character strings, the method comprising: a translation amount calculating step of calculating the translation amount needed to register the characters of each character string; a first array step whose subscript is the word-head index of each character string to be searched; an identification step of identifying the value registered in the first array step; a second array step of registering information on a specific character following the word head of the character strings indicated in the first array step; a key candidate point calculation step of calculating the sum of the translation amount registered in the first array step and the second array step and the value corresponding to the character following the word end of the character string; and a second array step of registering the word-head index of the character string, using the sum obtained in the key candidate point calculation step as its subscript.

5. The character code registration search method according to claim 4, wherein the identification step comprises: a step of identifying the content registered in the first array step as one of the serial number I of a specific character code located at the head of an idiom, another leading character code, or the translation amount of a word head; and a step of, when the content registered in the first array step is identified as the serial number I of the leading specific character code, obtaining the translation amount by referring to the array location indicated by the serial number I in the second array step.

6. The character code registration search method according to claim 5, further comprising a step of referring to the second array step on the basis of the serial number I of the leading specific character code of the character string and the classification of the character code following the leading specific character code.

7. The character code registration search method according to claim 6, wherein the step of classifying the character code following the leading specific character code in the second array step uses the code value of that character code.

8. The character code registration search method according to claim 4, wherein the second array step includes a step of selecting, as the specific character following the word head of the character string, a character frequently used in forming idioms.

9. The character code registration search method according to claim 4, wherein the second array step includes a step of selecting, as the specific character following the word head of the character string, a character for which, when that character forms idioms, the width of the code values of the following character codes is equal to or greater than a predetermined threshold.

10. The character code registration search method according to claim 4, further comprising:
a list step of creating a list of character codes frequently used in idioms and outputting a selected character code 103c selected from the list;
a frequent character code selecting step of outputting a frequency threshold that determines down to what frequency character codes are selected;
a frequent character code storing step of storing the frequent character codes selected from the list step and outputting the selected frequent character codes and their indexes;
a dictionary step of, in a character code dictionary in which idioms composed of character codes are registered, dividing the processing according to whether or not the character of interest is the head of an idiom based on the selected character code, and outputting the classes obtained by classifying the character codes that follow a frequent word-head character code;
a class storing step of storing each class, obtained in the dictionary step, of the character codes that follow a frequent word-head character code;
a BASE array step, as the second array step, of calculating the serial number of the selected character code and storing it at the index of that character code in the BASE array;
a code classification step of classifying the second character code of an idiom by some bits of that character code;
a translation amount calculating step of calculating, for each class 106a, the minimum translation amount 109a such that every value obtained by adding the translation amount 109a to the code value of each character code of the class 106a falls on a free slot of the CHECK array;
a translation amount storing step of storing the translation amount generated by the translation amount calculating step at the subscript position corresponding to the word-head index of the BASE array means;
a key candidate point calculation step of, for each class, using the sum of the translation amount input from the translation amount calculating means and the code value of each character code of the class as a subscript value on the CHECK array, registering at that subscript position the word-head index that is the parent of the character codes of the class, and taking the value of the sum as the index of the next word head consisting of ((previous word head) + (character of interest));
a second BASE array step, as the second array step, of storing the translation amount for each class output by the translation amount storing step, on the basis of the code value generated by the code classification step and the serial number generated by the list step; and
a CHECK array step, as the first array step, of registering the translation amount generated by the translation amount storing step and the internal setting value for each character code of the class generated by the translation amount calculating step at the word-head index position of the CHECK array.

11. The character code registration search method according to claim 4, further comprising:
a document input step of first designating the root of the trie structure as the word head and setting an end mark # as a terminal symbol at the word head, then instructing input of a character code as the character to be searched, and detecting the word head of the given character code;
a BASE array step, as the second array step, of inputting a numerical value from the location corresponding to the word head or the character code index;
a registered value determination step of determining whether the numerical value generated by the BASE array step is the serial number of a frequent word-head character code, the index of a word-head character code that is not frequent, or the index of a word head in the middle of a character string, of outputting, when an index exceeding the range of indexes constituting the trie is given, that index as the serial number of a frequent character code, and of outputting a translation amount when the numerical value generated by the BASE array step is a word-head index rather than a frequent character code;
a code classification step of classifying, in order to classify the characters that follow a frequent word-head character code, the character code following the leading kanji by some bits of that character code when the numerical value input from the BASE array step is the serial number of a frequent word-head character code;
a second BASE array step, as the second array step, of storing the translation amount at the location corresponding to the serial number of the frequent character code generated by the registered value determination step and to the classification of the code value of the character code;
a translation amount storing step of storing the numerical value as a translation amount when the numerical value generated by the BASE array step is the index of a non-frequent word-head character code or the index of a word head;
a key candidate point calculation step of, for each class, using the sum of the translation amount input from the translation amount calculating means and the code value of each character code of the class as a subscript value on the CHECK array, registering at that subscript position the word-head index that is the parent of the character codes of the class, and taking the value of the sum as the index of the next word head consisting of ((previous word head) + (character of interest));
a CHECK array step, as the first array step, of inputting a key at the location corresponding to the sum from the key candidate point calculation step; and
a key/word-head matching step of determining whether the key generated by the CHECK array step is equal to the word-head character code or the word-head index, and of determining, when they are equal, that the idiom is registered in the dictionary.

12. The character code registration search method according to claim 10 or 11, further comprising a step of selecting, as the specific character following the word head of a character string, a character for which, when that character forms idioms, the width of the code values of the following character codes is equal to or greater than a predetermined threshold.