JPH0335697B2

Info

Publication number: JPH0335697B2
Application number: JP58115613A
Authority: JP
Inventors: Katsumi Hayashi
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-06-27
Filing date: 1983-06-27
Publication date: 1991-05-29
Also published as: JPS607557A

Description

Detailed description of the invention

[Technical field of the invention]

The present invention relates to a segmented compression method for character-type data, and in particular to a segmented compression method for character-type data used when creating a database index.

[Conventional technology and problems]

Segmented compression performs data compression efficiently when an index is built on a variable-length character text key, or on a composite key made up of such keys, and it allows the magnitude and equality comparison against a target key value to be completed early.

A segmented-compressed key used in the index section of a database management system can be handled formally in the same way as a single character key, so it is stored in the index section after forward compression has been applied.

In the conventional segmented compression method for character-type data (see Reference 1), the blank is normally positioned as the minimum value regardless of the code system, so the data must not contain control codes or other values below it. It was also impossible to support comparison in variable-length data item processing, as found in the data types of PL/I and C, in which the blanks and control characters trailing a character string are treated as significant and the data length can be restored.

In addition, much data processing now involves Null, which represents a value that is not applicable, such as the maiden name of an unmarried person, or one that is unknown, such as a missing observation in statistical data. A value lying outside the domain of a data item is not permitted, so Null cannot be handled within the scope of the segmented compression method. To support it, a separate item indicating whether the value is Null was set up and prepended to the composite key as a data item.

There is also a technique in which a special value of a data item is reserved by convention to indicate Null. For example, when a person's name is known but the date of birth is not, DDD=000 can be designated to mean "unknown". However, all that needs to be made clear is that the information is absent, so there should be no need to go to the trouble of holding all six digits.

The conventional method is now described in more detail. The present invention is a further improvement of an improved scheme (see Reference 2) that is itself based on M. W. Blasgen et al. (see Reference 1).

For example, consider the problem of ordering a composite key made up of the three data items (surname, given name, date of birth) so that characters sort in Japanese syllabary (gojuon) order and digits sort in ascending order.

When kana names and the like are allowed, an area of about 20 characters each must normally be reserved for the surname and the given name. About six digits (YYYDDD) should suffice for the date of birth. Storing, say, 100,000 such key values requires 20 + 20 + 6 = 46 characters per entry, or 4.6 million characters in total. However, an average Japanese name needs only about 1.7 + 2 + 6 = 9.7 characters, so it is easy to see that about 970,000 characters would in fact be enough to carry the information.

One could therefore take just the length that the surname and the given name each need and join them into a single character string. However, consider arranging the key values (林, 葉三朗, 990115) and (林葉, 二朗, 990115) in dictionary order. Because "林" is shorter than "林葉", the former should come before the latter. Yet under a character code system in which "二" is smaller than "三", concatenating the surname and given name reverses the order.
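The reversal described above can be reproduced with a small sketch. The ASCII strings below are hypothetical stand-ins for the surname and given-name pairs in the text (chosen so that the result does not depend on any particular kanji code system); they are not taken from the patent.

```python
# Hypothetical ASCII stand-ins: the first key has the shorter surname,
# so it should sort before the second key.
key1 = ("AB", "Z")
key2 = ("ABC", "D")

# Field-by-field comparison gives the intended order.
fieldwise = key1 < key2

# Naive concatenation compares "ABZ" with "ABCD"; 'Z' > 'C' decides it,
# so the intended order is lost.
concat1 = "".join(key1)
concat2 = "".join(key2)
concatenated = concat1 < concat2

print(fieldwise, concatenated)  # True False
```

The shorter surname wins in the field-wise comparison, but after concatenation the first character of the given name is compared against a character of the longer surname.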

A segmented storage scheme for character strings was proposed as a composite-key compression technique that has no such side effect, in which compressed key values are compared by a simple comparison of the whole string, and from which the original key value can be restored.

That is, the aim is to divide each original data item into segments of a fixed length chosen per type of data item and to insert control characters between them, so that data is compared with data and control parts with control parts, and so that the control part alone determines how the compressed data is to be handled.

In the method of Reference 1, assume a composite key $(\Sigma_1 \Sigma_2 \cdots \Sigma_n)$ consisting of n data items. When $\Sigma_i$ is the character string $\sigma_1 \sigma_2 \cdots \sigma_{M_i}$ (each $\sigma$ being drawn from a character code whose minimum value is the blank), it is divided as shown below into $k_i$ segments of a suitable length $l_i$ (written $L$ below) satisfying $k_i \cdot l_i > M_i$. $\pi$ is a suitable padding character.

$$\sigma_1 \sigma_2 \cdots \sigma_L \mid \sigma_{L+1} \sigma_{L+2} \cdots \sigma_{2L} \mid \cdots \mid \sigma_{(k-1)L+1} \cdots \sigma_{M_i} \pi \cdots \pi$$

Each break "|" is replaced with a control code according to the following rules, depending on the length M that Σ actually requires. The control character has the same length as one σ.

If a character other than a blank exists anywhere in the segments to the right of the break, High-Value is inserted.

* High-Value is defined as a value for which ∀x (σ_x < High-Value) holds and which also exceeds any character count.

If no segment to the right contains a character other than a blank, and the number of remaining characters is no more than the segment length, the break holds the number of characters excluding the trailing blanks.

A segment is omitted if neither it nor any segment to its right contains a character other than a blank.

Each Σ is replaced with this encoding, and Σ₁Σ₂…Σ_n is then treated as a single string.
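The rules above can be sketched as follows. This is only an illustrative reading of the Reference 1 scheme as summarized here, not the published algorithm: the names `encode_item` and `encode_key`, the use of ASCII space as the blank, and the choice of 0xFF as High-Value are all assumptions of the sketch.

```python
BLANK = 0x20        # assumed minimum character code (ASCII space)
HIGH_VALUE = 0xFF   # assumed to exceed every character and every count


def encode_item(item: bytes, seg_len: int) -> bytes:
    """Encode one character item as segments separated by control bytes."""
    assert 0 < seg_len < HIGH_VALUE
    assert all(BLANK <= c < HIGH_VALUE for c in item)
    text = item.rstrip(bytes([BLANK]))          # all-blank tail segments are omitted
    segments = [text[i:i + seg_len] for i in range(0, len(text), seg_len)] or [b""]
    out = bytearray()
    for seg in segments[:-1]:
        out += seg                              # full segment, more text follows
        out.append(HIGH_VALUE)
    last = segments[-1]
    out += last.ljust(seg_len, bytes([BLANK]))  # pad the final segment
    out.append(len(last))                       # count of significant characters
    return bytes(out)


def encode_key(items, seg_lens) -> bytes:
    """Concatenate the encoded items into one comparable byte string."""
    return b"".join(encode_item(i, n) for i, n in zip(items, seg_lens))
```

With a segment length of 2, the hypothetical keys `(b"AB", b"Z")` and `(b"ABC", b"D")` encode so that the first compares lower than the second as plain byte strings, because the count byte after "AB" is smaller than High-Value.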

If the Σ values match, their segments match completely. If they do not match, the comparison finishes in the segment where the ordering is decided. When the strings differ in length after the matching characters, the longer one is the larger by the rules above. If the keys match up to Σ_i, the comparison is repeated for the following segments under the same rules. For a key specified as descending, the complement of the code is taken.

The method of Reference 1 restricted data items to character strings whose minimum code is the blank. The method of Reference 2 extends it, based on the finding that a composite key containing non-character data items can also be handled: whatever its content, such a data item is simply treated as a single character-string data item consisting of a final segment only, and concatenated. Reference 2 also showed that forward compression can be applied when the resulting keys are further arranged in dictionary order.

[References]

Reference 1: Blasgen, M. W., Casey, R. G., & Eswaran, K. P., "An Encoding Method for Multifield Sorting and Indexing", Communications of the ACM, Nov. 1977, Vol. 20, No. 11, pp. 874-878

Reference 2: Japanese Unexamined Patent Publication No. Sho 58-2938

[Object of the invention]

The present invention is based on the above considerations. Its object is to provide a segmented compression method that does not require the assumption that the blank is the minimum code value in the data to be compressed, and that can handle Null data items.

[Structure of the invention]

To that end, the segmented compression method for character-type data of the present invention divides a composite key, whose keys are a plurality of character-type data items, into segments of the length specified for each data item, inserts a separator at each division point, and gives each separator a value determined by its surrounding context. When a data item is divided and its length is not an integer multiple of the specified segment length, it is filled out with padding characters. When a data item is Null, it is represented by a predetermined code of the specified segment length, followed by a separator. The method determines which of the following holds: (a) the separator lies in the middle of a data item and the next non-blank character is smaller than the blank; (b) the separator lies in the middle of a data item and the next non-blank character is larger than the blank; (c) the separator lies at the end of a data item and the next data item is not Null; (d) the separator lies in the final segment of a data item and the next data item is Null; or (e) the corresponding data item is itself Null. It then gives the separator a distinct value according to the result.

[Embodiments of the invention]

The present invention is described below with reference to the drawings.

FIG. 1 illustrates key compression, and FIG. 2 shows an example of a database management system to which the present invention is applied.

FIG. 1A shows complete keys. A complete key consists of key A, key B, and key P. Keys A and B are data items of the record, and key P is an address. The key length of key A is 50 characters. Null indicates that the value of the data item is undetermined.

FIG. 1B illustrates segmented compression. Key A is compressed with a segment length of 4; key B has a segment length of 3 and is not compressed. φ denotes a code smaller than the blank. In EBCDIC the blank has the hexadecimal value 40. In FIG. 1B, a square frame □ indicates a segmentation separator. The value of a segmentation separator is determined as follows.

When characters continue in the segments to the right of the separator:

-1 If the characters that follow, other than the blank, are all at or below the blank, the value is 0F₁₆. This is used for special handling where the data is significant for its full length, as with PL/I VARCHAR or \n-delimited data in C. In other cases, the existing practice of omitting characters at or below the blank from the segments is followed.

-2 If a character other than the blank exists as the first character, the value is CF₁₆. This is intended to speed up the magnitude decision when string comparison is the only purpose.

When there are no characters in any segment to the right of the separator, or for the separator of a non-character item:

-1 If the next item is not Null, the value is Bl₁₆.

-2 If the next item is Null, the value is Cl₁₆.

Here l is the effective length of the final segment.

When this key is Null, the value is FF₁₆. This is used when the leading item is Null or when no segment control immediately precedes it. Note that l denotes the number of valid characters (this example assumes a segment length of 16 or less).
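The separator rules listed above can be summarized in a short sketch. The function name `separator`, its parameters, and the restriction of the length to the low four bits are assumptions made for illustration; only the values 0F, CF, Bl, Cl, and FF and the EBCDIC blank 40 come from the description, and treating an all-blank remainder as "no characters to the right" is this sketch's reading.

```python
BLANK = 0x40           # EBCDIC blank, as stated in the description
NULL_SEPARATOR = 0xFF  # separator used when the item itself is Null


def separator(rest: bytes, final_len: int, next_is_null: bool) -> int:
    """Pick the separator value for a non-Null item.

    rest         -- bytes of the same item lying to the right of the separator
    final_len    -- valid characters in the final segment (used at the item end)
    next_is_null -- whether the following data item is Null
    """
    first = next((b for b in rest if b != BLANK), None)
    if first is not None:
        # The item continues: 0F if it continues below the blank, CF if above.
        return 0x0F if first < BLANK else 0xCF
    # Assumption of this sketch: the length must fit in the low four bits.
    assert 0 <= final_len <= 0x0F
    return (0xC0 if next_is_null else 0xB0) | final_len
```

The resulting values order as 0F < Bl < Cl < CF < FF, so an item that continues below the blank sorts before one that ends, which in turn sorts before one that continues above the blank, and a Null item sorts last.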

(1) in FIG. 1B is the result of applying segmented compression to key (1) in FIG. 1A. When key A is compressed with a segment length of 4, the first separator falls between "LION" and the characters that follow it, which include φ. The first non-blank character after the separator is φ, and φ has a smaller value than the blank, so the first separator takes the value 0F. The next segment contains only two characters, so it is filled out with padding. This is the final segment of key A and its valid characters are the first two, so the second separator takes the value B2. Key B has three digits and is not segmented-compressed, so the third separator takes the value B3. Keys (2), (3), and (4) in FIG. 1A are segmented-compressed in the same way.

The ordering of the complete keys in FIG. 1 is determined as follows. The order is decided first by key A; if key A does not decide it, by key B; and if key B does not decide it, by key P. Keys of different lengths are padded at the end to the same length before being compared. In FIG. 1B, the first segment of segmented-compressed key (4) has the value FFFFFFFF, so it is the largest of (1) to (4). The first segment of segmented-compressed keys (1), (2), and (3) is "LION" in each case, so the ordering is decided by the value of the first separator. The first separator is 0F for key (1), C4 for key (2), and CF for key (3), so in descending order of magnitude these keys are (3), (2), (1).
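As a quick check of this ordering, the leading bytes of the four keys can be compared directly. The sketch assumes Python's `cp037` codec as the EBCDIC code page (the text says only "EBCDIC") and uses just the first segment and first separator of each key, which is all the comparison above relies on.

```python
# 'cp037' is one EBCDIC code page shipped with Python; choosing it is an
# assumption, since the description does not name a specific code page.
lion = "LION".encode("cp037")

prefixes = {
    1: lion + bytes([0x0F]),   # key (1): first separator 0F
    2: lion + bytes([0xC4]),   # key (2): first separator C4
    3: lion + bytes([0xCF]),   # key (3): first separator CF
    4: bytes([0xFF] * 4),      # key (4): first segment FFFFFFFF (key A is Null)
}

# Plain byte-string comparison is enough to order the keys.
ascending = sorted(prefixes, key=prefixes.get)
print(ascending)  # [1, 2, 3, 4]
```

Read from largest to smallest this gives (4), (3), (2), (1), matching the order stated in the text.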

FIG. 1C illustrates forward compression. In FIG. 1C, the first digit indicates the number of remaining characters, and the second digit indicates the position of the first character that is not omitted. The forward-compression result for segmented-compressed key (1) of FIG. 1B depends on a comparison with the segmented-compressed key that precedes it, so it is not shown in the figure.
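Forward compression of this kind can be sketched as follows. The tuple layout mirrors the two digits described above (remaining count, then position of the first kept character); the function names, the 1-based position, and the in-memory representation are assumptions of the sketch rather than the format used in FIG. 1C.

```python
def forward_compress(sorted_keys):
    """Drop from each key the leading bytes it shares with the previous key.

    Each entry is (remaining, first_kept, suffix): the number of bytes kept
    and the position of the first kept byte (1-based, by assumption).
    """
    out, prev = [], b""
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        out.append((len(key) - shared, shared + 1, key[shared:]))
        prev = key
    return out


def forward_expand(entries):
    """Rebuild the original keys from the compressed entries."""
    keys, prev = [], b""
    for remaining, first_kept, suffix in entries:
        assert len(suffix) == remaining
        key = prev[:first_kept - 1] + suffix
        keys.append(key)
        prev = key
    return keys
```

Because each entry is defined relative to its predecessor, the first key in a run is always stored in full, which is consistent with the remark that the result for key (1) cannot be shown without knowing the key before it.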

FIG. 2 shows the configuration of a database management system to which the present invention is applied.

In FIG. 2, 1 is a controller, 2 a database, 3 an index section, 4 and 5 encoders, 6 and 7 decoders, 8 an accessor, 9 a control line, 10 a bus, 11 a data line, and 12 an input/output interface. Encoder 4 performs the segmented compression described with FIG. 1B, and encoder 5 performs the forward compression described with FIG. 1C. Decoder 6 performs the inverse of encoder 4, and decoder 7 performs the inverse of encoder 5.

Data sent to controller 1 through input/output interface 12 is segmented-compressed by encoder 4 under the control of controller 1, and index section 3 is then searched while matching for forward compression is performed through encoder 5. Index section 3 holds pairs consisting of a key, segmented-compressed and then forward-compressed, and the address of the record in database 2 that has that key. The record address found in this way is sent to accessor 8 over bus 10, and the target record from database 2 is sent to input/output interface 12 via data line 11, bus 10, and controller 1. When a new record is stored in database 2, accessor 8 determines an address and the record is stored in database 2 via input/output interface 12, controller 1, bus 10, and data line 11; the address and key are segmented-compressed and forward-compressed via bus 10, encoder 4, and encoder 5, and stored in index section 3. When records belonging to a given key range are retrieved sequentially, the path through decoder 7, decoder 6, bus 10, and controller 1 is used.

[Effect of the invention]

As is clear from the above description, the present invention makes it possible to handle Null data items and removes the need to assume that the blank is the minimum code value in the data to be compressed.

The prior art had only string compression in mind, so it could not handle data such as PL/I VARCHAR or C data that is followed by a delimiter such as \n. These are cases in which the blanks must be restored within the field but may be ignored for string comparison. Because the comparison between a key that has characters other than blanks and one that does not can be terminated at this point, faster processing and data restoration can both be achieved. Since this is an embodiment, the details depend on the code system, but blanks can be ignored in string comparison.

[Brief explanation of drawings]

FIG. 1 illustrates key compression, and FIG. 2 shows an example of a database management system to which the present invention is applied.

1: controller, 2: database, 3: index section, 4: encoder, 5: encoder, 6: decoder, 7: decoder, 8: accessor, 9: control line, 10: bus, 11: data line, 12: input/output interface.

Claims

[Scope of claims] 1. A segmented compression method for character-type data in which a composite key having a plurality of character-type data items as keys is divided into segments of a length specified for each data item, a separator is inserted at each division point, and each separator is given a value determined by its surrounding context, characterized in that: when a data item is divided and its length is not an integer multiple of the specified segment length, it is filled out with padding characters; when a data item is Null, it is represented by a predetermined code of the specified segment length, followed by a separator; and it is determined whether (a) the separator lies in the middle of a data item and the next non-blank character is smaller than the blank, (b) the separator lies in the middle of a data item and the next non-blank character is larger than the blank, (c) the separator lies at the end of a data item and the next data item is not Null, (d) the separator lies in the final segment of a data item and the next data item is Null, or (e) the corresponding data item is Null, and a distinct value corresponding to the result of the determination is given to the separator.