JPS63136222A

JPS63136222A - Data alignment processing system fitted with variable length label

Info

Publication number: JPS63136222A
Application number: JP28334786A
Authority: JP
Inventors: Mitsuo Kiyono; 清野　三男
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-11-28
Filing date: 1986-11-28
Publication date: 1988-06-08

Abstract

PURPOSE: To minimize the memory area and the amount of data compared and transferred, and to shorten processing time, by compressing labels longer than a prescribed length and sorting the corresponding data based on the original labels registered in a table. CONSTITUTION: A data input unit 1 reads unsorted data carrying variable-length labels from a file 7 and supplies it to a label length fixing unit 2. When a variable-length label is longer than the prescribed length, the unit compresses it to the fixed length and registers its correspondence with the original label in a label table 3. When a label is shorter than the prescribed length, blank codes are appended to bring it up to the fixed length. When this length fixing is complete, a fixed-length-labeled data sorting unit 4 is started and repeatedly orders pairs of data items according to the relative order of their compressed or expanded fixed-length labels. A label restoring unit 5 is then started and restores the original variable-length labels according to whether each fixed-length label is compressed or uncompressed. Finally, the sorted data, carrying the variable-length labels as they were before length fixing, is written to the corresponding file 8 via a data output unit 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a method for sorting data carrying variable-length labels, used in language processing, utilities, and similar software in information processing systems.

Prior Art

In information processing systems, it is often necessary to sort data, that is, to rearrange data items into a predetermined order using the variable-length labels attached to them as keys.

Conventional sorting methods of this kind take one of three approaches: they pad every variable-length label to the number of digits of the longest label and compare the resulting fixed-length labels; they compare the labels at variable length while tracking each label's length; or they register only the label portions in a table and consult the table on every comparison.

Problems to Be Solved by the Invention

Of the conventional sorting methods described above, the one that pads labels to the maximum length must reserve an area of the maximum length for every label. It therefore requires a large storage area and also increases the amount of data to be transferred and compared.

The method that compares at variable length while tracking label lengths makes it difficult to raise the comparison speed, because the items being compared vary in length. The amount of data transferred also varies, which complicates the operations.

The method that registers labels in a table and consults the table on every comparison is slowed by the table accesses, and it needs a large amount of memory because every label is registered in the table.

Means for Solving the Problems

The sorting method for data with variable-length labels according to the present invention comprises: a label length fixing unit that, for labels at or above a predetermined length, compresses the label to a fixed length while registering its correspondence with the original label in a label table, and, for labels at or below the predetermined length, expands the label to the predetermined length by appending a predetermined code; a fixed-length-labeled data sorting unit that compares the fixed-length labels attached to the data items and sorts the corresponding data based on the original labels registered in the label table when the fixed-length labels match and one or both are compressed, and based on the fixed-length labels otherwise; and label restoring means that restores the fixed-length label attached to each data item, once sorted into the predetermined order by the sorting unit, to its variable-length original label by consulting the label table. With this configuration, label comparison, the stage that takes the most time and the most hardware, is performed on the compressed fixed-length labels.

Embodiment

FIG. 1 is a conceptual diagram of one embodiment of the present invention. In the figure, 1 is a data input unit, 2 is a label length fixing unit, 3 is a label table, 4 is a fixed-length-labeled data sorting unit, 5 is a label restoring unit, 6 is a data output unit, 7 is a file storing unsorted data, and 8 is a file storing sorted data.

The data input unit 1 reads unsorted data carrying variable-length labels from the unsorted-data file 7 and passes it to the label length fixing unit 2. The label length fixing unit 2 checks the length of the variable-length label attached to each unsorted data item; if the label is at or above the predetermined length, the unit compresses it to the fixed length while registering its correspondence with the original label in the label table. If the label is at or below the predetermined length, the unit fixes its length by appending a predetermined code, such as blanks, to expand it to the predetermined length.

FIG. 2 is a conceptual diagram showing an example of the fixed-length labels produced by the label length fixing unit 2 and of the contents registered in the label table 3.

Each label is compressed to a fixed length of N bits. The first n1 of these N bits form the comparison-valid portion; the remaining n2 bits form the comparison-invalid portion, which records whether the label is compressed and how it corresponds to the original data.
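As an illustration of this layout, the following Python sketch packs and unpacks an N-bit label held in an integer. The widths N1_BITS and N2_BITS, the function names, and the placement of the flag as the top bit of the comparison-invalid portion are assumptions made for the example; the specification states only that the n1 bits come first and that the n2 bits hold a flag and an (n2 - 1)-bit pointer.

```python
# Hypothetical widths; the specification does not give concrete values.
N1_BITS = 32  # n1: comparison-valid portion
N2_BITS = 16  # n2: comparison-invalid portion (1 flag bit + pointer)


def pack(valid: int, compressed: bool, pointer: int) -> int:
    """Build an N-bit label: [ valid (n1) | flag (1) | pointer (n2 - 1) ]."""
    assert 0 <= valid < (1 << N1_BITS)
    assert 0 <= pointer < (1 << (N2_BITS - 1))
    invalid = (int(compressed) << (N2_BITS - 1)) | pointer
    return (valid << N2_BITS) | invalid


def unpack(word: int) -> tuple[int, bool, int]:
    """Split an N-bit label back into (valid, compressed, pointer)."""
    valid = word >> N2_BITS
    invalid = word & ((1 << N2_BITS) - 1)
    compressed = bool(invalid >> (N2_BITS - 1))
    pointer = invalid & ((1 << (N2_BITS - 1)) - 1)
    return valid, compressed, pointer
```

Because the comparison-valid portion occupies the high-order bits, shifting the word right by N2_BITS yields exactly the part that the sorting step compares first.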

An original label "A" such as the one in the top row of FIG. 2 is longer than the n1-bit comparison-valid portion, so it is split into an n1-bit part "A1" and an excess part "A2". It is compressed into a fixed-length label of N bits consisting of the n1-bit comparison-valid portion "A1" followed by an n2-bit comparison-invalid portion, which holds a compression flag F1 indicating that the label is compressed and an (n2 - 1)-bit pointer P1 to the location in the label table 3 that stores the excess part "A2". The excess part "A2" of label "A" is registered in the area of the label table 3 designated by the pointer P1.

An original label "B" such as the one in the second row is exactly as long as the n1-bit comparison-valid portion. It is converted into an N-bit fixed-length label consisting of a comparison-valid portion holding the n1-bit original label "B" itself, an uncompressed flag F2 indicating that the label is not compressed, and an invalid pointer P0 consisting of (n2 - 1) consecutive "0" bits.

An original label "C" such as the one in the third row of FIG. 2 is shorter than the n1-bit comparison-valid portion. It is converted into an N-bit fixed-length label consisting of a comparison-valid portion in which blanks are appended to the original label "C" to expand it to n1 bits, the uncompressed flag F2 indicating that the label is not compressed, and the invalid pointer P0 consisting of (n2 - 1) consecutive "0" bits.
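The three cases above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it works in bytes rather than bits, and the names N1, FixedLabel, and fix_label, the dictionary used as the label table, and the use of 1-based pointers (so that 0 can serve as the invalid pointer P0) are all assumptions made for the example.

```python
from dataclasses import dataclass

N1 = 4      # hypothetical width of the comparison-valid portion, in bytes
PAD = b" "  # blank used to expand short labels


@dataclass(frozen=True)
class FixedLabel:
    valid: bytes      # comparison-valid portion, always N1 bytes
    compressed: bool  # True plays the role of F1, False of F2
    pointer: int      # key into the label table; 0 stands for P0


def fix_label(label: bytes, table: dict[int, bytes]) -> FixedLabel:
    """Convert a variable-length label to fixed length.

    Longer labels are split and the excess part is registered in
    `table`; labels of exactly N1 bytes are kept as they are; shorter
    labels are padded with blanks.
    """
    if len(label) > N1:
        pointer = len(table) + 1        # next free slot, never 0
        table[pointer] = label[N1:]     # register the excess part
        return FixedLabel(label[:N1], True, pointer)
    return FixedLabel(label.ljust(N1, PAD), False, 0)


table: dict[int, bytes] = {}
a = fix_label(b"ALPHA1", table)  # longer than N1: compressed
b = fix_label(b"BETA", table)    # exactly N1: stored unchanged
c = fix_label(b"C", table)       # shorter than N1: blank-padded
```

Only the label that exceeds N1 creates a table entry, which is where the memory saving over a table of all labels comes from.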

When the label length fixing unit 2 has finished the length fixing described above for all of the unsorted data received from the data input unit 1, it starts the fixed-length-labeled data sorting unit 4.

The fixed-length-labeled data sorting unit 4 takes two data items whose labels have been compressed or expanded to the fixed length as described above, examines the relative order of their labels, and rearranges the two items accordingly; it repeats this sorting step. In doing so, the unit compares the comparison-valid portions of the two fixed-length labels and, if they match, examines the compression flag F1. If this check shows that one or both labels are compressed, the unit compares the original labels, including the corresponding excess parts registered in the label table 3, and orders the two data items on that basis. Otherwise, it orders the two data items using only the comparison-valid portions of the fixed-length labels.
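The comparison rule described above can be sketched as a Python comparator. This is an illustration under stated assumptions: labels are represented as (valid, compressed, pointer) tuples of bytes, bool, and int, the label table is a dictionary from pointer to excess part, blanks are the padding code, and the names original and compare are hypothetical. The table is consulted only when the comparison-valid portions are equal and at least one label is compressed.

```python
from functools import cmp_to_key

# A fixed-length label is modeled as (valid, compressed, pointer).


def original(label, table):
    """Rebuild the original label from a fixed-length label."""
    valid, compressed, pointer = label
    if compressed:
        return valid + table[pointer]
    return valid.rstrip(b" ")  # assumes originals have no trailing blanks


def compare(x, y, table):
    """Return -1, 0, or 1 for the order of two fixed-length labels."""
    if x[0] != y[0]:                 # fast path: valid portions differ
        return -1 if x[0] < y[0] else 1
    if x[1] or y[1]:                 # tie, and one or both are compressed
        ox, oy = original(x, table), original(y, table)
        return (ox > oy) - (ox < oy)
    return 0                         # tie between two uncompressed labels


table = {1: b"A1", 2: b"A0"}
records = [
    ((b"ALPH", True, 1), "r1"),   # original label ALPHA1
    ((b"C   ", False, 0), "r2"),  # original label C
    ((b"ALPH", True, 2), "r3"),   # original label ALPHA0
    ((b"ALPH", False, 0), "r4"),  # original label ALPH
]
records.sort(key=cmp_to_key(lambda p, q: compare(p[0], q[0], table)))
```

After the sort, the records are ordered by their original labels (ALPH, ALPHA0, ALPHA1, C), and the table was needed only for the three labels that share the comparison-valid portion ALPH.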

When the fixed-length-labeled data sorting unit 4 has finished sorting all of the fixed-length-labeled data in this way, it starts the label restoring unit 5.

The label restoring unit 5 checks, using the compression flag F1 and the uncompressed flag F2, whether the fixed-length label attached to each data item in the sorted sequence is compressed. If it is, the unit reads the excess part from the label table using the pointer in the comparison-invalid portion and joins it to the comparison-valid portion, restoring the original variable-length label. If blanks have been appended to the comparison-valid portion of an uncompressed label, the unit truncates them to restore the original variable-length label.
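The restoration step can be sketched in Python as below. The function name restore, the byte-oriented representation, and the dictionary used as the label table are assumptions made for the example; the sketch also assumes that original labels never end in a blank, since otherwise truncating the padding would not be reversible.

```python
def restore(valid: bytes, compressed: bool, pointer: int,
            table: dict[int, bytes]) -> bytes:
    """Recover the original variable-length label.

    Compressed labels are rejoined with the excess part found through
    the pointer; uncompressed labels have their blank padding removed.
    """
    if compressed:
        return valid + table[pointer]
    return valid.rstrip(b" ")


table = {1: b"A1"}
restored = [
    restore(b"ALPH", True, 1, table),   # compressed label
    restore(b"BETA", False, 0, table),  # exactly the fixed length
    restore(b"C   ", False, 0, table),  # blank-padded label
]
```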

When the variable-length labels have been restored for all of the sorted data, the data output unit 6 is started and writes the sorted data, carrying the variable-length labels as they were before being fixed in length by compression or expansion, to the corresponding file 8.

Effects of the Invention

As described in detail above, the sorting method for data with variable-length labels according to the present invention compresses labels at or above a predetermined length to a fixed length while registering their correspondence with the original labels in a label table; compares the fixed-length labels attached to the data items; sorts the corresponding data based on the original labels registered in the label table when the fixed-length labels match and one or both are compressed; and, after sorting, restores the fixed-length label of each data item to its variable-length original label by consulting the label table. Compared with the method that fixes every label at the maximum length, it therefore minimizes the storage area and the amount of data compared and transferred, which shortens processing time and reduces the amount of hardware.

Compared with the method that compares at variable length while tracking label lengths, the operations are simpler and processing time is shorter.

Compared with the method that registers labels in a table and consults the table on every comparison, processing time and memory capacity are saved in proportion to the share of uncompressed labels, which need no table reference.

Brief Description of the Drawings

FIG. 1 is a conceptual diagram of a sorting method for data with variable-length labels according to one embodiment of the present invention. FIG. 2 is a conceptual diagram showing an example of the length fixing applied to labels.

1: data input unit; 2: label length fixing unit; 3: label table; 4: fixed-length-labeled data sorting unit; 5: label restoring unit; 6: data output unit; 7: file storing unsorted data with variable-length labels; 8: file storing sorted data with variable-length labels.

Claims

A data sorting method for arranging data items in a predetermined order using the variable-length labels attached to the data as keys, comprising: a label length fixing unit that, for labels attached to the data that are at or above a predetermined length, compresses the label to a fixed length while registering its correspondence with the original label in a label table, and, for labels at or below the predetermined length, expands the label to the predetermined length by appending a predetermined code; a fixed-length-labeled data sorting unit that compares the fixed-length labels attached to the data items and sorts the corresponding data based on the original labels registered in the label table when the fixed-length labels match and one or both are compressed, and based on the fixed-length labels otherwise; and label restoring means that restores the fixed-length label attached to each data item, once sorted into the predetermined order by the fixed-length-labeled data sorting unit, to its variable-length original label by consulting the label table.