JP3344755B2

JP3344755B2 - Ascending integer sequence data compression and decoding system

Info

Publication number: JP3344755B2
Application number: JP07093793A
Authority: JP
Inventors: 克信柴田
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 1993-03-05
Filing date: 1993-03-05
Publication date: 2002-11-18
Anticipated expiration: 2017-11-18
Also published as: JPH06259222A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a system for compressing and decoding integer sequence data arranged in monotonically increasing (ascending) order. In particular, it relates to a compression and decoding system for the case where the data retrieved by a database search system, which extracts required information from a database, is integer sequence data arranged in monotonically increasing order.

[0002]

[Prior Art] Typical conventional methods for compressing and decoding data include the Huffman method, the Shannon-Fano method, the Gilbert-Moore method, and run-length encoding. An example that uses the Huffman method is Japanese Patent Application Laid-Open No. 2-78323.

[0003]

[Problems to Be Solved by the Invention] These methods mainly measure how often each character occurs in the data and preferentially shorten the representation of the most frequent characters. They have the advantage of being applicable to data of any form, but because compression and decoding each require several processing stages, they are poorly suited to applications where speed is critical.

[0004] In view of the above problems, an object of the present invention is to provide a compression and decoding system that can compress integer sequence data arranged in monotonically increasing (ascending) order at high speed and can reduce the capacity of the storage means that holds the compressed data.

[0005]

[Means for Solving the Problems] The compression and decoding system of the present invention, for compressing and decoding integer sequence data arranged in ascending order, comprises: subtraction means that subtracts the (n-1)-th data item stored in first storage means from the n-th data item of the ascending integer sequence and sends the n-th data item to the first storage means; first division means that divides with the difference obtained by the subtraction means as the dividend and outputs a quotient and a remainder; quotient comparison means that compares the quotient obtained by the first division means with 0; second division means that divides with a quotient found to be nonzero by the quotient comparison means as the dividend, and outputs a quotient and, together with a carry mark, a remainder; second storage means that stores the carry mark and the remainder output from the second division means and the remainder output from the first division means; and decoding means that decodes the original integer sequence data from the carry mark and the two remainders stored in the second storage means.

[0006]

[Operation] According to the present invention, during compression the previously stored integer is subtracted from each item of the ascending data, the resulting difference is divided, and the remainder is output. The quotient is compared with 0, and a nonzero quotient is divided in turn, with the remainder output together with a carry mark; this division is repeated until the quotient becomes 0. The carry marks and remainders from the latter divisions and the remainder from the former division are stored. Because the method only divides difference values and stores the results, it requires far less computation than conventional general-purpose compression coding methods and can compress and decode at high speed. Moreover, because it needs no parameters that span the entire data set, such as statistics, data can easily be added or deleted.

【０００７】[0007]

[Embodiments] FIG. 1 shows an embodiment of the system according to the present invention. As shown in the figure, the ascending integer sequence data D1 is arranged in monotonically increasing (ascending) order: 320, 333, 401, and so on. Each value is represented by, for example, 32 bits. In the compression device, the integer sequence data D1 is sent to the subtraction unit 22. The subtraction unit 22 subtracts the integer value of the data item sent immediately before from the current item, and sends the current integer value to the storage unit 11. That is, the storage unit 11 holds the data item sent immediately before the current one; the subtraction unit 22 reads it out, subtracts it from the current data item, and sends the result to the division unit 12.

[0008] In addition to performing this subtraction, the subtraction unit 22 sends the current data item to the storage unit 11, which stores the latest data item sent from the subtraction unit 22. The initial value of the storage unit 11 is 0.
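The subtraction stage described in paragraphs [0007] and [0008] can be sketched as follows. This is a minimal illustration, not the patented circuit; the function name `differences` is an assumption introduced here.

```python
def differences(values, initial=0):
    """Yield each value minus the previously seen value."""
    previous = initial              # plays the role of storage unit 11 (initial value 0)
    for value in values:
        yield value - previous      # what subtraction unit 22 sends to division unit 12
        previous = value            # storage unit 11 now holds the latest value


print(list(differences([320, 333, 401])))  # [320, 13, 68]
```

Because the initial stored value is 0, the first difference is the first value itself.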

[0009] The division unit 12 divides the difference sent from the subtraction unit 22 by a predetermined value; in this embodiment, the difference is divided by 255. The resulting quotient is sent to the quotient comparison unit 13, and the remainder is sent to the compressed sequence D2 processing unit 16.

[0010] The quotient comparison unit 13 compares the input quotient with 0 and, if the quotient is not 0, sends it to the division unit 14. If the quotient is 0, it sends nothing to the division unit 14 and instead issues an instruction to input the next item of the ascending integer sequence data D1 to the subtraction unit 22. The division unit 14 divides the quotient received from the quotient comparison unit 13 by a predetermined value, 256 in this embodiment. The resulting quotient is sent back to the quotient comparison unit 13, and the remainder is sent to the compressed sequence D2 processing unit 16 together with a mark character C that indicates a carry. The division in the division unit 14 is repeated until the quotient sent to the quotient comparison unit 13 is found to be 0.

[0011] The ascending integer sequence data is compressed in this way, and the compressed data sequence is stored in the compressed sequence processing unit 16.

[0012] A concrete example follows. When the first data item, 320, is sent to the subtraction unit 22, the initial value held in the storage unit 11 is 0, so 0 is subtracted from 320 and the difference, 320, is sent to the division unit 12. The division unit 12 divides 320 by 255, giving a quotient of 1 and a remainder of 65. The quotient comparison unit 13 compares the quotient with 0; because it is not 0, the quotient 1 is sent to the division unit 14. The division unit 14 divides 1 by 256, giving a quotient of 0 and a remainder of 1, so it sends the remainder 1 together with the carry mark C to the compressed sequence processing unit 16 and signals that the next integer value should be read.
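The handling of a single difference by the division units 12 and 14 and the quotient comparison unit 13 might look like the sketch below. The string "C" stands in for the carry mark character, and the function name `encode_difference` is an assumption; the token order (carry-marked remainders first, then the remainder of the first division) follows the storage order given for the compressed format.

```python
CARRY = "C"  # stand-in for the carry mark character


def encode_difference(diff, first_divisor=255, next_divisor=256):
    """Return the tokens stored for one difference value."""
    quotient, q0 = divmod(diff, first_divisor)        # division unit 12
    tokens = []
    while quotient != 0:                              # quotient comparison unit 13
        quotient, remainder = divmod(quotient, next_divisor)  # division unit 14
        tokens += [CARRY, remainder]                  # remainder is sent with a carry mark
    tokens.append(q0)                                 # remainder of the first division
    return tokens


print(encode_difference(320))  # ['C', 1, 65]
print(encode_difference(13))   # [13]
```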

[0013] The compressed sequence processing unit 16 stores the carry mark character C and the remainder 1 sent from the division unit 14, and the remainder 65 sent from the division unit 12.

[0014] Next, when the data item 333 arrives from the integer sequence data D1, the subtraction unit 22 subtracts the 320 held in the storage unit 11 from 333 to obtain a difference of 13. The difference is sent to the division unit 12, which divides it by 255 to obtain a quotient of 0 and a remainder of 13. The quotient 0 is sent to the quotient comparison unit 13, which determines that it is 0 and issues an instruction to read the next integer value from the integer sequence D1. Because the quotient was found to be 0, only the remainder 13 from the division unit 12 is sent to the compressed sequence processing unit 16 and stored.

[0015] By repeating the same operation, the compressed data is sent in sequence to the compressed sequence D2 processing unit 16. The compressed data is then stored in the storage unit 15.
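Putting the stages together, a byte-oriented compressor could be sketched as follows. The specification only requires that the mark character C be distinguishable from the remainder of the first division; using the byte 0xFF for C is an assumption made here, motivated by the fact that remainders of a division by 255 never exceed 254. The function name `compress` is likewise hypothetical.

```python
MARK = 0xFF  # assumed carry mark byte; remainders of division by 255 are 0..254


def compress(values):
    """Compress an ascending sequence of non-negative integers into bytes."""
    out = bytearray()
    previous = 0                                   # initial value of storage unit 11
    for value in values:
        quotient, q0 = divmod(value - previous, 255)
        while quotient:
            quotient, remainder = divmod(quotient, 256)
            out += bytes([MARK, remainder])        # carry mark followed by a remainder
        out.append(q0)                             # remainder of the first division
        previous = value
    return bytes(out)


print(list(compress([320, 333, 401])))  # [255, 1, 65, 13, 68]
```

In this sketch the three values occupy 5 bytes, compared with 12 bytes if each were stored as a 32-bit integer.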

[0016] In this way, the compression process computes the differences D of the sequence in order from the head of the ascending integer sequence and divides each by a constant L(0) to obtain a quotient P(0) and a remainder Q(0). If P(0) is 0, only the remainder Q(0) is stored in the storage means. If it is not 0, the quotient P(0) is further divided by a constant L(1) to obtain a quotient P(1) and a remainder Q(1). The division is then repeated, with the most recently computed quotient P(i-1) as the dividend, until the quotient P(i) (i = 1, 2, ...) becomes 0.

[0017] Expressed as a recurrence:

D = L(0) × P(0) + Q(0)
P(i-1) = L(i) × P(i) + Q(i)  (i = 1, 2, ...)

For each difference D, values are then stored as follows.

If P(0) = 0: Q(0) only
If P(n) = 0 (n ≥ 1): C, Q(1), C, Q(2), ..., C, Q(n), Q(0)

Here C is a mark character that can be distinguished from Q(0). For the first value of the integer sequence, the value itself is used as the difference. The divisors L(i) are defined in advance.
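As a worked illustration of the recurrence and the storage format, take the divisors of the embodiment, L(0) = 255 and L(i) = 256 for i ≥ 1. The difference D = 100000 is an arbitrary value chosen for this sketch and does not appear in the specification.

```latex
\begin{aligned}
100000 &= 255 \times 392 + 40 && (P(0) = 392,\ Q(0) = 40)\\
392 &= 256 \times 1 + 136 && (P(1) = 1,\ Q(1) = 136)\\
1 &= 256 \times 0 + 1 && (P(2) = 0,\ Q(2) = 1)
\end{aligned}
```

Since P(2) = 0, the values stored for this difference would be C, 136, C, 1, 40.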

[0018] If L(i) is set large, the quotient more often becomes 0 after a division, which lowers the computation cost. Conversely, if L(i) is set small, the number of divisions increases and the computation cost rises, but the remainders stay small and the storage area can be reduced.
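The trade-off can be made concrete by counting divisions for different divisor choices. In this sketch the divisor pair (15, 16) is an illustrative assumption, as is the function name `division_count`; only the pair (255, 256) comes from the embodiment.

```python
def division_count(diff, l0, l1):
    """Count the divisions needed to encode diff with L(0) = l0 and L(i) = l1 for i >= 1."""
    quotient = diff // l0       # first division, always performed
    count = 1
    while quotient:             # repeat until the quotient becomes 0
        quotient //= l1
        count += 1
    return count


print(division_count(1000, 255, 256))  # 2
print(division_count(1000, 15, 16))    # 3
```

With the smaller divisors the same difference needs one more division, but every remainder is below 16 and so would fit in 4 bits rather than 8.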

[0019] Decoding, in turn, keeps the most recently decoded value and makes use of it. For the first integer value, 0 is used as the previously held value.

[0020] First, the compressed data stored in the storage unit 15 is fetched into the compressed sequence D2 processing unit 16 and read by the reading unit 17. When a mark character C indicating a carry appears in the compressed data, the reading unit 17 sends the data item immediately following it to the bias processing unit 18. Regardless of whether a mark character C has appeared, it sends the remainder data to the addition unit 19 and notifies the bias processing unit 18.

[0021] For example, the first item of compressed data in this embodiment is the mark character C indicating a carry, so the reading unit 17 reads the value 1 that immediately follows it and sends it to the bias processing unit 18. The reading unit 17 then reads the remainder 65 and sends it to the addition unit 19.

[0022] The bias processing unit 18 counts which mark character, first, second, or later, precedes each data item sent from the reading unit 17. A bias is computed according to this count and sent to the addition unit. The count is reset to 0 when the reading unit 17 gives notice that a remainder has been read.

[0023] In this embodiment, the first data item sent to the bias processing unit 18 is multiplied by 255, and the n-th data item (n = 2, 3, ...) is multiplied by 255 × 256^(n-1), and the result is sent to the addition unit 19. Here, therefore, the first data item, 1, is multiplied by 255, and the resulting 255 is sent to the addition unit 19.
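The multiplier applied by the bias processing unit 18 can be written as a one-line function. The name `bias` is an assumption; the formula is the one stated for this embodiment.

```python
def bias(n):
    """Multiplier for the n-th carry-marked value (n = 1, 2, ...): 255 * 256**(n - 1)."""
    return 255 * 256 ** (n - 1)


print([bias(n) for n in (1, 2, 3)])  # [255, 65280, 16711680]
```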

[0024] The addition unit 19 sums the bias values sent from the bias processing unit 18, adds the remainder sent from the reading unit 17, and then adds the previously decoded integer read from the storage unit 23. The initial value of the storage unit 23 is 0.

[0025] In this embodiment, the 255 sent from the bias processing unit 18 and the 65 sent from the reading unit 17 are added, the initial value 0 of the storage unit 23 is added, and the decoded value 320 is obtained. The decoded data is sent to the restored integer sequence D3 holding unit 21 and output as needed.
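The decoding path (reading unit 17, bias processing unit 18, addition unit 19, storage unit 23) can be sketched as a single loop. As in any byte-level rendering of this scheme, the mark byte 0xFF is an assumption, and the function name `decompress` is hypothetical.

```python
MARK = 0xFF  # assumed carry mark byte


def decompress(data):
    """Recover the ascending integer sequence from the compressed bytes."""
    values = []
    previous = 0          # storage unit 23, initial value 0
    total = 0             # sum of biased values for the current difference
    count = 0             # number of mark characters seen for the current difference
    stream = iter(data)
    for byte in stream:
        if byte == MARK:
            count += 1
            # the value right after the mark is scaled by 255 * 256**(count - 1)
            total += next(stream) * 255 * 256 ** (count - 1)
        else:
            previous += total + byte   # add biases, the remainder, and the prior value
            values.append(previous)
            total = 0                  # reset for the next difference
            count = 0
    return values


print(decompress(bytes([0xFF, 1, 65, 13, 68])))  # [320, 333, 401]
```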

[0026] Thus, if

[Equation 1]

[0027] is defined, then when n mark characters and the remainders have been read, the difference D can be decoded as D = Q(0) or as

[0028]

[Equation 2]

[0029] and adding this difference D to the previously decoded integer value that has been held yields the original integer value.
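The formulas referred to as Equation 1 and Equation 2 do not appear in this text. Unrolling the recurrence of paragraph [0017] suggests the following form, offered as a reconstruction rather than the published equations; the name B(i) for the bias is an assumption.

```latex
B(i) = \prod_{k=0}^{i-1} L(k) \quad (i \ge 1), \qquad
D = Q(0) + \sum_{i=1}^{n} B(i)\, Q(i)
```

With L(0) = 255 and L(k) = 256 for k ≥ 1, this gives B(i) = 255 × 256^(i-1), which agrees with the multipliers of paragraph [0023]. For the stored values C, 1, 65 it gives D = 65 + 255 × 1 = 320.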

[0030] According to this embodiment, as described above, during compression the previously stored integer is subtracted from each item of the ascending data, the resulting difference is divided, and the remainder is output. The quotient is compared with 0, and a nonzero quotient is divided in turn, with the remainder output together with a carry mark; this division is repeated until the quotient becomes 0. The carry marks and remainders from the latter divisions and the remainder from the former division are stored. Because the method only divides difference values and stores the results, it requires far less computation than conventional general-purpose compression coding methods and can compress and decode at high speed. Moreover, because it needs no parameters that span the entire data set, such as statistics, data can easily be added or deleted.

[0031] The compression and decoding system of the present invention can be applied to compressing and decoding any integer sequence data arranged in ascending order. For example, it can be applied to data processing in a data search system such as the following.

[0032] FIG. 2 is a data flow diagram of a pattern search system based on the extraction of neighborhood features, shown as an embodiment to which the present invention is applied. In this search system, neighborhood feature data in which all phase information of the events (information) has been abstracted away is created in advance from all target items, and every search is run over that data for all items. The search algorithm consists of a learning step and a search step. In the learning step, a neighborhood feature matrix is created as phase information for each item. In FIG. 2, this corresponds to the steps from creating the neighborhood feature matrix 30 from the search target 10 to saving it in the structure file 40. In the search step, the search key is processed in the same way as in the learning step to obtain its neighborhood features, a matching operation is performed against the neighborhood feature matrices of the items, and an evaluation result indicating the degree of matching (similarity) is obtained for each item. In FIG. 2, this corresponds to the steps in which search S4 matches the search key 50 against the item data in the structure file 40 and the results are output as the evaluation result list 70 or the sorted list 80. Each step is described below.

[0033] (1) Learning step

In FIG. 2, the search target 10 is, for example, document data in Japanese, English, German, French, Hebrew, Russian, or another language, or quantized waveform data, chemical structural formulas, genetic information, and the like. Such a search target is first normalized by the normalization means S1. In general, a search target is expressed as a sequence of minimum units of information (characters such as letters of the alphabet for a document, or the real value at a given time for a numerical chart). This sequence is converted by some method into a sequence of integers with n levels. This conversion is called data normalization.

[0034] For English document data, for example, using the ASCII code table as is gives the following 256-level numerical representation.

... This is a pen. ...
84|104|105|115|32|105|115|32|97|32|112|101|110|46|

[0035] In the codes above, T corresponds to 84, h to 104, and so on.
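For ASCII text, the normalization described here amounts to reading the character codes directly. A minimal sketch, in which the function name `normalize` is an assumption:

```python
def normalize(text):
    """Map each ASCII character to its code, giving a 256-level integer sequence."""
    return list(text.encode("ascii"))


print(normalize("This is a pen."))
# [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 112, 101, 110, 46]
```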

[0036] Next, the learning means S2 computes neighborhood features from the normalized data 20 and folds them into the form of the neighborhood feature matrix 30 by the procedure described below. Various formulas can be used to compute the neighborhood features, and the choice of formula also affects the sharpness of the search (how few false detections occur).

[0037] As one example of the learning means S2, the following describes a procedure that obtains quantization values from the normalized data 20 and uses them to build the neighborhood feature matrix 30. Suppose, as shown in FIG. 4, that there are multiple target items (documents) to be searched, and consider the quantization of the i-th item. Let C_{i,j} be the j-th data element (character) of the i-th item (document), and let C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k} be the data in the k-neighborhood of C_{i,j}. If the i-th item contains the normalized numerical sequence 135, 64, 37, 71, 101, ... as shown in FIG. 3, the quantization value x for C_{i,j} and the quantization value y for the forward k-neighborhood of C_{i,j} are given by

x = f(C_{i,j})
y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k})   ... Equation (1)

[0038] Here, f(C_{i,j}) is an n-level quantization function for C_{i,j}. That is, it is the value obtained by applying a predetermined operation to the j-th data element C_{i,j} of the i-th item, and it is an integer from 1 to n. The quantization value x computed by this n-level quantization function f therefore determines the position along the x axis, in the range 1 to n, in the matrix (coordinates) shown in FIG. 4.

[0039] Likewise, g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k}) is an m-level quantization function for the forward k-neighborhood of C_{i,j}. That is, it is the value obtained by applying a predetermined operation to the j-th data element C_{i,j} of the i-th item and a predetermined number of neighboring data elements C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k}, and it is an integer from 1 to m. For example, if the j-th data element C_{i,j} is 135 as shown in FIG. 3 and k is 3, the elements 64, 37, and 71 that follow 135 are taken as C_{i,j+1}, C_{i,j+2}, C_{i,j+3}, and a predetermined operation is performed on the correlation between these elements and 135. When the j-th data element C_{i,j} is the next value, 64, the elements 37, 71, and 101 that follow 64 are taken as C_{i,j+1}, C_{i,j+2}, C_{i,j+3}, and a predetermined operation is performed on the correlation between these elements and 64. The quantization value y computed in this way by the m-level quantization function g determines the position along the y axis, in the range 1 to m, in the matrix (coordinates) shown in FIG. 4.

[0040] Obtaining the quantization values x and y from the normalized data 20 as described above therefore determines a position in the matrix (coordinates) shown in FIG. 4. Many formulas f() and g() can be used to obtain the quantization values. One example is

f: x → x
g: (x, y) → x - y (or |x - y|)   ... Equation (2)

in which f() uses the input value itself as the quantization value and g() uses the difference between its two input values, or the absolute value of that difference. In this case, for the normalized data 20 of the earlier example, 84|104|105|115|..., taking C_{i,j} to be 84 gives the coordinate positions (84,20), (84,21), (84,31), ... for the quantization values x and y of C_{i,j} and its forward k-neighborhood. Besides Equation (2), neighborhood features may also be extracted by applying the four arithmetic operations to the individual integer character values of several character strings. The coordinate positions (51,71), (32,103), ... shown in FIG. 3 were obtained by a method different from Equation (2).
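The coordinate pairs of the Equation (2) example can be reproduced with a short sketch. It assumes that one (x, y) pair is produced for each of the k forward neighbors, which is consistent with the three coordinates listed for the value 84, and that positions with fewer than k following elements are skipped; both points, and the function name `neighborhood_pairs`, are assumptions made here.

```python
def neighborhood_pairs(codes, k):
    """Return (x, y) pairs with x = f(c) = c and y = |c - neighbor| for k forward neighbors."""
    pairs = []
    for j in range(len(codes) - k):          # skip positions lacking k forward neighbors
        x = codes[j]                         # f: x -> x
        for offset in range(1, k + 1):
            pairs.append((x, abs(x - codes[j + offset])))   # g: (x, y) -> |x - y|
    return pairs


print(neighborhood_pairs([84, 104, 105, 115, 32], 3)[:3])
# [(84, 20), (84, 21), (84, 31)]
```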

[0041] In this system, the information for each item is stored, for the quantization values x and y obtained as above, as a pair consisting of the item's serial number i and a weight w(x,y,i). The weight w(x,y,i) is computed from x, y, and i by a predetermined operation, but it may ordinarily be fixed at 1.

[0042] Based on the quantization values x and y obtained in this way for each data element C_{i,j} of each item, data is stored as indicated by the bars in FIG. 4. That is, at the coordinate position determined by the quantization values x and y of the data element C_{i,j}, a record pairing the item's serial number i with its weight w(x,y,i) is stored. In the figure, the bar grows longer each time such a record is stored. If the weight w(x,y,i) is 1, only the item serial number i is stored at the coordinate position determined by x and y. If the item serial numbers i form integer data arranged in ascending order, like the integer sequence data D1 shown in FIG. 1, they are well suited to compression and decoding by the method described above. Applying that compression therefore compresses the data at high speed and reduces the storage capacity it requires.
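The reason the stored serial numbers come out in ascending order can be seen in a small sketch of the matrix construction with all weights fixed at 1. The dictionary layout, the per-neighbor pairing, the choice to record each item at most once per cell, and the name `build_matrix` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_matrix(documents, k=3):
    """Map each (x, y) cell to the list of item serial numbers that produce it."""
    matrix = defaultdict(list)
    for item_id, text in enumerate(documents, start=1):   # items visited in serial order
        codes = list(text.encode("ascii"))
        for j in range(len(codes) - k):
            for offset in range(1, k + 1):
                cell = (codes[j], abs(codes[j] - codes[j + offset]))
                # record each item at most once per cell (weight fixed at 1)
                if not matrix[cell] or matrix[cell][-1] != item_id:
                    matrix[cell].append(item_id)
    return matrix


matrix = build_matrix(["This is a pen.", "This is a cat.", "That was it."])
print(matrix[(84, 20)])  # [1, 2, 3]
```

Because items are processed in order of serial number, every list is strictly ascending, which is the input form the difference-based compression expects.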

[0043] An item identification number is added to the neighborhood feature matrix created in this way, and the result is saved as the structure file 40.

[0044] (2) Search step

First, the search key 50 is input. For example, let "This is a pen." be the search key. The normalization means S3, which uses the same normalization method as the normalization means S1 of the learning step, normalizes the key information into the following integer sequence.

84|104|105|115|32|105|115|32|97|32|112|101|110|46|

[0045] Next, in the search means S4, a series of pairs of quantization values x and y is created from the head of the normalized numerical sequence of the search key 50, using the same autocorrelation formulas f() and g() as the learning means S2 of the learning step. Based on this series of pairs, the content frequency ω_k of the search key 50 for an item k taken from the structure file 40 is then computed by summing V(x_j, y_j, k) over j = 1 to m.

[0046] Here, V(x_j, y_j, k) equals the weight stored in the structure file 40 for item k, and is defined as 0 when no weight is stored.

[0047] Accordingly, when data exists (when there is a bar) at the position in FIG. 4 given by a pair of quantization values x and y obtained from the numerical sequence of the search key 50, the weight value is recorded as the structure evaluation value score (degree of match) in a separately provided storage means, at the location assigned to the item serial number i indicated by that data.

[0048] Next, in the evaluation result output means S5, the structure evaluation value score (degree of match) obtained for each property in the structure file 40 is divided by the evaluation value for an exact match (in this case, the number of characters − k) to obtain the content probability of the search key 50, yielding a list 70 of evaluation results. The sorting means S6 then sorts this list 70 in descending order of content probability to obtain a sorted list 80.
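The normalization and sorting steps amount to a division and a descending sort. A minimal sketch, assuming the scores are held in a dictionary keyed by property serial number; the names `content_probabilities`, `key_length` and `k` are illustrative, and the exact-match value is taken as (number of characters − k) as stated in the text.

```python
def content_probabilities(scores, key_length, k=1):
    """Divide each score by the exact-match value and sort in descending order."""
    exact = key_length - k  # evaluation value for an exact match
    if exact <= 0:
        raise ValueError("key must be longer than k")
    ranked = [(pid, score / exact) for pid, score in scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

print(content_probabilities({1: 4, 2: 1, 3: 2}, key_length=5))
# [(1, 1.0), (3, 0.5), (2, 0.25)]
```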

[0049] This sorted list 80 is the search result. By referring to the properties near the top of the list, one can identify the names of the properties that are most likely to contain the search key. Because the content probability is obtained for exact and inexact matches alike, a fuzzy match search can be performed.

[0050] Furthermore, because every property is searched using all of the information in the search key, the probability of a search omission is essentially zero.

[0051] The time needed to evaluate the search key against one property depends only on the number of characters in the key and not on the size of the property. The search can therefore be performed very quickly.

[0052] In addition, by performing logical operations between search result lists, search operations such as AND and OR on search conditions can also be executed at high speed.
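Combining result lists reduces to set operations on property serial numbers. The sketch below assumes each result is a mapping from serial number to content probability and that a hypothetical threshold decides membership; combining probabilities by minimum for AND and maximum for OR is an illustrative choice, not something the specification prescribes.

```python
def hits(result, threshold=0.5):
    """Serial numbers whose content probability reaches the threshold."""
    return {pid for pid, p in result.items() if p >= threshold}

def and_search(a, b, threshold=0.5):
    common = hits(a, threshold) & hits(b, threshold)
    return {pid: min(a[pid], b[pid]) for pid in common}

def or_search(a, b, threshold=0.5):
    either = hits(a, threshold) | hits(b, threshold)
    return {pid: max(a.get(pid, 0.0), b.get(pid, 0.0)) for pid in either}

r1 = {1: 1.0, 2: 0.25, 3: 0.75}
r2 = {1: 0.5, 3: 0.25, 4: 1.0}
print(and_search(r1, r2))                 # {1: 0.5}
print(sorted(or_search(r1, r2).items()))  # [(1, 1.0), (3, 0.75), (4, 1.0)]
```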

[0053] The neighborhood features need not be extracted from all of the data of each property. For example, one or more specific integer values in the property data, integer values in a specific range, or one or more specific bits in each byte of the data string may be excluded when extracting the neighborhood features. When the data consists of 2-byte characters, as in a Japanese document, the neighborhood features may, for example, be extracted from the lower-order bytes only, excluding the upper-order bytes.
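Restricting the data before extracting neighborhood features can be sketched as simple filtering and masking. The helper names, the particular mask, and the assumption that 2-byte characters are stored with the upper-order byte first are illustrative only.

```python
def drop_values(seq, excluded):
    """Exclude specific integer values (e.g. 32, the space character)."""
    return [v for v in seq if v not in excluded]

def mask_bits(seq, mask=0xDF):
    """Ignore selected bits of each byte; 0xDF clears bit 5, folding ASCII case."""
    return [v & mask for v in seq]

def low_bytes(data: bytes):
    """For 2-byte characters, keep only the lower-order byte of each pair."""
    return list(data[1::2])

print(drop_values([84, 104, 32, 105], {32}))        # [84, 104, 105]
print(mask_bits([ord("a"), ord("A")]))               # [65, 65]
print(low_bytes(bytes([0x82, 0xA0, 0x82, 0xA2])))    # [160, 162]
```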

[0054] In the above example, the matrix generated from the neighborhood features is a bit matrix of order 256, which corresponds to 8 Kbytes. Consequently, for a database in which each property holds only about 1 Kbyte of data, the system cannot be called efficient. It is therefore desirable to provide the data compression means S7 described above and compress the data, thereby reducing the size of the structure file 40.
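The 8 Kbyte figure follows from 256 × 256 bits = 65,536 bits = 8,192 bytes. The sketch below checks that arithmetic and shows one way a sparse bit matrix could be turned into an ascending integer sequence of set-bit positions, which is the kind of data the compression means operates on. The flattening rule x × 256 + y is an assumption about the representation, not a quotation of the embodiment.

```python
ORDER = 256  # order of the bit matrix

matrix_bytes = ORDER * ORDER // 8
print(matrix_bytes)  # 8192

def set_positions(pairs):
    """Flatten (x, y) positions of set bits to ascending integers x*256 + y."""
    return sorted({x * ORDER + y for x, y in pairs})

print(set_positions([(1, 2), (0, 255), (1, 0)]))  # [255, 256, 258]
```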

[0055] In the above embodiment, the normalization means S1, learning means S2, normalization means S3, search means S4, evaluation result output means S5, sorting means S6, and data compression means S7 can be implemented as computer programs, but they may instead be implemented as dedicated hardware using logic circuit elements.

[0056]

[Effects of the Invention] As described above, according to the present invention, differences are calculated and compression is performed on them, which keeps the largest stored value small. The compression ratio is thereby improved, and the amount of computation is greatly reduced compared with conventional general-purpose compression coding methods, so compression and decoding can be performed at high speed. Moreover, because no parameters spanning the entire data set, such as statistics, are required, data can easily be added or deleted.
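The benefit of storing differences rather than absolute values can be illustrated with a generic delta code followed by a base-128 variable-length code. This is a well-known stand-in technique, not the claimed two-stage division scheme itself: the divisor 128, the use of the high bit as a continuation flag (loosely analogous to a carry mark), and all function names are assumptions.

```python
def compress(ascending):
    """Delta-encode an ascending integer sequence into bytes."""
    out = bytearray()
    prev = 0
    for value in ascending:
        diff = value - prev
        if diff < 0:
            raise ValueError("sequence must be ascending")
        prev = value
        while True:
            diff, rem = divmod(diff, 128)
            if diff:
                out.append(rem | 0x80)  # more digits follow
            else:
                out.append(rem)         # last digit of this difference
                break
    return bytes(out)

def decompress(data):
    """Rebuild the original sequence by accumulating decoded differences."""
    values, prev, diff, shift = [], 0, 0, 0
    for byte in data:
        diff |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += diff
            values.append(prev)
            diff, shift = 0, 0
    return values

seq = [255, 256, 258, 70000]
packed = compress(seq)
print(len(packed), decompress(packed) == seq)  # 7 True
```

No frequency table or other whole-data statistic is needed, so new values can be appended by encoding only their differences from the last stored value.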

[Brief description of the drawings]

FIG. 1 is a data flow diagram of one embodiment of a compression and decoding system according to the present invention.

FIG. 2 is a data flow diagram of a database search system to which the compression and decoding system according to the present invention is applied.

FIG. 3 is a diagram illustrating quantization of neighborhood information.

FIG. 4 is a diagram showing the information structure that is stored.

[Explanation of symbols]

10 search target
11 storage unit
12 division unit
13 quotient comparison unit
14 division unit
15 storage unit
16 compressed sequence D2 processing unit
17 reading unit
18 bias processing unit
19 addition unit
20 normalized data
21 restored sequence D3 holding unit
22 subtraction unit
23 storage unit
30 neighborhood feature matrix
40 structure file
50 search key
60 normalized key
70 evaluation result list
80 sorted list
S1 normalization means
S2 learning means
S3 normalization means
S4 search means
S5 evaluation result output means
S6 sorting means
S7 data compression means

Continuation of the front page

(56) References: JP-A-64-23623 (JP, A); JP-A-4-326164 (JP, A); JP-A-5-174067 (JP, A); JP-A-5-181719 (JP, A); JP-A-5-225238 (JP, A); JP-A-5-225248 (JP, A); JP-A-6-274193 (JP, A)

(58) Fields investigated (Int. Cl.⁷, DB name): G06F 5/00; G06F 17/30; H03M 7/18; G06F 12/00 511

Claims

(57) [Claims]

1. A system for compressing and decoding ascending integer sequence data, that is, integer sequence data arranged in ascending order, comprising:
subtraction means for subtracting the (n-1)th data stored in a first storage means from the n-th data, and for sending the n-th data to the first storage means;
first division means for performing division with the difference value obtained by the subtraction means as the dividend, and outputting a quotient and a remainder;
quotient comparison means for comparing the quotient obtained by the first division means with 0;
second division means for performing division with the quotient as the dividend when the result of the comparison by the quotient comparison means is that the quotient is other than 0, and outputting a remainder together with a quotient and a carry mark;
second storage means for storing the carry mark and the remainder output from the second division means, and the remainder output from the first division means; and
decoding means for decoding the original integer sequence data from the carry mark and the two remainders stored in the second storage means.

2. The system for compressing and decoding ascending integer sequence data according to claim 1, wherein said second division means outputs said carry mark when dividing a non-zero quotient sent from said quotient comparison means.

3. The system for compressing and decoding ascending integer sequence data according to claim 1, further comprising third storage means for storing, for each property to be searched, its neighborhood features, and search means for performing a fuzzy search based on the degree of match between the neighborhood features of a search key and the neighborhood features of the search target, wherein the data stored in said third storage means is data compressed using said subtraction means, said first and second division means, said quotient comparison means, and said second storage means.

4. The system for compressing and decoding ascending integer sequence data according to claim 3, used for searching a database in which a quantization amount $x$ for the $j$-th data $C_{i,j}$ of the data string of the $i$-th property to be searched, and a quantization amount $y$ for the $k$ subsequent data $C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}$, are obtained as
$x = f(C_{i,j})$
$y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k})$
and the serial number $i$ of the property is stored at the position in the third storage means determined by the obtained values of $x$ and $y$.