JPH05181913A

JPH05181913A - Compression and decoding system for ascending-order integer string data

Info

Publication number: JPH05181913A
Application number: JP3357900A
Authority: JP
Inventors: Hiroshi Takada; 寛高田
Original assignee: Nippon Steel Corp
Current assignee: Nippon Steel Corp
Priority date: 1991-12-26
Filing date: 1991-12-26
Publication date: 1993-07-23
Anticipated expiration: 2014-12-20
Also published as: JP2993540B2

Abstract

PURPOSE: To shorten processing time by reducing the amount of computation needed to compress and decode ascending-order integer string data. CONSTITUTION: The ascending-order integer string data D1 are divided by a division part 12, and a quotient storage and comparison part 14 compares each resulting quotient with the previously stored quotient. Only when the quotient has changed are the quotient difference and the remainder saved in the compressed string D2; otherwise only the remainder is saved. Because the division reduces the quantity of data, the processing time for compression and decoding is shortened. Further, since no parameters covering the whole data set are required, data can be added and deleted.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a compression and decoding system for ascending-order integer sequence data, that is, integer sequence data arranged in a monotonically increasing manner. In particular, it relates to a system for compressing and decoding the data retrieved by a database search system, which extracts necessary information from a database, when that data is a monotonically increasing integer sequence.

[0002]

[Prior Art] The Huffman method, the Shannon-Fano method, the Gilbert-Moore method, and run-length coding are known as typical methods for compressing and decoding data. An example using the Huffman method is disclosed in JP-A-2-78323.

[0003]

[Problems to Be Solved by the Invention] These methods mainly measure how often each character appears in the data and compress the most frequent characters preferentially. They have the advantage of being applicable to data of any form, but because compression and decoding require several processing stages, they are poorly suited to cases where speed is required.

[0004] In view of the above problems, an object of the present invention is to provide a compression and decoding system that compresses monotonically increasing (ascending-order) integer sequence data at high speed and reduces the capacity required of the storage means that holds the compressed data.

[0005]

[Means for Solving the Problems] The compression and decoding system of the present invention, for compressing and decoding integer sequence data arranged in ascending order, comprises: division means for dividing the integer sequence data arranged in ascending order; quotient storage and comparison means for comparing the quotient obtained by the division means with the old quotient already stored, and outputting the difference between these quotients when the obtained quotient is larger than the old quotient; storage means for storing the quotient difference together with the remainder obtained by the division means when the quotient storage and comparison means outputs a quotient difference, and for storing only the remainder when it does not; and decoding means for decoding the original integer sequence data from the quotient differences and remainders stored in the storage means.

[0006]

[Operation] According to the present invention, at compression time the data arranged in ascending order is divided, and each quotient obtained is compared with the previous (old) quotient. The quotient difference is saved, together with the remainder, only when the quotient has changed; when it has not, only the remainder is saved. The amount of computation is therefore far smaller than in conventional general-purpose compression coding methods, so compression and decoding can be performed at high speed. Moreover, because no parameters spanning the whole data set, such as statistics, are required, data can easily be added or deleted.

[0007]

[Embodiment] FIG. 1 shows an embodiment of the system according to the present invention. As shown in the figure, the integer sequence data D1 (320, 333, 401, ...) are arranged in monotonically increasing (ascending) order. Each value is represented by, for example, 32 bits. In the compression device, the integer sequence data D1 is sent to the division unit 12, which divides each input value by a predetermined value. In this embodiment the input data is divided by 255. The resulting quotient is sent to the quotient storage and comparison unit 14, and the remainder is sent to the compressed sequence D2 processing unit 16.

[0008] The quotient storage and comparison unit 14 compares the newly input quotient Pnew with the stored old quotient Pold. Pold is given an initial value of 0. When Pnew > Pold, the unit sends a mark character C indicating a carry, together with the quotient difference Pnew - Pold, to the compressed sequence D2 processing unit 16, and stores Pnew in place of Pold. When this condition is not satisfied, the quotient storage and comparison unit 14 sends no data to the compressed sequence D2 processing unit 16.

[0009] In this embodiment, dividing the first value, 320, by 255 gives a quotient of 1 and a remainder of 65. Because the initial value of the old quotient Pold is 0, the condition Pnew > Pold is satisfied, so the quotient storage and comparison unit 14 sends the mark character C indicating a carry and the quotient difference 1 to the compressed sequence D2 processing unit 16, and stores the new quotient 1 in place of the old quotient 0.

[0010] The compressed sequence D2 processing unit 16 stores the mark character C indicating a carry and the quotient difference 1 sent from the quotient storage and comparison unit 14, and the remainder 65 sent from the division unit 12.

[0011] Next, when 333 arrives as the integer sequence data D1, the division unit 12 likewise divides it by 255, giving a quotient of 1 and a remainder of 78. The quotient storage and comparison unit 14 compares the new quotient Pnew with the stored old quotient Pold. Here Pnew and Pold are both 1, so the condition Pnew > Pold is not satisfied. Consequently, only the remainder from the division unit 12 is sent to the compressed sequence D2 processing unit 16.

[0012] By repeating the same operation, compressed data is sent in sequence to the compressed sequence D2 processing unit 16. The compressed data is stored in the storage unit 18.
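The compression procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `compress` and `MARK`, the use of a Python list as the compressed sequence D2, and the representation of the mark character C as the string "C" are not taken from the specification, which does not fix a concrete encoding.

```python
MARK = "C"  # stand-in for the mark character C (its concrete encoding is an assumption)


def compress(values, divisor=255):
    """Compress an ascending integer sequence into a list of tokens.

    A quotient change is recorded as MARK followed by the quotient
    difference; every input value contributes its remainder.
    """
    out = []
    p_old = 0  # old quotient Pold, initial value 0
    for v in values:
        p_new, remainder = divmod(v, divisor)  # division unit 12
        if p_new > p_old:  # quotient storage and comparison unit 14
            out.extend([MARK, p_new - p_old])
            p_old = p_new
        out.append(remainder)
    return out


print(compress([320, 333, 401]))  # ['C', 1, 65, 78, 146]
```

For the example values 320, 333 and 401, only the first value triggers a carry, so the mark and the quotient difference appear once and the remaining entries are remainders.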

[0013] In decoding, the compressed data stored in the storage unit 18 is retrieved into the compressed sequence D2 processing unit 16 and read by the reading unit 22. When the mark character C indicating a carry appears in the compressed data, the reading unit 22 sends the data item immediately following it to the bias storage unit 24. Regardless of whether the mark character C appears, it sends the remainder data to the addition unit 26.

[0014] For example, the first compressed data in this embodiment contains the mark character C indicating a carry, so the data item 1 immediately following it is sent to the bias storage unit 24. The remainder 65 is sent to the addition unit 26.

[0015] As shown in the figure, the bias storage unit 24 holds a value I based on the quotient. When the reading unit 22 sends the data ΔP that immediately follows the mark character C, that is, the quotient difference, the unit adds the product L × ΔP of the divisor L and the quotient difference ΔP to the stored value I, saves the result as the new value I, and outputs it to the addition unit 26. The initial value of I is 0.

[0016] In this embodiment, 1 is sent as the data ΔP immediately following the mark character C, as described above, and the divisor L is 255. The bias storage unit 24 therefore adds 255 × 1 to the initial value 0 of I, stores the resulting value 255, and outputs it to the addition unit 26.

[0017] The addition unit 26 adds the value I sent from the bias storage unit 24 to the remainder sent from the reading unit 22. In this example, it adds the 255 sent from the bias storage unit 24 to the remainder 65 sent from the reading unit 22 and obtains the decoded value 320. The decoded data is sent to the restored sequence D3 holding unit 28 and output as needed.
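The decoding procedure of the reading unit 22, bias storage unit 24 and addition unit 26 can be sketched in Python as follows. This is an illustration under assumptions: the names `decode` and `MARK`, and the token-list form of the compressed sequence (the string "C" followed by the quotient difference, with remainders as plain integers), are hypothetical and not specified in the text.

```python
MARK = "C"  # stand-in for the mark character C (its concrete encoding is an assumption)


def decode(stream, divisor=255):
    """Restore the ascending integer sequence from a token list."""
    values = []
    bias = 0  # value I held by the bias storage unit 24, initial value 0
    tokens = iter(stream)
    for token in tokens:
        if token == MARK:
            delta_p = next(tokens)     # data immediately after the mark
            bias += divisor * delta_p  # I <- I + L x dP
            continue
        values.append(bias + token)    # addition unit 26: I + remainder
    return values


print(decode(["C", 1, 65, 78, 146]))  # [320, 333, 401]
```

Because the bias is only updated when a mark is read, each decoded value costs one addition, which is consistent with the small amount of computation the description claims.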

[0018] According to this embodiment, as described above, the ascending-order data is divided by the divisor L at compression time, and each quotient obtained is compared with the previous (old) quotient. The quotient difference is saved, together with the remainder, only when the quotient has changed; when it has not, only the remainder is saved. The amount of computation is therefore far smaller than in conventional general-purpose compression coding methods, so compression and decoding can be performed at high speed. Moreover, because no parameters spanning the whole data set, such as statistics, are required, data can easily be added or deleted.

[0019] The compression and decoding system according to the present invention can be applied to the compression and decoding of various kinds of integer sequence data arranged in ascending order. For example, it can be applied to data processing in the following data search system.

[0020] FIG. 2 is a data flow diagram of a pattern search system based on the extraction of neighborhood features, showing one embodiment to which the present invention is applied. In this search system, neighborhood feature data, in which all phase information of the events (information) has been abstracted away, is created in advance from all target properties, and every search is run over all properties in that data set. The search algorithm consists of a learning step and a search step. In the learning step, a neighborhood feature matrix is created as phase information for each property. In the search step, a matching operation between the search key and the neighborhood feature matrix is performed, yielding an evaluation result that indicates the degree of matching (similarity) for each property. Each step is described below.

[0021] (1) Learning step

In FIG. 2, the search target 10 is, for example, document data in Japanese, English, German, French, Hebrew, Russian, or another language, or quantized waveform numerical data, chemical structural formulas, genetic information, and the like. Such a search target is first normalized by the normalization means S1. In general, a search target is expressed as a sequence of minimum units of information (characters such as letters of the alphabet for a document; real values at given times for a numerical chart). This sequence is converted by some method into a sequence of integers with n levels. This is called data normalization.

[0022] For example, English document data can be expressed as numerical values with 256 levels simply by using the ASCII code table as it is:

…… This is a pen. ……
84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46 |

[0023] In the above code, T corresponds to 84, h to 104, and so on.
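The ASCII-based normalization of the example sentence can be reproduced with a few lines of Python. The function name `normalize` is a hypothetical helper introduced for illustration; the sketch assumes the input contains only ASCII characters.

```python
def normalize(text):
    """Map each character to its ASCII code (a 256-level integer sequence)."""
    return list(text.encode("ascii"))  # raises UnicodeEncodeError on non-ASCII input


print(normalize("This is a pen."))
# [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 112, 101, 110, 46]
```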

[0024] The normalized data 20 is then folded by the learning means S2 into the form of a neighborhood feature matrix 30. Various arithmetic expressions for obtaining the neighborhood features are conceivable. The choice of expression also affects the sharpness of the search (how few false detections occur).

[0025] Let the j-th data item (character) of the i-th property (document) be $C_{i,j}$. The quantized value x for $C_{i,j}$ and the quantized value y for the forward k-neighborhood of $C_{i,j}$ are obtained as follows. Suppose there are n target properties (documents) to be searched; the quantization of the i-th one is described here. If the i-th property contains the normalized numerical sequence 135, 64, 37, 71, 101, ... as shown in FIG. 3, then the quantized value x for $C_{i,j}$ is

$$x = f(C_{i,j})$$

and the quantized value y for the forward k-neighborhood of $C_{i,j}$ is

$$y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}).$$

[0026] Here, $f(C_{i,j})$ is an n-level quantization function of $C_{i,j}$. That is, it is the value obtained by performing a predetermined operation on the j-th data item $C_{i,j}$ of the i-th property, and it is an integer from 1 to n. The value of x therefore determines the position along the x axis, in the range 1 to n, in the matrix (coordinates) shown in FIG. 4.

[0027] Likewise, $g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k})$ is an m-level quantization function of the forward k-neighborhood of $C_{i,j}$. That is, it is the value obtained by performing a predetermined operation on the j-th data item $C_{i,j}$ of the i-th property and a predetermined number of data items near it, and it is an integer from 1 to m. For example, when the j-th data item $C_{i,j}$ is 135 and k is 3, as shown in FIG. 3, the data items 64, 37, and 71 that follow 135 are extracted as $C_{i,j+1}$, $C_{i,j+2}$, $C_{i,j+3}$, and a predetermined operation is performed on the correlation between these items and 135. When the j-th data item $C_{i,j}$ is the next value, 64, the data items 37, 71, and 101 that follow 64 are extracted as $C_{i,j+1}$, $C_{i,j+2}$, $C_{i,j+3}$, and a predetermined operation is performed on the correlation between these items and 64.

[0028] The value of y obtained in this way determines the position along the y axis, in the range 1 to m, in the matrix (coordinates) shown in FIG. 4. Obtaining x and y as described above therefore fixes a position in the matrix (coordinates) shown in FIG. 4.

[0029] In this system, each piece of property information is stored, for the x and y obtained as above, as a pair consisting of the property's serial number i and a weight w(x, y, i). The weight w(x, y, i) is obtained from x, y, and i by a predetermined operation, but it is normally fixed at 1.

[0030] For each data item $C_{i,j}$, data is stored according to its x and y values, as indicated by the bars in FIG. 4. That is, at the coordinate position determined by the x and y values of $C_{i,j}$, a record pairing the property's serial number i with its weight w(x, y, i) is stored. In the figure, a bar grows longer each time such a record is stored. Since the weight w(x, y, i) is normally 1, only the serial number i is stored at the coordinate position determined by x and y. Because these serial numbers form integer data arranged in ascending order, they are well suited to compression and decoding by the method described above. Applying that compression therefore allows the data to be compressed at high speed and the storage capacity to be reduced.
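The way serial numbers accumulate at each (x, y) position can be illustrated with the following Python sketch. It is not the method of the specification, which leaves f and g as "predetermined operations": here f is taken to be the identity and g the absolute difference to the immediately following item (k = 1), purely as an illustrative choice. The names `features` and `build_index`, the ASCII normalization, and the rule of recording each serial number at most once per position (weight fixed at 1) are likewise assumptions.

```python
from collections import defaultdict


def features(codes, k=1):
    """Return (x, y) pairs with x = C_j and y = |C_j - C_{j+k}| (illustrative f and g)."""
    return [(codes[j], abs(codes[j] - codes[j + k])) for j in range(len(codes) - k)]


def build_index(documents):
    """Map each (x, y) position to the ascending list of property serial numbers."""
    index = defaultdict(list)
    for i, text in enumerate(documents, start=1):  # serial numbers 1, 2, 3, ...
        codes = list(text.encode("ascii"))
        for xy in set(features(codes)):  # record each serial number once per position
            index[xy].append(i)
    return index


index = build_index(["ab", "abc", "bc"])
print(index[(97, 1)])  # [1, 2]
print(index[(98, 1)])  # [2, 3]
```

Because the properties are processed in order of serial number, every list is strictly ascending, which is the form of data the quotient-and-remainder compression expects.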

[0031] The neighborhood feature matrix created in this way is saved as the structure file 40, with the property identification numbers attached.

[0032] (2) Search step

First, the search key 50 is input. For example, let "This is a pen." be the search key. The normalization means S3, which uses the same normalization method as the learning step, normalizes the key information into an integer sequence:

84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46 |

[0033] Next, in the search means S4, the same autocorrelation formulas f() and g() as in the learning step are used to create a series of (x, y) pairs, starting from the beginning of the normalized numerical sequence corresponding to each property. Then, based on this series of (x, y) pairs, the content frequency $\omega_k$ of the search key for property k is calculated by summing $V(x_j, y_j, k)$ over j = 1 to m:

$$\omega_k = \sum_{j=1}^{m} V(x_j, y_j, k)$$

[0034] Here, $V(x_j, y_j, k)$ is defined to be equal to the weight when the property information list holds a weight for property i, and 0 when it does not.

[0035] Accordingly, when data exists at the position (x, y) in FIG. 4 corresponding to an (x, y) pair of the numerical sequence being searched for (that is, when there is a bar), the weight value is recorded, in a separately provided storage means, at the storage location for the property serial number i indicated by that data.

[0036] Next, in the evaluation result output means S5, the structure evaluation value score (degree of match) obtained for each property is divided by the evaluation value for a perfect match (in this case, the number of characters minus k) to obtain the probability that the property contains the search key, yielding the evaluation result list 70. The sorting means S6 then sorts this list 70 in descending order of content probability to obtain the sorted list 80.
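The scoring and sorting steps can be sketched in Python as follows. This is a simplified illustration under assumptions: the index is taken to be a dictionary from (x, y) positions to lists of property serial numbers, all weights are fixed at 1, and the perfect-match score is taken to be the number of (x, y) pairs derived from the key. The name `rank` and the data layout are hypothetical.

```python
def rank(index, key_features):
    """Return (serial number, content probability) pairs, best match first."""
    scores = {}
    for xy in key_features:
        for prop in index.get(xy, []):  # properties that have a bar at (x, y)
            scores[prop] = scores.get(prop, 0) + 1  # weight fixed at 1
    full = len(key_features)  # score of a perfect match
    result = [(prop, score / full) for prop, score in scores.items()]
    result.sort(key=lambda item: item[1], reverse=True)  # descending probability
    return result


index = {(97, 1): [1, 2], (98, 1): [2, 3]}
print(rank(index, [(97, 1), (98, 1)]))  # [(2, 1.0), (1, 0.5), (3, 0.5)]
```

Property 2 contains both key features and scores 1.0, while properties 1 and 3 each contain one of the two and score 0.5, which shows how partial matches remain in the result list.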

[0037] This sorted list 80 is the search result. By referring to the properties at the top of the list, one can identify the names of the properties most likely to contain the search key. Because the content probability is obtained for all matches, both complete and incomplete, fuzzy-match searching is possible.

[0038] In addition, because every search covers all properties using all of the information in the search key, the probability of a missed result is essentially zero.

[0039] Furthermore, the time needed to evaluate the search key against one property depends only on the number of characters in the key, not on the size of the property. The search can therefore be performed very quickly.

[0040] Logical operations between search result lists also allow search operations such as AND and OR over search conditions to be executed at high speed. Besides the example above, many other forms are conceivable for the autocorrelation formula of equation (1). For example, with

f: x → x
g: (x, y) → x - y (or |x - y|)

a neighborhood feature matrix can be built using the difference (or the absolute value of the difference) between adjacent characters and between every other character as the correlation information. Neighborhood features may also be extracted by applying the four arithmetic operations to the integer values of the individual characters of several character strings.

[0041] Neighborhood features need not be extracted from all the data of each property. For example, neighborhood features may be abstracted after excluding one or more specific integer values in the property data, integer values in a specific range, or one or more specific bits in each byte of the data sequence. When the data consists of two-byte characters, as in Japanese documents, the neighborhood features may be extracted from, for example, the lower byte only, excluding the upper byte.

[0042] In the example above, the matrix generated from the neighborhood features is a bit matrix of order 256, which corresponds to 8 KB. For a database in which each property holds only about 1 KB of data, this cannot be called an efficient system. It is therefore advisable to provide data compression means S7 of the kind described above and compress the data to reduce the size of the structure file 40.

[0043] FIG. 5 shows an example of the data compression method. In this example, for each element of the order-256 neighborhood feature matrix, the property names 40a (identification codes) whose element value is 1 are accumulated as a data string at 1 byte per property. Property names whose element value is 0 are thus excluded as unnecessary data.

[0044] When there are 255 or more properties, the property name 40a cannot be represented in 1 byte, so only its lower byte is stored. For example, with 10,000 properties the property name is represented by 2 bytes, of which the lower byte is used. A marker 40b is inserted into the data string each time the property name code passes 255.

[0045] At search time, the data string in the structure file corresponding to each neighborhood feature of the search key is retrieved, and an occurrence frequency table is built for each property name. In doing so, 255 is added to the property name code each time a marker 40b is passed. The evaluation result list 70 of FIG. 2 is obtained from the occurrence frequency table created in this way.
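The marker handling at search time can be sketched in Python as follows. This is a rough illustration only: the data string is modeled as a Python list in which `None` stands in for the marker 40b, and each stored value is treated as an offset to which 255 is added per marker passed, following the rule stated above. The names `expand`, `frequency_table` and `MARKER` are hypothetical, and the actual byte-level layout of FIG. 5 is not reproduced.

```python
from collections import Counter

MARKER = None  # stand-in for marker 40b (the real byte encoding is not specified)


def expand(data_string, step=255):
    """Yield full property name codes, adding `step` for each marker passed."""
    base = 0
    for item in data_string:
        if item is MARKER:
            base += step
        else:
            yield base + item


def frequency_table(data_strings):
    """Count how often each property name code occurs across the data strings."""
    table = Counter()
    for data_string in data_strings:
        table.update(expand(data_string))
    return table


print(list(expand([3, 200, MARKER, 10, MARKER, MARKER, 1])))  # [3, 200, 265, 766]
```

One data string is retrieved per neighborhood feature of the search key, so a property that appears in many of those strings accumulates a high count in the table.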

[0046] If the data string of property name codes covers, for example, more than half of all properties, that neighborhood feature matrix element may be regarded as common to all properties and deleted.

[0047] In the embodiment above, the normalization means S1, learning means S2, normalization means S3, search means S4, evaluation result output means S5, sorting means S6, and data compression means S7 can be implemented as computer programs, but they may also be implemented as dedicated hardware using logic circuit elements.

[0048]

[Effects of the Invention] Compared with conventional general-purpose compression coding methods, the present invention greatly reduces the amount of computation, so compression and decoding can be performed at high speed. Moreover, because no parameters spanning the whole data set, such as statistics, are required, data can easily be added or deleted.

[Brief description of drawings]

FIG. 1 is a data flow diagram of an embodiment of the compression and decoding system according to the present invention.

FIG. 2 is a data flow diagram of a database search system to which the compression and decoding system according to the present invention is applied.

FIG. 3 is a diagram showing the quantization of neighborhood information.

FIG. 4 is a diagram showing the structure of the stored information.

FIG. 5 is a data configuration diagram of the compressed neighborhood features.

[Explanation of symbols]

10: search target
12: division unit
14: quotient storage and comparison unit
16: compressed sequence D2 processing unit
18: storage unit
20: normalized data
22: reading unit
24: bias storage unit
26: addition unit
28: restored sequence D3 holding unit
30: neighborhood feature matrix
40: structure file
50: search key
60: normalized key
70: evaluation result list
80: sorted list
S1: normalization means
S2: learning means
S3: normalization means
S4: search means
S5: evaluation result output means
S6: sorting means
S7: data compression means

Claims


1. A compression and decoding system for ascending-order integer sequence data, for compressing and decoding integer sequence data arranged in ascending order, comprising: division means for dividing the integer sequence data arranged in ascending order; quotient storage and comparison means for comparing the quotient obtained by the division means with an old quotient already stored, and outputting the difference between these quotients when the obtained quotient is larger than the old quotient; storage means for storing the remainder obtained by the division means together with the quotient difference when the quotient storage and comparison means outputs the quotient difference, and for storing only the remainder obtained by the division means when the quotient storage and comparison means does not output the quotient difference; and decoding means for decoding the original integer sequence data from the quotient difference and remainder data stored in the storage means.

2. The compression and decoding system for ascending-order integer sequence data according to claim 1, wherein the quotient storage and comparison means outputs the difference between the quotients together with a mark indicating a carry when the obtained quotient is larger than the old quotient.

3. The compression and decoding system for ascending-order integer sequence data according to claim 1, used for a database search system comprising: storage means for storing neighborhood features for each property to be searched; and search means for determining, for each property, the degree of matching between the neighborhood features of a search key and the neighborhood features of the search target, and outputting property numbers in descending order of the degree of matching.

4. The compression and decoding system for ascending-order integer sequence data according to claim 3, used for a database search in which a quantized value x for the j-th data item $C_{i,j}$ of the i-th property to be searched, and a quantized value y for the k data items $C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}$ in its neighborhood, are obtained as $x = f(C_{i,j})$ and $y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k})$, and the serial number i of the property is stored at the position in the storage means determined by the obtained values of x and y.