KR20080026772A

KR20080026772A - Method for a compression compensating restoration rate of a lempel-ziv compression method

Info

Publication number: KR20080026772A
Application number: KR1020060091759A
Authority: KR
Inventors: 박수홍; 문경기
Original assignee: Inha University Industry-Academic Cooperation Foundation (인하대학교 산학협력단)
Priority date: 2006-09-21
Filing date: 2006-09-21
Publication date: 2008-03-26

Abstract

A compression method that compensates for the restoration speed of the Lempel-Ziv compression method is provided. By applying a key-address method, it shortens restoration time and, regardless of data size, retrieves the information needed for restoration with O(1) performance by referring to key values stored in a hash table. A method for compressing and restoring original data using the Lempel-Ziv compression method comprises the steps of: searching the pattern information of the original data for repeated patterns using the Lempel-Ziv compression method (S20, S30); if a repeated pattern is found, constructing a key-address hash table for that pattern (S50); and losslessly compressing the original data by adding to it the data needed to restore the repeated patterns recorded in the hash table (S70-S90).

Description

Compression method compensating for the restoration speed of the Lempel-Ziv compression method

FIG. 1 is a block diagram schematically illustrating compression and reconstruction.

FIG. 2 is a flowchart illustrating the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method.

FIG. 3 is a graph showing restoration time according to the number of data items.

FIG. 4 is a graph showing restoration speed according to the number of points.

FIG. 5A is a diagram illustrating the algorithm of the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method.

FIG. 5B is a diagram illustrating the algorithm of the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method.

FIG. 6 is a diagram schematically illustrating the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method.

The present invention relates to a compression method that compensates for the restoration speed of the Lempel-Ziv compression method. More particularly, it applies a key-address method to shorten restoration time: regardless of data size, the information needed for restoration is retrieved with O(1) performance by referring to key values stored in a hash table. Because the method is lossless, the original data is preserved exactly and no information is lost during compression, so the method can be applied efficiently to documents, video, spatial data, and binary formats such as MP3 that require both high restoration speed and lossless compression.

In general, home computers once handled data mainly through text-based applications. With the growth of online and multimedia environments, however, both the quantity and the quality of multimedia data are increasing. Data compression, proposed as a way to address the resulting storage problem, is the science of representing information in a compact form, with the aim of expressing the original information using as few bits as possible. [Khalid Sayood, "Introduction to Data Compression", Morgan Kaufmann Publishers, 2000]

FIG. 1 is a block diagram schematically illustrating compression and reconstruction. As shown in the figure, a data compression technique consists of a compression algorithm, which converts given original data χ into a compressed form χ_c, and a reconstruction algorithm, which uses the compressed data χ_c to reconstruct the data in its original form γ. [Khalid Sayood, "Introduction to Data Compression", Morgan Kaufmann Publishers, 2000]

Reconstruction is divided into lossless compression, in which the original data χ and the reconstructed data γ match exactly, and lossy compression, in which χ and γ differ. In general, lossy compression achieves higher compression performance than lossless compression. [Lee Dong-heon, "A Vector Data Compression Method Using Clustering Technique", Master's thesis, Graduate School of Inha University: 2-29, 2005]
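To make the distinction concrete, here is a minimal Python sketch. It uses the standard library's `zlib` (DEFLATE, an LZ77-based lossless codec) as a stand-in for lossless compression and simple uniform quantization as a stand-in for lossy compression; neither is the method of this document, and the function names are illustrative only.

```python
import zlib


def lossless_roundtrip(x: bytes) -> bytes:
    """Compress x into x_c, then reconstruct y; y is always identical to x."""
    x_c = zlib.compress(x)       # compression algorithm: x -> x_c
    return zlib.decompress(x_c)  # reconstruction algorithm: x_c -> y


def lossy_roundtrip(samples, step):
    """Quantize samples to multiples of step; y only approximates x."""
    x_c = [round(s / step) for s in samples]  # fewer distinct values to store
    return [q * step for q in x_c]            # error is at most step / 2
```

For example, `lossless_roundtrip(data) == data` holds for any `data`, while `lossy_roundtrip([0.12, 0.49, 0.51, 1.0], 0.5)` gives `[0.0, 0.5, 0.5, 1.0]`: smaller to describe, but no longer equal to the input.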

Lossless compression can reconstruct the original data exactly, without losing any information, through the compression and reconstruction process. It is used where any loss of information would make the data as a whole unusable, for example text data, in which a small error affects the overall result, and satellite image data, in which individual pixel values carry high significance. [Chun Woo-je, "A Study on Compression Technique Considering Efficient Update of Vector Data", Master's thesis, Graduate School of Inha University: 7-9, 2005]

Lossy compression, by contrast, cannot reconstruct the original data exactly. It is used where some loss and distortion of information is acceptable, for example voice, where the data size can be reduced by lowering the sound quality to the point where the reconstructed data can still convey its content.

Lossless and lossy compression can therefore be applied in various ways depending on how the data is used. For example, a database system that can store spatial data types, process spatial data and related queries, and use spatial indexes and optimized query processing is defined as a spatial database [Lee Min-woo, "Spatial Indexing Method for Object-Relational DBMS of Embedded System", Master's thesis, Graduate School of Inha University: 26-29, 2005]. For spatial databases, the OGC (Open Geospatial Consortium) defines the spatial data model and spatial operators in the Simple Feature Specification For SQL v1.1 (OGC, 1999) and presents a standard schema for spatial databases.

Lempel-Ziv (1977) is based on a dictionary-based compression algorithm: during compression it builds a dictionary of the code values that make up the data and, when the information currently being read already exists in the dictionary, compresses that data. The Lempel-Ziv compression method can be implemented in many ways in practice, but by how the dictionary is constructed it can be divided into the static dictionary method and the dynamic dictionary method.

The static dictionary method builds, in advance, a dictionary of the code values expected to appear, and compresses by referring to that dictionary whenever a stored code value appears again. It is therefore efficient when the contents of the file to be compressed are predictable. The dynamic dictionary method builds the dictionary while reading the data, so a code value can only refer to code values that have already appeared earlier. Because the dictionary must be built during reading, the dynamic dictionary method compresses more slowly, but it achieves a good compression rate on arbitrary data.
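The two dictionary strategies can be sketched as follows. This is an illustration under assumptions: the static encoder uses a hypothetical dictionary fixed before compression starts, and the dynamic encoder follows the LZW scheme (an LZ78-family variant), in which the dictionary grows from phrases already seen. Neither is the exact variant used in this document.

```python
def static_encode(data: bytes, dictionary: list[bytes]) -> list:
    """Static dictionary: phrases are fixed before compression begins."""
    out, i = [], 0
    while i < len(data):
        for idx, phrase in enumerate(dictionary):
            if phrase and data.startswith(phrase, i):
                out.append(("dict", idx))  # reference to a predefined phrase
                i += len(phrase)
                break
        else:
            out.append(("lit", data[i]))  # byte not covered by the dictionary
            i += 1
    return out


def lzw_encode(data: bytes) -> list[int]:
    """Dynamic dictionary (LZW): new phrases are added while the data is read."""
    table = {bytes([b]): b for b in range(256)}  # start with all single bytes
    w, out = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc  # keep extending a phrase that has already appeared
        else:
            out.append(table[w])
            table[wc] = len(table)  # learn the new phrase for later references
            w = bytes([b])
    if w:
        out.append(table[w])
    return out
```

The static encoder only helps when its phrases were guessed correctly in advance; `lzw_encode(b"ABABABA")` yields `[65, 66, 256, 258]`, where codes 256 and 258 refer to phrases learned from the input itself.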

In a typical implementation, the Lempel-Ziv compression method compresses data by using a special string, in the form of a single code or a double code, to carry the restoration information for a compressed code value. The special string is used to locate the information needed for restoration, and it must be a unique code that does not otherwise occur in the file. An example of the Lempel-Ziv compression method using a special string follows. [Table 1] shows the original data to be compressed, read in binary mode and represented as a series of codes.

Here the original data is shown in hexadecimal, and the code sequence "01 8d 01 bf" can be seen to repeat three times. Compressing this data with the Lempel-Ziv compression method gives the result shown in [Table 2].

"aa ff" is a special string indicating that compressed data starts; it is a 2-byte double code. It signals that the codes in the following bytes, "04 08", are the information needed for restoration. The Lempel-Ziv compression method basically uses a circular queue as its data structure, so when the first character 'aa' is read, the offset of the rear is 10. Subtracting the offset information '04' from it gives 6. Starting at offset 6, the data is then copied with reference to '08', that is, 8 code values are reconstructed and stored.

[Table 1] Original data (hex code)
04 5b 6d 08 2c d5 01 8d 01 bf 01 8d 01 bf 01 8d 01 bf

[Table 2] Data compressed with the Lempel-Ziv compression method (hex code)
04 5b 6d 08 2c d5 01 8d 01 bf aa ff 04 08
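The restoration walk-through for [Table 2] can be checked with a short Python sketch. It assumes the layout shown in the tables: literal bytes, and at each compressed position the two-byte special string `aa ff` followed by one byte of backward distance and one byte of length. The function name and constant are illustrative.

```python
MARKER = b"\xaa\xff"  # two-byte special string from [Table 2]


def decode_special_string(compressed: bytes) -> bytes:
    """Restore data whose repeats are encoded as MARKER + distance + length."""
    out = bytearray()
    i = 0
    while i < len(compressed):
        if compressed[i:i + 2] == MARKER:
            distance = compressed[i + 2]  # e.g. 0x04
            length = compressed[i + 3]    # e.g. 0x08
            start = len(out) - distance   # rear offset (10) minus distance (4) = 6
            for k in range(length):
                # Byte-by-byte copy, because the source overlaps the bytes being written.
                out.append(out[start + k])
            i += 4
        else:
            out.append(compressed[i])  # ordinary literal byte
            i += 1
    return bytes(out)
```

Byte-by-byte copying matters here: only 4 bytes exist after offset 6 when the copy begins, yet 8 bytes are requested, and the later ones are produced by the copy itself.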

[Table 3] shows the data compressed by the Lempel-Ziv compression method restored back to the original data; the code values match the original data exactly. Because compressed data can be restored exactly to the original data, the Lempel-Ziv compression method is classified as a lossless compression method.

However, with the Lempel-Ziv compression method, data can be restored only after finding the special strings by reading the compressed data from beginning to end. Because the special strings are searched for with O(N) algorithmic performance, the time required for restoration grows in proportion to the size of the data.
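The O(N) cost comes from having to examine the stream position by position to locate each special string. Here is a minimal sketch, again assuming the `aa ff` + distance + length layout of [Table 2]; `examined` simply counts loop iterations to show that the work grows with the size of the compressed data.

```python
MARKER = b"\xaa\xff"


def find_special_strings(compressed: bytes):
    """Return marker positions and how many positions had to be examined."""
    positions = []
    examined = 0
    i = 0
    while i < len(compressed) - 1:
        examined += 1
        if compressed[i:i + 2] == MARKER:
            positions.append(i)
            i += 4  # skip marker (2 bytes) + distance + length
        else:
            i += 1  # nothing to restore here, keep scanning
    return positions, examined
```

On the 14-byte data of [Table 2], the single marker at offset 10 is found after 11 iterations. If 100 copies of the first 10 literal bytes are placed before the same marker, 1001 iterations are needed.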

The present invention has been devised to solve the above problem. Its object is to provide a compression method that compensates for the restoration speed of the Lempel-Ziv compression method by applying a key-address method: restoration time is shortened and, regardless of data size, the information needed for restoration is retrieved with O(1) performance by referring to key values stored in a hash table. Being lossless, the method preserves the original data exactly with no loss of information, so it can be applied efficiently to documents, video, spatial data, and binary formats such as MP3 that require high restoration speed and lossless compression.

To achieve the above object, the present invention provides a method of compressing and restoring raw data using Lempel-Ziv, comprising the steps of: searching the pattern information of the raw data for duplicated patterns; when a duplicated pattern is found, constructing a key-address hash table for that pattern; and losslessly compressing the raw data by adding to it the data needed to restore the duplicated patterns recorded in the hash table.
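The three claimed steps can be illustrated with a minimal Python sketch. It assumes a greedy, brute-force LZ77-style search (the document's own search uses a hash table and a hash chain), a 1-byte distance and length, and a minimum match length of 3. Instead of writing a special string into the output, each repeated pattern's restoration data (position in the restored data, backward distance, length) is collected separately so that it can be kept as header information. All names are hypothetical.

```python
def compress_with_header(raw: bytes, min_len: int = 3, window: int = 255):
    """Split raw into literal bytes plus restoration entries for repeated patterns."""
    literals = bytearray()
    header = []  # (position in restored data, backward distance, length)
    i = 0
    while i < len(raw):
        # Step 1: search earlier data for the longest repeat starting at i.
        best_len, best_dist = 0, 0
        for start in range(max(0, i - window), i):
            length = 0
            while (i + length < len(raw) and length < 255
                   and raw[start + length] == raw[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_len:
            # Steps 2-3: record restoration data instead of the repeated bytes.
            header.append((i, best_dist, best_len))
            i += best_len
        else:
            literals.append(raw[i])  # no useful repeat: keep the byte as-is
            i += 1
    return bytes(literals), header
```

Applied to the data of [Table 1], this keeps the first 10 bytes as literals and produces the single entry `(10, 4, 8)`, which carries the same information as `aa ff 04 08` in [Table 2] without needing a reserved special string.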

The search in the duplicate-pattern search step uses the basic Lempel-Ziv compression method, with a hash table and a hash chain.

To supplement the Lempel-Ziv compression method, however, the positions of the duplicated patterns found by the Lempel-Ziv search are stored in a linked list, which is saved in the header of the raw data; at restoration time, this header is used to build the key-address hash table. The hash table uses the storage location of the restoration data for each found duplicated pattern as its key value.

The restoration data for a duplicated pattern includes the position where the data of that pattern is stored and the size of that data.
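On the restoration side, the header entries can be loaded into a hash table keyed by position. The decoder then asks "is there restoration data at this position?" with an average O(1) dictionary lookup instead of scanning for special strings. This is a minimal sketch, assuming header entries of the form (position in the restored data, backward distance, length). Python's `dict` stands in for the key-address hash table, and the names are illustrative.

```python
def restore_with_hash_table(literals: bytes, header) -> bytes:
    """Rebuild the original data from literal bytes and header entries."""
    # Key-address table: position in restored data -> (distance, length).
    table = {pos: (dist, length) for pos, dist, length in header}
    total = len(literals) + sum(length for _, _, length in header)
    out = bytearray()
    lit = 0
    while len(out) < total:
        entry = table.get(len(out))  # O(1) on average, no scan for markers
        if entry is not None:
            dist, length = entry
            start = len(out) - dist
            for k in range(length):
                out.append(out[start + k])  # overlapping copy, byte by byte
        else:
            out.append(literals[lit])
            lit += 1
    return bytes(out)
```

With the 10 literal bytes of [Table 1] and the header `[(10, 4, 8)]`, this returns the full 18-byte original.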

When spatial data is compressed and restored with this Lempel-Ziv-based method, the method further comprises extracting a differential vector and start point coordinates from the spatial data and converting them into vector data. The pattern information of the converted vector data is then searched for duplicated patterns.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a flowchart illustrating the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method. Spatial data is used as the example: the figure shows the process of compressing vector data and the inverse process of reconstructing the original data.

Vector data compression begins by selecting the data model to be used in the design. Using the selected model, each object is divided into start point coordinates, which give the absolute position of the object in the spatial coordinate system, and a differential vector, which expresses positions relative to the start point (S10).
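Here is a minimal sketch of the S10 split and its inverse (used again in S80 and S90), assuming integer (x, y) coordinates for a single polyline object. The function names are hypothetical. Evenly spaced points turn into identical differential vectors, which is exactly the kind of repetition the later Lempel-Ziv search can exploit.

```python
def to_differential(points):
    """Split a point list into its start point and successive differences."""
    start = points[0]  # absolute position in the spatial coordinate system
    diffs = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return start, diffs


def from_differential(start, diffs):
    """Inverse of to_differential: accumulate differences from the start point."""
    x, y = start
    points = [start]
    for dx, dy in diffs:
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

For `[(100, 200), (101, 202), (102, 204), (103, 206)]`, the start point is `(100, 200)` and the differential vectors are three copies of `(1, 2)`.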

In step S10, because the data compression process uses an 8-byte processing structure for storing spatial data, the geometry model of the OGC (Open Geospatial Consortium) was modified so that only minimal information is stored for the spatial data.
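The document does not spell out the 8-byte layout. One plausible reading, shown here purely as an assumption, is that each differential vector is stored as 8 bytes: two signed 32-bit integers (dx, dy) in big-endian order, packed with Python's `struct` module. Under that layout, repeated differential vectors become repeated 8-byte patterns in the byte stream that the Lempel-Ziv search scans.

```python
import struct

# Assumed layout (not confirmed by the source): dx, dy as signed 32-bit big-endian.
VECTOR = struct.Struct(">ii")


def pack_vectors(diffs) -> bytes:
    """Serialize differential vectors, 8 bytes per vector."""
    return b"".join(VECTOR.pack(dx, dy) for dx, dy in diffs)


def unpack_vectors(data: bytes):
    """Inverse of pack_vectors."""
    return [VECTOR.unpack_from(data, off) for off in range(0, len(data), VECTOR.size)]
```

For example, `pack_vectors([(1, 2), (1, 2)])` produces the 8-byte block `00 00 00 01 00 00 00 02` twice in a row.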

Pattern information is then searched for in the differential vectors produced by converting the vector data (S20).

The Lempel-Ziv dynamic dictionary method is applied to the extracted pattern information to search for repeated patterns (S30), and recompression is performed (S40).

To improve restoration speed, the method does not use the Lempel-Ziv approach of inserting a pad character at each compressed position. Instead, it stores in a hash table the offset at which each pattern is located. This is what lets the compression method of the present invention retrieve compression information with O(1) performance during restoration, and thus raise restoration speed.

Because the cost of redundant storage must be minimized, recompression is performed if duplicated pattern information exists in the differential vectors (S40). To minimize the cost from recompression through restoration, the restoration information is stored in a hash table instead of pad characters (S50).

Compression is performed by checking the length of each differential vector for repeated pattern information. The information that improves restoration speed through the key-address method is managed as header information.

The final compressed data produced by this series of steps is obtained from the differential vectors, computed relative to the start point that gives the absolute position of each spatial object, by removing any duplicated information present in the compressed data.

The compression method proposed for step S40 reads the compressed data, searches it for repeated information, and recompresses it. Because it compresses by linear search, compression may be slower, but duplicated information is found. Since the restoration information is managed in a hash table, the compression information can be retrieved at O(1) speed during restoration, increasing restoration speed.

The compressed data can be reconstructed into the original data by the inverse of the compression process (S70, S80, S90). In the restoration step, the compression position information stored in the hash table is found with O(1) performance.

In this process, repeated information is located and restored for its stored length. The differential vectors and the start point holding the absolute position coordinates of the spatial object are then used (S80) to reconstruct the original spatial object (S90).

An experimental example of the present invention follows.

To quantitatively evaluate the performance of the proposed compression method and to assess its validity and applicability against the prior art, two kinds of data, distinguished by the number of data items, were selected as experimental data: the full data set covering all map sheets, and the map-sheet data for a specific region. The prior-art compression method and the compression method of the present invention were applied to both data sets, and their compression rates and query processing times were compared and analyzed.

The compression method of the present invention replaces the pad character used in the Lempel-Ziv compression method with a key-address hash table. This gives Lempel-Ziv restoration O(1) performance for retrieving compression information, regardless of data size.

Table 12 shows data stored with the compression method of the present invention. In the structure of Table 12, the header stores the position at which the pattern information starts. The first hex code in the header, "0b", is 11 in decimal and points to the 11th position of the record part. Reading 2 bytes from there gives "00 07", the position at which the pattern starts, and the following byte, "08", gives the amount of data to restore, from which the original data is reconstructed.
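The arithmetic in this walk-through (0x0b = 11, the 2-byte value 00 07 = 7, and a length byte 08 = 8) can be sketched as follows. Table 12 itself is not reproduced in this text, so the record below is a hypothetical stand-in. The sketch also assumes that "the 11th position" means 0-based index 11 and that the 2-byte start position is big-endian.

```python
def read_pattern_entry(header: bytes, record: bytes):
    """Follow a header byte into the record and read (start, length) of a pattern."""
    offset = header[0]                                         # e.g. 0x0b -> 11
    start = int.from_bytes(record[offset:offset + 2], "big")  # e.g. 00 07 -> 7
    length = record[offset + 2]                                # e.g. 08 -> 8
    return offset, start, length
```

With a header of `0b` and a record whose bytes at indices 11 to 13 are `00 07 08`, the function returns `(11, 7, 8)`.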

Because of the header that stores the hash keys, the data is 1 to 2 bytes larger than with the conventional Lempel-Ziv compression method, but the stored information can be found with O(1) performance.

The experimental data for the proposed method was built by the same process as the experimental data for the Lempel-Ziv compression method.

Finally, the experimental results for the Lempel-Ziv compression method and the compression method of the present invention are analyzed.

The compression rates of the Lempel-Ziv compression method and the compression method of the present invention are compared.

For spatial queries over Incheon Metropolitan City and the Seoul Capital Area, the time required to restore the queried data to the original data is analyzed by number of points and number of data items.

FIG. 3 is a graph showing restoration time according to the number of data items. It shows the restoration time as a function of the number of data items searched when a query is issued. Computing restoration speed from these times shows that, for the conventional Lempel-Ziv compression method, the query processing time up to restoration increases sharply as the number of data items increases.

With the compression method of the present invention, query processing time also increases with the number of data items, but restoration time is lower than with the conventional Lempel-Ziv compression method, and restoration speed is correspondingly higher.

FIG. 4 is a graph showing restoration time according to the number of points. It shows the restoration time as a function of the number of points searched when a query is issued. Computing restoration speed from these times shows that the compression method of the present invention improves query restoration speed by about 70% on average compared with the Lempel-Ziv compression method.

The process of the compression method according to this embodiment is described below.

FIGS. 5A and 5B illustrate the algorithm of the compression method of the present invention. As shown, when the Lempel-Ziv compression method is applied, header information is generated so that a hash table can be built without using special strings.

Referring to FIG. 5A, as shown in line 1, the compression method is memory-based, and compressing vector data requires the data and its length. The function therefore takes as arguments a variable in which the compressed data will be stored and a variable in which the length of the compressed data will be stored.

As shown in line 2, because each object in the vector data has a different number of repeated items, a linked list is initialized so that the header information can be built dynamically. As shown in line 3, the variable that stores pattern information and its length are initialized to 0.

As shown in line 4, the first byte of the data to be compressed is stored in the queue. As shown in line 5, a loop runs over the size of the data to be compressed, which was passed in as an argument. As shown in lines 6 to 7, if the computed position of the current queue comes out as 0, the layer value is incremented by 1.

As shown in lines 9 to 12, if the queue used to search for pattern information overflows, the currently stored pattern value is compared with the front of the queue, and the data is compressed if they are the same.

Then, as in row 13, the queue value at the front is deleted from the hash table, and, as in row 14, it is also deleted from the queue.
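
The eviction in rows 13 and 14 can be sketched as a bounded window paired with a hash table whose buckets are hash chains of offsets (claim 2 mentions a hash table, a hash chain, and a circular queue). This is a sketch under assumptions: the class `Window`, keying the table by single byte values, and storing offsets oldest first are illustrative choices, not details given in the patent.

```python
from collections import defaultdict, deque


class Window:
    """Hypothetical sliding window: a bounded queue of (offset, byte) plus a
    hash table mapping each byte value to a chain of offsets, oldest first."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.queue = deque()              # (offset, byte) in arrival order
        self.chains = defaultdict(deque)  # hash chain: byte -> offsets

    def push(self, offset: int, byte: int) -> None:
        if len(self.queue) == self.size:  # overflow
            old_offset, old_byte = self.queue[0]
            chain = self.chains[old_byte]
            # The queue front is also the oldest entry of its own chain.
            assert chain[0] == old_offset
            chain.popleft()               # delete from the hash table (row 13)
            if not chain:
                del self.chains[old_byte]
            self.queue.popleft()          # then delete from the queue (row 14)
        self.queue.append((offset, byte))
        self.chains[byte].append(offset)
```

Because both structures grow in arrival order, the entry to evict is always at the head of its chain, so eviction costs O(1) rather than a search.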

Next, while the data is read one byte at a time and stored in the queue, the method must determine whether the value just read already exists. To do so, as in row 17, it obtains the offset of the data currently being compared within the data stored in the queue; as in row 18, it checks whether the byte just read exists among the values stored in the queue; and if that byte does not occur among the previous bytes, it is stored in both the hash table and the queue, as in row 20.
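
The existence check of rows 17 to 20 can be sketched with a dictionary standing in for the hash table, so that asking "has this byte occurred before?" is an average-case O(1) lookup instead of a scan of the queue. The function `scan` and its event tuples are hypothetical and exist only to make the two branches visible.

```python
from collections import defaultdict


def scan(data: bytes) -> list[tuple]:
    """Classify each byte as new or as matching an earlier occurrence."""
    seen = defaultdict(list)  # hash table: byte value -> offsets where it occurred
    events = []
    for offset, byte in enumerate(data):
        earlier = seen.get(byte)  # .get() does not create an empty entry
        if earlier:
            # Byte already present: report the most recent earlier offset.
            events.append(("match", offset, earlier[-1]))
        else:
            events.append(("new", offset))
        seen[byte].append(offset)  # store in the hash table either way
    return events
```

For example, `scan(b"aba")` reports the first two bytes as new and the third as matching offset 0.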

On the other hand, if a pattern matching the byte just read is found among the previously read bytes, the method must look for the next byte expected at the queue position currently being compared. As in row 22, it increments the matched value for the queue and stores the information retrieved from the queue; as in row 24, it stores the data containing the expected byte in the queue and runs the loop again.
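
Extending a match one expected byte at a time, as in rows 22 to 24, can be sketched as follows. The function `longest_match`, the newest-first candidate order, and the `max_len` cap of 255 are assumptions for illustration; the patent does not state a maximum match length. Candidates are assumed to be offsets earlier than `i`.

```python
def longest_match(data: bytes, i: int, candidates: list[int],
                  max_len: int = 255) -> tuple[int, int]:
    """Return (source offset, length) of the longest earlier match for data[i:]."""
    best_src, best_len = -1, 0
    for src in reversed(candidates):  # try the most recent occurrence first
        length = 0
        while (i + length < len(data) and length < max_len
               and data[src + length] == data[i + length]):
            length += 1               # one more expected byte matched
        if length > best_len:
            best_src, best_len = src, length
    return best_src, best_len
```

For `b"abcxabcd"` at position 4 with candidate offset 0, the match covers `abc` and stops at the mismatch between `x` and `d`, giving `(0, 3)`.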

If, as in row 25, the previously read byte was already stored in the queue, row 26 checks whether the bytes stored earlier in the queue are stored in duplicate, and, as in rows 27 to 31 of FIG. 5B, the result is processed by the same procedure used when the match length is 0.

In addition, as in rows 33 to 37 of FIG. 5B, to build the hash table structure, header information can be generated that uses key values for the locations of the compressed information. For this purpose, the location information needed for restoration must be stored in a linked list while the data is being compressed.
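
A compressor that records restoration information in a header instead of marking matches with a special string can be sketched as below. Several details are assumptions because the patent does not specify them: a greedy LZ77-style search over 3-byte keys (`MIN_MATCH`), a Python list in place of the linked list, and a header made of a big-endian 32-bit entry count followed by (output position, source position, length) triples. The output position acts as the hash-table key.

```python
import struct

MIN_MATCH = 3  # hypothetical minimum pattern length worth a header entry


def compress(data: bytes) -> bytes:
    """Greedy LZ77-style compressor emitting header entries + literal bytes."""
    table = {}       # hash table: 3-byte prefix -> most recent position
    entries = []     # restoration info, in order (stands in for the linked list)
    literals = bytearray()
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        src = table.get(key) if len(key) == MIN_MATCH else None
        if src is not None:
            length = MIN_MATCH
            while i + length < len(data) and data[src + length] == data[i + length]:
                length += 1
            entries.append((i, src, length))  # key = output position i
            end = i + length
        else:
            literals.append(data[i])          # byte with no earlier pattern
            end = i + 1
        while i < end:                        # index every consumed position
            if i + MIN_MATCH <= len(data):
                table[data[i:i + MIN_MATCH]] = i
            i += 1
    header = struct.pack(">I", len(entries))
    for dest, src, length in entries:
        header += struct.pack(">III", dest, src, length)
    return header + bytes(literals)
```

For `b"abcabcabc"` this produces one entry `(3, 0, 6)` followed by the literals `abc`. At 12 bytes per entry the toy header only pays off for long repeats; a practical encoding would pack these fields more tightly.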

Then, as in row 45 of FIG. 5B, the header information is read from the compressed data and used to build the hash table; as in row 46, the data pointer is advanced by the size of the header; and as in row 47, index information is computed from each key value of the hash table and used to retrieve the restoration information for that key.
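
Rows 45 to 47 can be sketched as a header parser that rebuilds the hash table and reports where the body starts. The layout it reads is the hypothetical one assumed here (a big-endian 32-bit count followed by 12-byte `(output position, source position, length)` entries), and the name `read_header` is illustrative.

```python
import struct


def read_header(blob: bytes) -> tuple[dict[int, tuple[int, int]], int]:
    """Parse the header into {output position: (source, length)}."""
    (count,) = struct.unpack_from(">I", blob, 0)
    table = {}
    offset = 4
    for _ in range(count):
        dest, src, length = struct.unpack_from(">III", blob, offset)
        table[dest] = (src, length)  # key: position where the copy begins
        offset += 12
    return table, offset             # offset = header size = start of the body
```

Skipping `offset` bytes then leaves only the literal data, which corresponds to moving the data by the size of the header in row 46.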

Finally, using the position at which the repeated pattern is located and the length of the repetition starting from that position, the data at the position indicated by the hash-table key is copied, for the repeated length, starting at the current byte position.
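
The final copy step can be sketched as a decoder that, at every output position, checks the hash table with an average-case O(1) lookup. If a key is present, it copies the repeated pattern; otherwise it takes the next literal byte. This assumes the hypothetical table shape `{output position: (source position, length)}` used in the sketches above, and the name `restore` is illustrative. Copying one byte at a time lets a pattern overlap its own source, as in LZ77.

```python
def restore(table: dict[int, tuple[int, int]], literals: bytes) -> bytes:
    """Rebuild the original bytes from header entries and literal bytes."""
    out = bytearray()
    lit = 0
    while lit < len(literals) or len(out) in table:
        entry = table.get(len(out))  # does a repeated pattern start here?
        if entry is not None:
            src, length = entry
            for k in range(length):  # byte by byte, so overlapping copies work
                out.append(out[src + k])
        else:
            out.append(literals[lit])
            lit += 1
    return bytes(out)
```

For example, `restore({3: (0, 6)}, b"abc")` returns `b"abcabcabc"`, and `restore({1: (0, 4)}, b"a")` expands a single literal into five `a` bytes through an overlapping copy.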

FIG. 6 is a diagram schematically illustrating the compression method of the present invention, which compensates for the restoration speed of the Lempel-Ziv compression method. As shown in the figure, to increase restoration speed when the Lempel-Ziv compression method is applied, the method is supplemented with a key-address scheme. Regardless of data size, the information needed to restore each compressed segment can then be retrieved with O(1) performance by referring to the key values held in the hash table.

As described above, the present invention applies a key-address scheme to compensate for the restoration speed of the Lempel-Ziv compression method. This shortens restoration time and, regardless of data size, allows the information needed for restoration to be retrieved with O(1) performance by referring to the key values held in the hash table. Because the method is lossless, the original data is reproduced exactly and no information is lost during compression. The method can therefore be applied efficiently to documents, video, spatial data, and binary formats such as MP3 that require both fast restoration and lossless compression.

Preferred embodiments of the present invention have been described above by way of example. However, the scope of the present invention is not limited to these specific embodiments, and those skilled in the art may make appropriate changes within the scope set forth in the claims of the present invention.

Claims

A method of compressing and restoring raw data using Lempel-Ziv, the method comprising:

retrieving pattern information of the raw data using an existing Lempel-Ziv compression method and searching the pattern information for duplicate patterns;

if a duplicate pattern is found in the search, constructing a key-address hash table for the duplicate pattern; and

losslessly compressing the raw data by adding to it the data for restoring the duplicate pattern configured in the hash table.

The method of claim 1,

wherein, in the step of searching for duplicate patterns, the duplicate patterns form a hash table and a hash chain, and are searched for using a circular queue.

The method of claim 1,

wherein the key-address hash table is stored in a header portion of the raw data, and the hash table stores, as key values, the data for restoring the retrieved duplicate pattern.

The method of claim 1,

wherein the data for restoring the duplicate pattern includes the location where the data of that duplicate pattern is stored and the size of that data.

The method of claim 1,

wherein, when spatial data is compressed and restored using the Lempel-Ziv compression and decompression method, the method further comprises:

extracting difference vectors and start-point coordinates from the spatial data and converting them into vector data; and

retrieving the pattern information of the converted vector data and searching that pattern information for duplicate patterns.