KR20160021004A

KR20160021004A - Binary data compression and restoration method and apparatus

Info

Publication number: KR20160021004A
Application number: KR1020140153197A
Authority: KR
Inventors: 김정훈
Original assignee: 김정훈
Priority date: 2014-08-15
Filing date: 2014-11-05
Publication date: 2016-02-24
Also published as: KR101632115B1

Abstract

The present invention relates to a method for compressing binary data, performed by a binary data compression apparatus. The method comprises: scanning original binary data; dividing the scanned original binary data into general clusters to obtain a plurality of general clusters; generating a mapping dictionary that defines the correspondence between each general cluster value and a universal code; and generating, by referring to the mapping dictionary, compressed data comprising universal codes from the original binary data. A general cluster is the binary number formed by a "10" encountered while moving from the least significant bit of the original binary data toward the higher-order bits, together with the bits lying between that "10" and the "10" encountered immediately before it. A universal code consists of "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s placed between the leading "10" and the consecutive "1"s.

Description

Method and apparatus for compressing and restoring binary data

The present invention relates to a method and apparatus for compressing and restoring binary data and, more specifically, to a method and apparatus that can compress and restore binary data effectively and efficiently through simple operations and a simple hardware configuration, and that can also improve data transmission speed and efficiency.

In general, the frequency bandwidth available on an ordinary transmission channel is limited. To transmit large amounts of data, transmission systems such as modems have therefore relied on effective data compression techniques that compress or reduce the amount of data to be transmitted.

One such technique is CCITT V.42 bis, a coding algorithm standardized by the International Telecommunication Union (ITU) and adopted in data transmission systems such as modems. The standard is based on the Ziv-Lempel code (ZLC): a dictionary is built adaptively from the input data, and the address of the dictionary entry that stores the same phrase as earlier input data is transmitted as the codeword. The dictionary is updated by continuously string-matching the input data and adding to the dictionary the longest matching string combined with the unmatched character that follows it.

However, such conventional compression methods involve complex processing for compressing and restoring data and require relatively high-performance hardware. They also limit how far the processing speed can be improved and make it difficult to increase the reliability of the compression result.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 1999-0022960 (published March 25, 1999).

The technical problem addressed by the present invention is to provide a method and apparatus for compressing and restoring binary data that can compress and restore binary data quickly and efficiently through simple operations and a simple hardware configuration, achieve an excellent compression ratio, increase the reliability of the compressed and restored data, and improve transmission efficiency and speed when the data is transmitted.

According to one aspect of the present invention, there is provided a method of compressing binary data performed by a binary data compression apparatus, the method comprising: scanning original binary data; dividing the scanned original binary data into general clusters to obtain a plurality of general clusters; generating a mapping dictionary that defines the correspondence between each value of the plurality of general clusters and a universal code; and generating, by referring to the mapping dictionary, compressed data comprising a plurality of universal codes from the original binary data, wherein a general cluster is a binary number comprising a "10" encountered while moving from the least significant bit of the original binary data toward the higher-order bits and the bits between that "10" and the "10" encountered immediately before it, and wherein a universal code consists of "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s placed between the "10" and the zero or more consecutive "1"s.

In the present invention, the mapping dictionary defines the correspondence between the values of the plurality of general clusters, sorted in ascending order, and universal codes arranged sequentially in ascending order.

In the present invention, in the step of generating the mapping dictionary, the binary data compression apparatus generates the mapping dictionary by defining the correspondence between the general cluster values and the universal codes starting from the point at which the general cluster values sorted in ascending order first differ from the universal codes arranged sequentially in ascending order.

In the present invention, in the step of generating the mapping dictionary, each value of the plurality of general clusters is expressed as a combination of the number of "0"s and the number of "1"s, excluding the "10" at the most significant bits of each general cluster.

In the present invention, in the step of generating the mapping dictionary, the binary data compression apparatus generates the mapping dictionary by arranging the values of the plurality of general clusters, sorted in ascending order, in a single row toward the higher-order bits or toward the lower-order bits.

In the present invention, in the step of obtaining the plurality of general clusters, the division is performed after "10" is added in front of the most significant bit of the original binary data.

In the present invention, in the step of generating the mapping dictionary, the general clusters contained in the original binary data are combined N at a time into cluster groups, which are sorted in ascending order; the universal codes are sorted in ascending order and combined N at a time into code groups; and the mapping dictionary is generated by defining the correspondence between the sorted cluster groups and the code groups.

The present invention may further comprise transmitting combined data, in which the compressed data and the mapping dictionary are combined, to a target device.

In the present invention, the mapping dictionary may instead define the correspondence between the values of the plurality of general clusters, sorted in descending order of their frequency of occurrence, and universal codes arranged sequentially in ascending order.

According to another aspect of the present invention, there is provided a method by which a restoration apparatus restores binary data compressed by the above binary data compression method, the method comprising a step in which a restoration unit restores binary data from the compressed data by referring to the mapping dictionary.

According to yet another aspect of the present invention, there is provided a binary data compression apparatus comprising: a data scanning unit that scans original binary data and divides the scanned original binary data into general clusters to obtain a plurality of general clusters; a dictionary generation unit that generates a mapping dictionary defining the correspondence between each value of the plurality of general clusters and a universal code; and a compression unit that generates, by referring to the mapping dictionary, compressed data comprising a plurality of universal codes from the original binary data, wherein a general cluster is a binary number comprising a "10" encountered while moving from the least significant bit of the original binary data toward the higher-order bits and the bits between that "10" and the "10" encountered immediately before it, and wherein a universal code consists of "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s placed between the leading "10" and the zero or more consecutive "1"s.

In the present invention, when generating the mapping dictionary, the dictionary generation unit generates the mapping dictionary by defining the correspondence between the general cluster values and the universal codes starting from the point at which the general cluster values sorted in ascending order first differ from the universal codes arranged sequentially in ascending order.

In the present invention, when generating the mapping dictionary, the dictionary generation unit expresses each value of the plurality of general clusters as a combination of the number of "0"s and the number of "1"s, excluding the "10" at the most significant bits of each general cluster.

In the present invention, when generating the mapping dictionary, the dictionary generation unit generates the mapping dictionary by arranging the values of the plurality of general clusters, sorted in ascending order, in a single row toward the higher-order bits or toward the lower-order bits.

In the present invention, when obtaining the plurality of general clusters, the data scanning unit performs the division after adding "10" in front of the most significant bit of the original binary data.

In the present invention, when generating the mapping dictionary, the dictionary generation unit combines the general clusters contained in the original binary data N at a time into cluster groups and sorts them in ascending order, sorts the universal codes in ascending order and combines them N at a time into code groups, and generates the mapping dictionary by defining the correspondence between the sorted cluster groups and the code groups.

The present invention may further comprise a transmission unit that transmits combined data, in which the compressed data and the mapping dictionary are combined, to a target device.

In the present invention, the mapping dictionary may instead define the correspondence between the values of the plurality of general clusters, sorted in descending order of their frequency of occurrence, and universal codes arranged sequentially in ascending order.

According to yet another aspect of the present invention, there is provided a binary data restoration apparatus for restoring binary data compressed by the above binary data compression apparatus, the restoration apparatus comprising a restoration unit that restores binary data from the compressed data by referring to the mapping dictionary.

The method and apparatus for compressing and restoring binary data according to the present invention can compress and restore binary data quickly and efficiently through simple operations and a simple hardware configuration, achieve an excellent compression ratio, increase the reliability of the compressed and restored data, and improve transmission efficiency and speed when the data is transmitted.

FIG. 1 shows the configuration of a binary data compression apparatus and restoration apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a binary data compression method according to an embodiment of the present invention.
FIG. 3 is an example showing the correspondence between general clusters and universal codes in the mapping dictionary of this embodiment.
FIG. 4 illustrates the relationship between the length of a general cluster and the length of the corresponding universal code as the ordinal number of the general cluster value increases in binary data.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily carry them out. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 shows the configuration of a binary data compression apparatus and restoration apparatus according to an embodiment of the present invention, and FIG. 2 is a flowchart illustrating a binary data compression method according to an embodiment of the present invention. The embodiment is described below with reference to these figures.

As shown in FIG. 1, the binary data compression apparatus 100 according to this embodiment includes a data scanning unit 110, a dictionary generation unit 120, a compression unit 130, and a transmission unit 140.

The data scanning unit 110 scans the original binary data and divides the scanned original binary data into general clusters to obtain a plurality of general clusters. The dictionary generation unit 120 generates a mapping dictionary that defines the correspondence between each value of the plurality of general clusters and a universal code.

The compression unit 130 refers to the mapping dictionary to generate, from the original binary data, compressed data comprising a plurality of universal codes.

The transmission unit 140 transmits combined data, in which the compressed data and the mapping dictionary are combined, to a target device such as the restoration apparatus 200.

Here, a general cluster is a binary number comprising a "10" encountered while moving from the least significant bit of the original binary data toward the higher-order bits and the bits between that "10" and the "10" encountered immediately before it. A universal code consists of "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s placed between the leading "10" and the zero or more consecutive "1"s.
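The shape of a universal code as defined above can be checked mechanically. The following is a minimal Python sketch and not part of the original disclosure; the name `is_universal_code` and the use of a regular expression are illustrative assumptions.

```python
import re

# Assumed reading of the definition: a leading "10", then zero or more "0"s,
# then zero or more "1"s, and nothing else.
UNIVERSAL_CODE = re.compile(r"100*1*")


def is_universal_code(bits: str) -> bool:
    """Return True if the bit string has the universal-code shape."""
    return UNIVERSAL_CODE.fullmatch(bits) is not None


print(is_universal_code("10"))      # shortest code
print(is_universal_code("100011"))  # "10", two more zeros, two ones
print(is_universal_code("1010"))    # contains a second "10", so not a single code
```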

When obtaining the plurality of general clusters, the data scanning unit 110 performs the division after adding "10" in front of the most significant bit of the original binary data.

The mapping dictionary defines the correspondence between the values of the plurality of general clusters, sorted in ascending order, and universal codes arranged sequentially in ascending order.

When generating the mapping dictionary, the dictionary generation unit 120 may define the correspondence between the general cluster values and the universal codes starting from the point at which the general cluster values sorted in ascending order first differ from the universal codes arranged sequentially in ascending order.

In generating the mapping dictionary, the dictionary generation unit 120 may also express each value of the plurality of general clusters as a combination of the number of "0"s and the number of "1"s, excluding the "10" at the most significant bits of each general cluster.

When generating the mapping dictionary, the dictionary generation unit 120 may also arrange the values of the plurality of general clusters, sorted in ascending order, in a single row toward the higher-order bits or toward the lower-order bits.

In addition, when generating the mapping dictionary, the dictionary generation unit 120 may sort the values of the general clusters contained in the original binary data in ascending order and combine them N at a time into cluster groups, sort the universal codes in ascending order and combine them N at a time into code groups, and generate the mapping dictionary by defining the correspondence between the cluster groups and the code groups.

The binary data restoration apparatus according to this embodiment includes a restoration unit 210 that restores the original binary data from the compressed data produced by the binary data compression apparatus by referring to the mapping dictionary.

The operation of this embodiment, configured as described above, is now explained in detail with reference to FIGS. 1 to 4.

First, the data scanning unit 110 scans the original binary data and divides the scanned original binary data into general clusters to obtain a plurality of general clusters (S201, S202). When obtaining the general clusters, the data scanning unit 110 performs the division after adding "10" in front of the most significant bit of the original binary data. A general cluster is a binary number comprising a "10" encountered while moving from the least significant bit of the binary data toward the higher-order bits and the bits between that "10" and the "10" encountered immediately before it.

All binary data can be separated into general clusters. The separation can be performed by cutting at each "10" encountered while moving from the most significant bit to the least significant bit, or by cutting at each "10" encountered while moving from the least significant bit to the most significant bit.

For example, the binary data 10011011010101011010101011 can be divided into 10 general clusters as follows:

==> 1001 / 101 / 10 / 10 / 10 / 101 / 10 / 10 / 10 / 1011
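The division in this example can be reproduced with a short Python sketch. It is an illustration only: the function name `split_general_clusters` is hypothetical, and the sketch assumes the input is a string of "0" and "1" characters that already begins with "10".

```python
def split_general_clusters(bits: str) -> list[str]:
    """Cut a bit string in front of every "10", yielding general clusters."""
    if not bits.startswith("10"):
        raise ValueError('this sketch expects input that begins with "10"')
    # Every position where a "1" is immediately followed by a "0" starts a cluster.
    starts = [i for i in range(len(bits) - 1) if bits[i] == "1" and bits[i + 1] == "0"]
    ends = starts[1:] + [len(bits)]
    return [bits[start:end] for start, end in zip(starts, ends)]


clusters = split_general_clusters("10011011010101011010101011")
print(" / ".join(clusters))
# prints: 1001 / 101 / 10 / 10 / 10 / 101 / 10 / 10 / 10 / 1011
```

Concatenating the returned clusters gives back the input unchanged, since the cut points only mark boundaries and no bits are dropped.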

If the original binary data does not begin with "10" at its most significant bits, "10" is unconditionally added in front of the original binary data before it is divided as described above, and the leading "10" is removed later during decompression (restoration) to recover the original binary data. There are many ways to make the data begin with "10" in this manner.

For example, a binary number such as 1111000101010101 would be divided as

111 / 1000 / 10 / 10 / 10 / 101

but the first piece, 111, contains no "10" and therefore does not satisfy the condition for forming a general cluster. In such a case, "10" is unconditionally attached at the most significant bit position at the start of compression, and the data is then divided into general clusters. That is,

101111000101010101

===> 10111 / 1000 / 10 / 10 / 10 / 101

is how the data is divided into general clusters. During restoration (decompression), the leading "10" is removed from the final restored data, so the original binary data is recovered exactly.
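The prefixing step can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the names `split_with_prefix` and `strip_prefix` are hypothetical, and the sketch assumes the "10" prefix is always added, so that the restoring side can always remove it without extra signalling.

```python
PREFIX = "10"  # marker added in front of the most significant bit


def split_with_prefix(bits: str) -> list[str]:
    """Prepend "10" and cut the result in front of every "10"."""
    padded = PREFIX + bits
    starts = [i for i in range(len(padded) - 1) if padded[i:i + 2] == "10"]
    ends = starts[1:] + [len(padded)]
    return [padded[start:end] for start, end in zip(starts, ends)]


def strip_prefix(restored: str) -> str:
    """Remove the leading "10" that was added before compression."""
    if not restored.startswith(PREFIX):
        raise ValueError('restored data does not begin with the "10" marker')
    return restored[len(PREFIX):]


clusters = split_with_prefix("1111000101010101")
print(" / ".join(clusters))
# prints: 10111 / 1000 / 10 / 10 / 10 / 101
print(strip_prefix("".join(clusters)))
# prints: 1111000101010101
```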

Next, the dictionary generation unit 120 generates a mapping dictionary that defines the correspondence between each value of the plurality of general clusters and a universal code (S203). This is described in more detail below.

For example, the binary data of an actual file of 604,842 bits, shown below, can be divided into general clusters.

1001010000010010110000001100000100000101000000000000000110000000000000100000000000000000000000000000100001000000000011011011110111111010010111000011011101000000010000000000000000010001100000110000000000000000000001001100000000000010000000001001011011010000110110111101101110011101000110010101101110011101000101111101010100011110010111000001100101011100110101110100101110011110000110110101101100001000001010001000000100000000100010100010100000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000..........1001010000010010110000001100000100000101000000000000000110000000000000100000000000000000000000000000100001000000000011011011110111111010010111000011011101000000010000000000000000010001100000110000000000000000000001001100000000000010000000001001011011010000110110111101101110011101000110010101101110011101000101111101010100011110010111000001100101011100110101110100101110011110000110110101101100001000001010001000000100000000100010100010100000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 ..........

Dividing the above binary data into general clusters yields Table 1. The example below is the result of dividing while moving from the most significant bit to the least significant bit. In this example the data was divided into 138,472 general clusters in total, comprising 201 distinct kinds ranging from 2 bits to 4,099 bits in length. Table 1 below shows part of the result.

Table 1. Separated general clusters (in order of appearance)

100
10
100000
100
101
10000001
100000
100000
10
10000000000000001
10000000000000
100000000000000000000000000000
10000
100000000001
101
10111
1011111
10
100
1011
100001
1011
10
10000000
100000000000000000
......

Sorting the general clusters obtained in this way in ascending order of bit length gives Table 2. This embodiment is described mainly in terms of sorting the general clusters in ascending order of bit length; depending on the embodiment, however, the general clusters may instead be sorted in descending order of their frequency of occurrence before the following steps are performed.

Table 2. General clusters sorted in ascending order of bit length

General cluster    Frequency of occurrence    Number of bits
10                 35172                      2
100                13960                      3
101                17220                      3
1000               9168                       4
1001               6857                       4
1011               6890                       4
10000              4693                       5
10001              4806                       5
10011              3784                       5
10111              4929                       5
100000             1985                       6
100001             2483                       6
100011             1909                       6
100111             2327                       6
101111             3144                       6
1000000            712                        7
1000001            828                        7
1000011            1038                       7
1000111            1113                       7
1001111            1445                       7
1011111            1672                       7
10000000           579                        8
10000001           433                        8
10000011           671                        8
10000111           815                        8
10001111           829                        8
10011111           896                        8
10111111           932                        8
100000000          258                        9
100000001          143                        9
100000011          153                        9
100000111          292                        9
100001111          497                        9
100011111          499                        9
100111111          437                        9
101111111          490                        9
1000000000         133                        10
1000000001         155                        10
1000000011         69                         10
1000000111         115                        10
1000001111         186                        10
1000011111         250                        10
1000111111         271                        10
1001111111         240                        10
1011111111         279                        10
......             ......                     ......
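A table of this form can be produced from a list of general clusters with a few lines of Python. The sketch below is illustrative only; the function name `cluster_table` is hypothetical, and it is run on the ten clusters of the short example 10011011010101011010101011 rather than on the 604,842-bit file.

```python
from collections import Counter


def cluster_table(clusters: list[str]) -> list[tuple[str, int, int]]:
    """Return (cluster, frequency, bit length) rows in ascending order."""
    counts = Counter(clusters)
    # Sort by bit length first; for equal lengths, string order equals numeric order.
    ordered = sorted(counts, key=lambda cluster: (len(cluster), cluster))
    return [(cluster, counts[cluster], len(cluster)) for cluster in ordered]


example = ["1001", "101", "10", "10", "10", "101", "10", "10", "10", "1011"]
for cluster, frequency, length in cluster_table(example):
    print(f"{cluster:<8}{frequency:<6}{length}")
```

For the alternative ordering mentioned in the text, sorting the same `Counter` by descending count instead of by length would give the frequency-based arrangement.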

For each value in this sorted list, from the shortest general cluster to the longest, a universal code corresponding to it 1:1 is generated automatically and associated with it, producing the mapping dictionary. Here, a universal code consists of "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s placed between the leading "10" and the zero or more consecutive "1"s. The mapping dictionary defines the correspondence between the values of the plurality of general clusters, sorted in ascending order, and universal codes arranged sequentially in ascending order. As described later, the compression unit 130 compresses the data by replacing every general cluster of the original binary data with its universal code according to the 1:1 mapping, and the reverse process restores (decompresses) the data.
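The mapping and substitution described above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation: all function names are hypothetical, the input is assumed to begin with "10", clusters are ordered by bit length and then by value, and the dictionary is returned alongside the compressed bits rather than packed into one stream.

```python
from itertools import count


def universal_codes():
    """Yield universal codes in ascending order: 10, 100, 101, 1000, 1001, 1011, ..."""
    for length in count(2):
        for ones in range(length - 1):
            zeros = length - 1 - ones  # at least one "0" follows the leading "1"
            yield "1" + "0" * zeros + "1" * ones


def split_clusters(bits: str) -> list[str]:
    """Cut in front of every "10"; also used to split compressed data into codes."""
    starts = [i for i in range(len(bits) - 1) if bits[i:i + 2] == "10"]
    return [bits[s:e] for s, e in zip(starts, starts[1:] + [len(bits)])]


def build_mapping(clusters: list[str]) -> dict[str, str]:
    """Pair the distinct clusters, in ascending order, with ascending universal codes."""
    distinct = sorted(set(clusters), key=lambda c: (len(c), c))
    return dict(zip(distinct, universal_codes()))


def compress(bits: str) -> tuple[str, dict[str, str]]:
    clusters = split_clusters(bits)
    mapping = build_mapping(clusters)
    return "".join(mapping[c] for c in clusters), mapping


def decompress(compressed: str, mapping: dict[str, str]) -> str:
    inverse = {code: cluster for cluster, code in mapping.items()}
    return "".join(inverse[code] for code in split_clusters(compressed))


bits = "10011011010101011010101011"
compressed, mapping = compress(bits)
print(mapping)
print(len(bits), len(compressed))
assert decompress(compressed, mapping) == bits
```

In this short input the four distinct clusters 10, 101, 1001 and 1011 are paired with the codes 10, 100, 101 and 1000, so the output is 25 bits against 26 in the input, not counting the dictionary. Because each universal code contains "10" only at its start, the sketch can split the compressed bits with the same cutting rule used for the original data.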

The one-to-one mapping table between general clusters and universal codes (the mapping dictionary) may be included in the compressed result file as it is, or it may be stored in the compressed result file in a smaller form after a separate compression pass using any of various known compression methods. In particular, long general clusters mostly take the form of run-length codes, so the dictionary can be compressed effectively with the RLE (run-length encoding) method.

In particular, the mapping dictionary may be generated starting from the position at which the values of the general clusters sorted in ascending order first differ from the universal codes arranged sequentially in ascending order. For the portion in which the universal codes and the general clusters coincide, the mapping to each general cluster is fully known simply by generating the universal codes automatically in sequence from the smallest value. This approach reduces the size of the mapping dictionary further.

For example, as shown in Table 3 below, the general clusters on the left are more likely to coincide with the universal codes on the right when the general clusters are short. At least the general clusters 10, 100, 101, 1000, 1001, 1011, ... coincide with the universal codes 10, 100, 101, 1000, 1001, 1011, .... Because these can be regenerated automatically by producing the universal codes in sequence from the smallest value, generating the mapping dictionary only from the first point at which a general cluster and a universal code differ reduces the size of the mapping dictionary.

General cluster | Frequency in original data | General cluster length | Universal code (U code) | Universal code length | Length difference between universal code and general cluster (delta) | delta * frequency
10 | 35172 | 2 | 10 | 2 | 0 | 0
100 | 13960 | 3 | 100 | 3 | 0 | 0
101 | 17220 | 3 | 101 | 3 | 0 | 0
1000 | 9168 | 4 | 1000 | 4 | 0 | 0
1001 | 6857 | 4 | 1001 | 4 | 0 | 0
1011 | 6890 | 4 | 1011 | 4 | 0 | 0
10000 | 4693 | 5 | 10000 | 5 | 0 | 0
10001 | 4806 | 5 | 10001 | 5 | 0 | 0
10011 | 3784 | 5 | 10011 | 5 | 0 | 0
10111 | 4929 | 5 | 10111 | 5 | 0 | 0
100000 | 1985 | 6 | 100000 | 6 | 0 | 0
100001 | 2483 | 6 | 100001 | 6 | 0 | 0
100011 | 1909 | 6 | 100011 | 6 | 0 | 0
100111 | 2327 | 6 | 100111 | 6 | 0 | 0
101111 | 3144 | 6 | 101111 | 6 | 0 | 0
1000000 | 712 | 7 | 1000000 | 7 | 0 | 0
1000001 | 828 | 7 | 1000001 | 7 | 0 | 0
1000011 | 1038 | 7 | 1000011 | 7 | 0 | 0
1000111 | 1113 | 7 | 1000111 | 7 | 0 | 0
... | ... | ... | ... | ... | ... | ...

Meanwhile, the universal codes are generated automatically in sequence starting from 10. As in "10", "1011" (10/11), and "1000111" (10/00/111), each universal code consists of zero or more consecutive "1"s running from its least significant bit, "10" placed at its most significant bits, and zero or more consecutive "0"s placed between the trailing "1"s and the leading "10". The shortest binary number that satisfies these conditions is "10", and the codes that follow can be listed in ascending order as 100, 101, 1000, 1001, 1011, 10000, 10001, 10011, 10111, 100000, and so on.

As can be seen, a universal code must always begin with "10". As the number of bits grows, the number of "1"s counted from the least significant bit is increased while the "10" at the most significant end is always retained; when no "0" remains after the "10", as in "101111", the code gains one bit and moves on to "1000000". Listing these universal codes from the shortest bit length upward and pairing each with a general cluster of the original binary data gives the correspondence shown in Table 4 below. Here, the order number is the position of each general cluster when the values of the general clusters contained in the original binary data are listed in ascending order.

Order number | Universal code
1 | 10
2 | 100
3 | 101
4 | 1000
5 | 1001
6 | 1011
7 | 10000
8 | 10001
9 | 10011
10 | 10111
11 | 100000
12 | 100001
13 | 100011
14 | 100111
15 | 101111
16 | 1000000
17 | 1000001
18 | 1000011
19 | 1000111
20 | 1001111
21 | 1011111
... | ...
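The generation rule described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation, and the function name `universal_codes` is an assumption introduced here. It walks through bit lengths 2, 3, 4, ... and, within each length, grows the run of trailing "1"s, which reproduces the order of Table 4.

```python
from itertools import islice


def universal_codes():
    """Yield universal codes in ascending order: '10', '100', '101', '1000', ..."""
    length = 2
    while True:
        # For a fixed bit length the code is '1', then zeros, then ones.
        # At least one zero is kept so that the leading '10' survives.
        for ones in range(length - 1):
            zeros = length - 1 - ones
            yield "1" + "0" * zeros + "1" * ones
        length += 1


first_21 = list(islice(universal_codes(), 21))
print(first_21[:6])   # ['10', '100', '101', '1000', '1001', '1011']
print(first_21[-1])   # 1011111
```

Because the generator is deterministic, a decoder can regenerate the same sequence without receiving it.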

Table 5 shows the result of mapping the generated universal codes onto the general cluster values sorted in ascending order. As Table 5 shows, while the general clusters are short, the universal codes grow with the same number of bits, so there is no compression effect.

General cluster | Frequency in original data | General cluster length | Universal code (U code) | Universal code length | Length difference between universal code and general cluster (delta) | delta * frequency
10 | 35172 | 2 | 10 | 2 | 0 | 0
100 | 13960 | 3 | 100 | 3 | 0 | 0
101 | 17220 | 3 | 101 | 3 | 0 | 0
1000 | 9168 | 4 | 1000 | 4 | 0 | 0
1001 | 6857 | 4 | 1001 | 4 | 0 | 0
1011 | 6890 | 4 | 1011 | 4 | 0 | 0
10000 | 4693 | 5 | 10000 | 5 | 0 | 0
10001 | 4806 | 5 | 10001 | 5 | 0 | 0
10011 | 3784 | 5 | 10011 | 5 | 0 | 0
10111 | 4929 | 5 | 10111 | 5 | 0 | 0
100000 | 1985 | 6 | 100000 | 6 | 0 | 0
100001 | 2483 | 6 | 100001 | 6 | 0 | 0
100011 | 1909 | 6 | 100011 | 6 | 0 | 0
100111 | 2327 | 6 | 100111 | 6 | 0 | 0
101111 | 3144 | 6 | 101111 | 6 | 0 | 0
1000000 | 712 | 7 | 1000000 | 7 | 0 | 0
1000001 | 828 | 7 | 1000001 | 7 | 0 | 0
1000011 | 1038 | 7 | 1000011 | 7 | 0 | 0
1000111 | 1113 | 7 | 1000111 | 7 | 0 | 0
... | ... | ... | ... | ... | ... | ...

However, Table 6, which continues from Table 5, shows that in the not-yet-compressed original binary data the length of the general clusters typically grows faster than that of the universal codes, so the universal code length and the general cluster length begin to diverge. The more clusters there are whose length differs from that of their universal code, and the more frequently those clusters occur, the higher the compression efficiency.

General cluster | Frequency in original data | General cluster length | Universal code (U code) | Universal code length | Length difference between universal code and general cluster (delta) | delta * frequency
... | ... | ... | ... | ... | ... | ...
10000000000000000000000000 | 1 | 26 | 1000000000000111111 | 19 | -7 | -7
100000000000000000000000000 | 2 | 27 | 1000000000001111111 | 19 | -8 | -16
100000000000000000000000001 | 1 | 27 | 1000000000011111111 | 19 | -8 | -8
1000000000000000000000000001 | 2 | 28 | 1000000000111111111 | 19 | -9 | -18
1000000000000000000000000011 | 16 | 28 | 1000000001111111111 | 19 | -9 | -144
10000000000000000000000000000 | 2 | 29 | 1000000011111111111 | 19 | -10 | -20
10000000000000000000000000001 | 22 | 29 | 1000000111111111111 | 19 | -10 | -220
10000000000000000000000000011 | 2 | 29 | 1000001111111111111 | 19 | -10 | -20
100000000000000000000000000000 | 72 | 30 | 1000011111111111111 | 19 | -11 | -792
100000000000000000000000000011 | 12 | 30 | 1000111111111111111 | 19 | -11 | -132
1000000000000000000000111111111 | 1 | 31 | 1001111111111111111 | 19 | -12 | -12
10000000000000000111111111111111 | 21 | 32 | 1011111111111111111 | 19 | -13 | -273
100000000000000000000000000000011 | 1 | 33 | 10000000000000000000 | 20 | -13 | -13
100000000000000000000000000001111 | 2 | 33 | 10000000000000000001 | 20 | -13 | -26
100000000000000000111111111111111 | 3 | 33 | 10000000000000000011 | 20 | -13 | -39
100000000000000000000000000000000000 | 1 | 36 | 10000000000000000111 | 20 | -16 | -16
1000000000000000000000000111111111111111 | 10 | 40 | 10000000000000001111 | 20 | -20 | -200
10000000000000000000000000111111111111111 | 1 | 41 | 10000000000000011111 | 20 | -21 | -21
100000000000000000000000000111111111111111 | 1 | 42 | 10000000000000111111 | 20 | -22 | -22
10000000000000000000000000000000000000000000000000000000 | 1 | 56 | 10000000000001111111 | 20 | -36 | -36
... | ... | ... | ... | ... | ... | ...
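The delta columns of Table 6 follow directly from the string lengths. As a hedged illustration, the sketch below recomputes them for the first three rows of Table 6; the clusters and codes are written as runs of "0"s and "1"s so that their lengths (26, 27, 27 and 19 bits) are easy to verify. The variable and function names are assumptions introduced here.

```python
# (general cluster, frequency, universal code) for the first three rows of Table 6
rows = [
    ("1" + "0" * 25, 1, "1" + "0" * 12 + "1" * 6),
    ("1" + "0" * 26, 2, "1" + "0" * 11 + "1" * 7),
    ("1" + "0" * 25 + "1", 1, "1" + "0" * 10 + "1" * 8),
]


def length_delta(cluster, code):
    """Bits gained (negative means bits saved) when the code replaces the cluster."""
    return len(code) - len(cluster)


deltas = [length_delta(cluster, code) for cluster, _, code in rows]
total = sum(length_delta(cluster, code) * freq for cluster, freq, code in rows)
print(deltas)  # [-7, -8, -8]
print(total)   # -31
```

A negative total means the substitution shortens the data by that many bits for the rows considered.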

FIG. 3 shows an example of the correspondence between general clusters and universal codes in the mapping dictionary of this embodiment. In the example of FIG. 3, in the third case from the bottom, a single 2049-bit general cluster can be symbolized by the 201st universal code alone, which is 21 bits long. This saves 2049 - 21 = 2028 bits, and since this general cluster appears three times, a saving of 6084 bits results from this general cluster.
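The figures in this example can be checked with a short calculation. Under the generation rule there are n - 1 universal codes of bit length n, so n(n - 1)/2 codes have length n or less. The helper `code_length` below (a hypothetical name introduced for this sketch) uses that count to find the bit length of the code at a given order number.

```python
def code_length(rank):
    """Bit length of the universal code at 1-based order number `rank`."""
    n = 2
    # n * (n - 1) // 2 is the number of universal codes with at most n bits.
    while n * (n - 1) // 2 < rank:
        n += 1
    return n


cluster_bits, occurrences = 2049, 3
saved_per_cluster = cluster_bits - code_length(201)
print(code_length(201), saved_per_cluster, saved_per_cluster * occurrences)
# 21 2028 6084
```

The same helper gives 16 bits for order number 120, the last 16-bit code.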

Expressed as a graph, the above is as shown in FIG. 4. FIG. 4 illustrates by way of example the relationship between the length of a general cluster and the length of the corresponding universal code as the order number of the general cluster value increases in the binary data. From the point at which the general cluster length starts to increase irregularly, it grows far more steeply than the universal code length, so the bit counts diverge; the compression effect arises where this difference exists.

It can thus be seen mathematically that a compression effect arises according to this principle. A method of implementing the mapping dictionary used to replace each general cluster of the original binary data one-to-one with a universal code is described in detail below.

As shown in Table 7 below, when the general clusters that make up the original binary data are sorted in ascending order, the first mismatch with the universal codes occurs at "10000000000000000", which has a length of 17. This position is number 120 in the order of the universal codes.

General cluster | General cluster length | Universal code (U code) | Universal code length | Length difference between universal code and general cluster (delta)
10000000000000000 | 17 | 1011111111111111 | 16 | -1
10000000000000001 | 17 | 10000000000000000 | 17 | 0
10000000000000011 | 17 | 10000000000000001 | 17 | 0
10000000000000111 | 17 | 10000000000000011 | 17 | 0
10000000001111111 | 17 | 10000000000000111 | 17 | 0
10000011111111111 | 17 | 10000000000001111 | 17 | 0
100000000000000000 | 18 | 10000000000011111 | 17 | -1
100000000000000001 | 18 | 10000000000111111 | 17 | -1
100000000000001111 | 18 | 10000000001111111 | 17 | -1
100000000000011111 | 18 | 10000000011111111 | 17 | -1
100000000001111111 | 18 | 10000000111111111 | 17 | -1
100000001111111111 | 18 | 10000001111111111 | 17 | -1
100000011111111111 | 18 | 10000011111111111 | 17 | -1
100000111111111111 | 18 | 10000111111111111 | 17 | -1
1000000000000000000 | 19 | 10001111111111111 | 17 | -2
1000000000000000011 | 19 | 10011111111111111 | 17 | -2
10000000000000000000 | 20 | 10111111111111111 | 17 | -3
10000000000000000001 | 20 | 100000000000000000 | 18 | -2
10000000000000000011 | 20 | 100000000000000001 | 18 | -2
10000000000000000111 | 20 | 100000000000000011 | 18 | -2
10000000000000001111 | 20 | 100000000000000111 | 18 | -2

As described above, the dictionary generation unit 120 may generate the mapping dictionary by defining the correspondence between the values of the plurality of general clusters and the universal codes starting from the point at which the values of the general clusters sorted in ascending order begin to differ from the values of the universal codes arranged sequentially in ascending order.
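Finding the point at which the sorted general clusters and the universal codes begin to differ can be sketched as below. This is an illustrative Python sketch under stated assumptions: clusters are given as bit strings, ascending order is taken to mean shorter strings first and then by value, and the names `sort_clusters` and `first_mismatch_rank` are hypothetical.

```python
def universal_codes():
    """Yield universal codes in ascending order: '10', '100', '101', ..."""
    length = 2
    while True:
        for ones in range(length - 1):
            yield "1" + "0" * (length - 1 - ones) + "1" * ones
        length += 1


def sort_clusters(clusters):
    """Distinct clusters in ascending numeric order (shorter first, then by value)."""
    return sorted(set(clusters), key=lambda c: (len(c), c))


def first_mismatch_rank(sorted_clusters):
    """1-based order number where cluster and code first differ, or None."""
    pairs = zip(sorted_clusters, universal_codes())
    for rank, (cluster, code) in enumerate(pairs, start=1):
        if cluster != code:
            return rank
    return None


observed = ["1000", "10", "1011", "101", "100", "10"]
ordered = sort_clusters(observed)
print(ordered)                       # ['10', '100', '101', '1000', '1011']
print(first_mismatch_rank(ordered))  # 5
```

In this toy input the cluster 1001 never occurs, so the fifth cluster (1011) no longer equals the fifth universal code (1001), and only the entries from order number 5 onward would need to be stored.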

In addition, when generating the mapping dictionary, the dictionary generation unit 120 may arrange the values of the plurality of general clusters sorted in ascending order in a single row, toward the upper bits or toward the lower bits. The mapping dictionary may be written in ascending order starting from the general cluster with the shortest bit length, or it may be written starting from the point at which the sorted general cluster values begin to differ from the sequentially ordered universal code values.

Taking the second case as an example, the mapping dictionary can be completed, as shown in Table 8, by concatenating the general clusters in a single row from "10000000000000000" (general cluster length 17) through to the longest general cluster, and storing the universal code order number 120.

General cluster
10000000000000000
10000000000000001
10000000000000011
10000000000000111
10000000001111111
10000011111111111
100000000000000000
100000000000000001
100000000000001111
100000000000011111
100000000001111111
100000001111111111
100000011111111111
100000111111111111
1000000000000000000
1000000000000000011
10000000000000000000
10000000000000000001
10000000000000000011
10000000000000000111
10000000000000001111
100000000000000000000
100000000000000000001
100000000000000000011
100000000000000011111
100000000000011111111
100000000111111111111
1000000000000000000000
1000000000000000000001
1000000000000000000011
1000000000000000000111
1000000000111111111111
10000000000000000000000
....................

The actual form of the mapping dictionary is as follows. The "/" is a notional delimiter; even though it does not actually exist, the clusters are recovered exactly as in Table 8 because general clusters are uniquely decodable. The universal codes, listed sequentially in ascending order from the 120th one onward, then correspond in turn to the general clusters of the mapping dictionary below, which begins with 10000000000000000. Unique decodability here means that there is only one way to decode the general clusters in the mapping dictionary generated as below: starting from the least significant bit of the binary mapping dictionary, the data is cut immediately in front of every "10" encountered, which separates the individual general clusters.

10000000000000000 / 10000000000000001 / 10000000000000011 / ........... / 10000000000000000000000 / .........
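The delimiter-free separation described above can be illustrated with a few lines of Python. This is a sketch under the assumption that the dictionary is held as a string of "0" and "1" characters that begins with a cluster; the function name `split_clusters` is introduced here for illustration. Every occurrence of "10" marks the start of a new general cluster, so the cut points are unambiguous.

```python
def split_clusters(bits):
    """Split a bit string into general clusters by cutting before every '10'."""
    starts = [i for i in range(len(bits) - 1) if bits[i:i + 2] == "10"]
    if not starts or starts[0] != 0:
        # Assumption of this sketch: the string begins with '10'.
        raise ValueError("bit string must start with '10'")
    ends = starts[1:] + [len(bits)]
    return [bits[s:e] for s, e in zip(starts, ends)]


# The first three dictionary entries, concatenated without any delimiter.
entries = ["1" + "0" * 16, "1" + "0" * 15 + "1", "1" + "0" * 14 + "11"]
packed = "".join(entries)
print(split_clusters(packed) == entries)  # True
```

The same routine also separates a stream of universal codes, since they have the same "10", zeros, ones shape.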

Meanwhile, in the step of generating the mapping dictionary (S203), each value of the plurality of general clusters may be expressed as a combination of the number of "0"s and the number of "1"s, excluding the "10" at the most significant bits of each general cluster. General clusters in the mapping dictionary very often consist of long runs of consecutive 0s and 1s. As shown in Table 9 below, it is therefore also very effective to express the mapping dictionary numerically, as pairs made up of the number of consecutive "0"s that follow the "10" at the most significant position of the general cluster and the number of consecutive "1"s that follow those "0"s.

That is, the mapping dictionary may take the form of general clusters written as (15,0)(14,1)(13,2)(12,3)(8,7),....(4,12)..., combined with the generation order number of the universal code at which a general cluster and a universal code first differ, that is, the form (X.X)...(X.X) + order number information.

General cluster | Number of consecutive 0s excluding the leading 10 | Number of consecutive 1s after the consecutive 0s
10000000000000000 | 15 | 0
10000000000000001 | 14 | 1
10000000000000011 | 13 | 2
10000000000000111 | 12 | 3
10000000001111111 | 8 | 7
10000011111111111 | 4 | 11
100000000000000000 | 16 | 0
100000000000000001 | 15 | 1
100000000000001111 | 12 | 4
100000000000011111 | 11 | 5
100000000001111111 | 9 | 7
100000001111111111 | 6 | 10
100000011111111111 | 5 | 11
100000111111111111 | 4 | 12
1000000000000000000 | ...... | ......
1000000000000000011
10000000000000000000
10000000000000000001
10000000000000000011
10000000000000000111
10000000000000001111
100000000000000000000
100000000000000000001
100000000000000000011
100000000000000011111
100000000000011111111
100000000111111111111
1000000000000000000000
1000000000000000000001
1000000000000000000011
1000000000000000000111
1000000000111111111111
10000000000000000000000
....................
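The pair notation of Table 9 can be sketched as a pair of conversion helpers. This is an illustrative Python sketch; the names `to_pair` and `from_pair` are assumptions, and the sketch assumes every cluster has the shape "10", then zeros, then ones.

```python
def to_pair(cluster):
    """'10' + zeros + ones  ->  (zeros after the leading '10', trailing ones)."""
    if not cluster.startswith("10"):
        raise ValueError("a general cluster must start with '10'")
    body = cluster[2:]
    ones = len(body) - len(body.rstrip("1"))
    zeros = len(body) - ones
    if "1" in body[:zeros]:
        raise ValueError("unexpected '1' inside the run of zeros")
    return zeros, ones


def from_pair(zeros, ones):
    """Inverse of to_pair: rebuild the cluster from its two run lengths."""
    return "10" + "0" * zeros + "1" * ones


print(to_pair("1" + "0" * 16))             # (15, 0)
print(to_pair("1" + "0" * 9 + "1" * 7))    # (8, 7)
print(from_pair(4, 11) == "1" + "0" * 5 + "1" * 11)  # True
```

Two small integers replace a cluster of 17 or more bits, which is where the size reduction of this dictionary form comes from.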

When mapping dictionary information of this kind is stored and then expanded in memory, it takes the form shown in Table 10 below. That is, in a mapping dictionary generated by the first mapping dictionary generation method (the method of Table 8), when the order number from which the sorted general cluster values begin to differ from the sequentially ordered universal code values is 120, the universal codes and the general clusters coincide for order numbers 1 to 119, so the correspondence between general cluster values and universal codes can be generated automatically for them. From order number 120 onward, the binary mapping dictionary is separated into individual general clusters using their unique decodability, and the universal codes from order number 120 onward are generated automatically in ascending sequence and mapped onto them. This produces the dictionary for compression and restoration shown in Table 10 below.

General cluster (the first 119 are generated identically from the automatically generated universal codes) | Universal code (generated automatically, one-to-one, with the general clusters in sorted order)
10 | 10
100 | 100
101 | 101
1000 | 1000
1001 | 1001
1011 | 1011
10000 | 10000
10001 | 10001
10011 | 10011
10111 | 10111
100000 | 100000
100001 | 100001
100011 | 100011
100111 | 100111
101111 | 101111
1000000 | 1000000
1000001 | 1000001
1000011 | 1000011
1000111 | 1000111
1001111 | 1001111
1011111 | 1011111
10000000 | 10000000
10000001 | 10000001
10000011 | 10000011
... | ...
100000000000000000011 | 100000000000111111
100000000000000011111 | 100000000001111111
100000000000011111111 | 100000000011111111
100000000111111111111 | 100000000111111111
1000000000000000000000 | 100000001111111111
1000000000000000000001 | 100000011111111111
1000000000000000000011 | 100000111111111111
1000000000000000000111 | 100001111111111111
1000000000111111111111 | 100011111111111111
10000000000000000000000 | 100111111111111111
10000000000000000000011 | 101111111111111111
10000000000000111111111 | 1000000000000000000
100000000000000000000000 | 1000000000000000001
100000000000000000000011 | 1000000000000000011
100000000000000000001111 | 1000000000000000111
100000000000000000111111 | 1000000000000001111
1000000000000000000111111 | 1000000000000011111
10000000000000000000000000 | 1000000000000111111
100000000000000000000000000 | 1000000000001111111
100000000000000000000000001 | 1000000000011111111
1000000000000000000000000001 | 1000000000111111111
1000000000000000000000000011 | 1000000001111111111
10000000000000000000000000000 | 1000000011111111111
... | ...
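The expansion step can be sketched as follows. This is a hedged Python illustration, not the patented implementation: `expand_dictionary` is a hypothetical name, the stored part of the dictionary is assumed to be a list of already separated general clusters, and only the first two stored clusters of Table 8 are used as sample input.

```python
from itertools import islice


def universal_codes():
    """Yield universal codes in ascending order: '10', '100', '101', ..."""
    length = 2
    while True:
        for ones in range(length - 1):
            yield "1" + "0" * (length - 1 - ones) + "1" * ones
        length += 1


def expand_dictionary(start_rank, stored_clusters):
    """Rebuild the cluster -> code table from the stored tail of the dictionary."""
    codes = universal_codes()
    table = {}
    # Order numbers 1 .. start_rank - 1: cluster and code are identical.
    for code in islice(codes, start_rank - 1):
        table[code] = code
    # From start_rank onward: pair stored clusters with the codes that follow.
    for cluster, code in zip(stored_clusters, codes):
        table[cluster] = code
    return table


stored = ["1" + "0" * 16, "1" + "0" * 15 + "1"]
table = expand_dictionary(120, stored)
print(len(table))          # 121
print(table[stored[0]])    # 1011111111111111
```

The first stored cluster (17 bits) receives the 120th universal code, which is 16 bits long, matching the first row of Table 7.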

Next, the compression unit 130 refers to the mapping dictionary and generates, from the original binary data, compressed data that includes a plurality of universal codes (S204). That is, with reference to the generated mapping dictionary (the expanded mapping dictionary), each general cluster of the original binary data divided in step S202 is replaced with its corresponding universal code, and the results are joined to produce the compressed data.

Then, the transmitting unit 140 transmits combined data, in which the compressed data and the mapping dictionary are combined, to a destination device such as the restoration apparatus 200 (S205).

Transmitting the data in this compressed form improves the data transmission speed and transmission efficiency.
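The overall flow of steps S202 to S204 and the reverse restoration can be sketched end to end. This is a simplified Python illustration under stated assumptions: the data is a string of "0" and "1" characters that starts with "10", the dictionary is kept as a plain Python dict rather than in the compact stored form, and the function names are hypothetical.

```python
def universal_codes():
    """Yield universal codes in ascending order: '10', '100', '101', ..."""
    length = 2
    while True:
        for ones in range(length - 1):
            yield "1" + "0" * (length - 1 - ones) + "1" * ones
        length += 1


def split_clusters(bits):
    """Cut the bit string before every '10' (assumes it starts with '10')."""
    starts = [i for i in range(len(bits) - 1) if bits[i:i + 2] == "10"]
    if not starts or starts[0] != 0:
        raise ValueError("bit string must start with '10'")
    return [bits[s:e] for s, e in zip(starts, starts[1:] + [len(bits)])]


def compress(bits):
    """Replace each general cluster by the universal code of the same rank."""
    clusters = split_clusters(bits)
    ordered = sorted(set(clusters), key=lambda c: (len(c), c))
    dictionary = dict(zip(ordered, universal_codes()))
    return "".join(dictionary[c] for c in clusters), dictionary


def decompress(compressed, dictionary):
    """Reverse the substitution using the same dictionary."""
    reverse = {code: cluster for cluster, code in dictionary.items()}
    return "".join(reverse[c] for c in split_clusters(compressed))


original = "1000011" + "10" + "1000011" + "100"
compressed, dictionary = compress(original)
print(compressed)                                    # 10110101100
print(decompress(compressed, dictionary) == original)  # True
```

Here the 7-bit cluster 1000011 is the third distinct cluster in ascending order, so it is replaced by the third universal code, 101, and the 19-bit input shrinks to 11 bits (not counting the dictionary).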

Turning to another application example of generating the mapping dictionary: in the step of generating the mapping dictionary (S203), the dictionary generation unit 120 may combine the plurality of general clusters contained in the original binary data N at a time to form cluster groups and sort them in ascending order, sort the universal codes in ascending order and combine them N at a time to form code groups, and generate the mapping dictionary by defining the correspondence between the sorted cluster groups and the code groups. This is described in more detail below.

The original binary data is divided into units of two or more general clusters and sorted in ascending order by length; the clusters are bundled, for example two at a time, into cluster groups. The universal codes are likewise arranged sequentially in ascending order and bundled, for example two at a time, into code groups. The resulting cluster groups and code groups are then matched one-to-one to generate the mapping dictionary. The number of general clusters and the number of universal codes bundled together may be two (second order) as above, or, depending on the setting, any natural number N (N-th order).

When the original binary data is divided into units of N general clusters (cluster groups), each cluster group is mapped one-to-one to a group of N universal codes (a code group) on the basis of the mapping dictionary. Table 11 below shows an example of the result.

General cluster | Frequency | Length | U code | U code length | diff
1010 | 2829 | 4 | 1010 | 4 | 0
10010 | 1403 | 5 | 10100 | 5 | 0
10100 | 1376 | 5 | 10101 | 5 | 0
10101 | 1632 | 5 | 10010 | 5 | 0
10110 | 1437 | 5 | 10110 | 5 | 0
100010 | 703 | 6 | 101000 | 6 | 0
100100 | 642 | 6 | 101001 | 6 | 0
100101 | 770 | 6 | 101011 | 6 | 0
100110 | 767 | 6 | 100100 | 6 | 0
101000 | 723 | 6 | 100101 | 6 | 0
101001 | 882 | 6 | 101100 | 6 | 0
101011 | 785 | 6 | 101101 | 6 | 0
101100 | 742 | 6 | 100010 | 6 | 0
101101 | 701 | 6 | 100110 | 6 | 0
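Forming cluster groups can be sketched as below. This Python sketch rests on assumptions that the text does not settle: groups are taken to be runs of N consecutive general clusters in the order they occur in the data, and a shorter final group is kept as it is. The function names are hypothetical.

```python
from collections import Counter


def split_clusters(bits):
    """Cut the bit string before every '10' (assumes it starts with '10')."""
    starts = [i for i in range(len(bits) - 1) if bits[i:i + 2] == "10"]
    if not starts or starts[0] != 0:
        raise ValueError("bit string must start with '10'")
    return [bits[s:e] for s, e in zip(starts, starts[1:] + [len(bits)])]


def cluster_groups(bits, n=2):
    """Join every n consecutive general clusters into one cluster group."""
    clusters = split_clusters(bits)
    return ["".join(clusters[i:i + n]) for i in range(0, len(clusters), n)]


data = "10" + "10" + "100" + "10" + "10" + "10"
groups = cluster_groups(data, n=2)
print(groups)           # ['1010', '10010', '1010']
print(Counter(groups))  # Counter({'1010': 2, '10010': 1})
```

The resulting frequency count corresponds to the "Frequency" column of Table 11, where 1010 (two "10" clusters in a row) is the most common group.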

As an example of generating second-order code groups, combining two universal codes to form universal code groups (code groups) gives Table 12.

Order number | Encoded data | Bit length
1 | 1010 | 4
2 | 10010 | 5
3 | 10110 | 5
4 | 100010 | 6
5 | 100110 | 6
6 | 101110 | 6
7 | 1000010 | 7
8 | 1000110 | 7
9 | 1001110 | 7
10 | 1011110 | 7
11 | 10000010 | 8
12 | 10000110 | 8
13 | 10001110 | 8
14 | 10011110 | 8
15 | 10111110 | 8
16 | 100000010 | 9
17 | 100000110 | 9
18 | 100001110 | 9
19 | 100011110 | 9
20 | 100111110 | 9
21 | 101111110 | 9
22 | 1000000010 | 10
23 | 1000000110 | 10
24 | 1000001110 | 10
25 | 1000011110 | 10
26 | 1000111110 | 10
27 | 1001111110 | 10
28 | 1011111110 | 10
29 | 10000000010 | 11
30 | 10000000110 | 11
31 | 10000001110 | 11
32 | 10000011110 | 11
33 | 10000111110 | 11
34 | 10001111110 | 11
35 | 10011111110 | 11
36 | 10111111110 | 11
37 | 100000000010 | 12
38 | 100000000110 | 12
39 | 100000001110 | 12
40 | 100000011110 | 12
41 | 100000111110 | 12
42 | 100001111110 | 12
43 | 100011111110 | 12
... | ... | ...

As shown in Table 12, for each universal code group of two codes, binary numbers are generated sequentially, starting from the smallest bit length. When several codes are combined in this way, the encoded result is, overall, shorter in bit length than the result of encoding a given integer value with a single code alone.
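The entries of Table 12 can be regenerated with a short Python sketch. It rests on two assumptions read off the table rather than stated as a formula in the text: a universal code is "10" followed by zero or more "0"s and then zero or more "1"s, enumerated by increasing bit length, and each table entry is such a code followed by the fixed code "10". The names universal_codes and table12 are hypothetical.

```python
from itertools import count, islice


def universal_codes():
    """Yield "10", "100", "101", "1000", "1001", "1011", ... by bit length."""
    for length in count(2):
        # Within one bit length, the number of trailing "1"s grows from 0.
        for ones in range(length - 1):
            zeros = length - 2 - ones
            yield "10" + "0" * zeros + "1" * ones


def table12(n):
    """Return the first n entries of Table 12 as bit strings.

    Assumption: each entry is the varying universal code followed by
    the fixed code "10".
    """
    return [code + "10" for code in islice(universal_codes(), n)]


if __name__ == "__main__":
    for seq, code in enumerate(table12(6), start=1):
        print(seq, code, len(code))
```

Under these assumptions the sketch reproduces the listed rows, for example entry 7 (1000010, 7 bits) and entry 43 (100011111110, 12 bits).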

In Table 12, however, the first code is fixed to "10" and the other code is chosen by stepping onward from "10" through "100", "101", "1000", "1001", "1011", and so on. There may be ways of combining two codes that encode more effectively than this. As one example, the code of maximum bit length used in the combinations can be chosen according to the number of original binary data items to be encoded. For example, suppose encoded data is to be created for 100 original binary data items, using universal code groups of two codes for the 100 sequence numbers. The smallest natural number not less than the square root of 100 is 10, so the 10 shortest codes are combined to create universal codes, the resulting codes are sorted in ascending order of bit length, and the top 100 are taken as the final universal codes. That is, the codes from "10" to "10111", taken in order of increasing bit length from Table 13 below, are needed.

Clusters for universal code generation
10
100
101
1000
1001
1011
10000
10001
10011
10111
100000
100001
100011
100111
101111
1000000
1000001
1000011
1000111
1001111
1011111
...

When these codes are combined two at a time, with repetition allowed, 10 * 10 = 100 combined encoded data items (code groups) are generated. If this encoded data is sorted in ascending order of bit length and only the 100 items with the shortest bit lengths are extracted, universal codes can be generated automatically up to sequence number 100.
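The square-root selection described above can be sketched in Python as follows. The sketch assumes the same enumeration of universal codes as in Table 13 ("10" followed by "0"s and then "1"s, by increasing bit length). The text does not say how codes of equal bit length are ordered, so the sketch leaves ties in combination order, which is an assumption. The names universal_codes and code_groups are hypothetical.

```python
from itertools import count, islice, product
from math import isqrt


def universal_codes():
    """Yield "10", "100", "101", "1000", "1001", "1011", ... by bit length."""
    for length in count(2):
        for ones in range(length - 1):
            yield "10" + "0" * (length - 2 - ones) + "1" * ones


def code_groups(n):
    """Return n two-code universal code groups, shortest first (n >= 1)."""
    # Smallest natural number not less than sqrt(n); 10 when n == 100.
    k = isqrt(n - 1) + 1
    base = list(islice(universal_codes(), k))
    # Combine the k shortest codes two at a time, repetition allowed.
    pairs = [a + b for a, b in product(base, repeat=2)]
    # Stable sort by bit length only; equal lengths keep combination
    # order (an assumption, the source does not specify tie-breaking).
    pairs.sort(key=len)
    return pairs[:n]


if __name__ == "__main__":
    groups = code_groups(100)
    print(len(groups), groups[0], groups[-1])
```

For n = 100 this combines the ten codes "10" through "10111" into 100 distinct bit strings whose lengths range from 4 bits ("1010") to 10 bits ("1011110111").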

When the binary data has been compressed and transmitted through the above process, the binary data restoration apparatus (200) receives the compressed data through the receiving unit (210) and passes it to the restoring unit (220). The restoring unit (220) refers to the mapping dictionary included in the combined data (compressed data + mapping dictionary) and restores the original binary data from the compressed data. In doing so, the restoring unit (220) restores the binary data through the reverse of the compression process described above. The restoring unit (220) then deletes the "10" at the most significant bits of the binary data restored by reference to the mapping dictionary, which yields the final original binary data. This removes the "10" that was added at the most significant bits during the compression process described above.
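The restoration steps can be illustrated with a minimal Python sketch. It makes several assumptions that are not spelled out in this passage: the compressed data is a plain concatenation of universal codes, each code (and each cluster) begins with "10" and contains no other "10", so the stream can be split at every "10", and the mapping dictionary is given as a cluster-to-code table. The dictionary values and the names split_codes, restore and example_dictionary are hypothetical.

```python
import re

# "10" followed by bits that never start another "10" (zeros, then ones).
_PIECE = re.compile(r"10(?:(?!10)[01])*")


def split_codes(bits):
    """Split a bit string into pieces that each begin with "10"."""
    pieces = _PIECE.findall(bits)
    if "".join(pieces) != bits:
        raise ValueError("bit string must consist of 0/1 and start with '10'")
    return pieces


def restore(compressed, mapping):
    """Invert the compression.

    mapping: cluster value -> universal code (the mapping dictionary).
    """
    inverse = {code: cluster for cluster, code in mapping.items()}
    # Reverse of compression: replace each universal code by its cluster.
    padded = "".join(inverse[code] for code in split_codes(compressed))
    if not padded.startswith("10"):
        raise ValueError("restored data lacks the leading '10'")
    # Delete the "10" that was added at the most significant bits.
    return padded[2:]


# Hypothetical dictionary for the padded data "10" + "0110110".
example_dictionary = {"1001": "10", "101": "100", "10": "101"}

if __name__ == "__main__":
    print(restore("10100101", example_dictionary))
```

In this example the padded data "100110110" splits into the clusters "1001", "101" and "10"; the hypothetical dictionary encodes them as "10", "100" and "101", and restore turns the resulting stream "10100101" back into "0110110".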

As described above, the method and apparatus for compressing and restoring binary data according to the present embodiment can compress and restore binary data quickly and efficiently through simple operations and a simple hardware configuration, achieve an excellent compression ratio, increase the reliability of the compressed and restored data, and improve transmission efficiency and speed during data transmission.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

100: Binary data compression apparatus
110: Data scanning unit
120: Dictionary generating unit
130: Compression unit
140: Transmitting unit
200: Binary data restoration apparatus
210: Receiving unit
220: Restoring unit

Claims

A method of compressing binary data performed by a binary data compression apparatus, the method comprising:
scanning original binary data;
dividing the scanned original binary data into units of a general cluster to obtain a plurality of general clusters;
generating a mapping dictionary defining a correspondence between each value of the plurality of general clusters and a universal code; and
generating compressed data including a plurality of universal codes from the original binary data by referring to the mapping dictionary,
wherein the general cluster represents a binary number including a "10" encountered while moving from the least significant bit toward the higher-order bits of the original binary data, together with the binary number between that "10" and the "10" encountered immediately before it, and
wherein the universal code includes a "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s disposed between the "10" at the most significant bits and the consecutive "1"s.

The method according to claim 1,
Wherein the mapping dictionary defines a mapping relationship between each value of the plurality of general clusters sorted in ascending order and a universal code sequentially arranged in ascending order.

The method according to claim 2,
wherein, in the step of generating the mapping dictionary,
the binary data compression apparatus generates the mapping dictionary by defining the correspondence between each value of the plurality of general clusters and the universal code from the values of the plurality of general clusters sorted in ascending order and the values of the universal codes arranged sequentially in ascending order.

The method according to claim 2,
wherein, in the step of generating the mapping dictionary,
each value of the plurality of general clusters is expressed by a combination of the number of "0"s and the number of "1"s, excluding the most significant bit of each general cluster.

The method according to claim 2,
In the step of generating the mapping dictionary,
Wherein the binary data compression device generates the mapping dictionary by arranging each value of the plurality of general clusters arranged in ascending order in a line in an upper bit direction or a lower bit direction.

The method according to claim 1,
In the step of acquiring the plurality of general clusters, division is performed by adding "10" to the most significant bit of the original binary data.

The method according to claim 1,
wherein, in the step of generating the mapping dictionary,
a cluster group is generated by combining a plurality of the general clusters included in the original binary data in ascending order, a code group is generated by combining universal codes arranged in ascending order, and the mapping dictionary is generated by defining a mapping relationship between the cluster group and the code group.

The method according to claim 1,
And transmitting the combined data obtained by combining the compressed data and the mapping dictionary to a target device.

The method according to claim 1,
Wherein the mapping dictionary defines a correspondence relationship between each value of the plurality of general clusters sorted in descending order of occurrence frequency of a general cluster and a universal code sequentially arranged in ascending order.

A method for restoring binary data compressed by the binary data compression method according to any one of claims 1 to 9,
And restoring the binary data from the compressed data by referring to the mapping dictionary.

A binary data compression apparatus comprising:
A data scanning unit for scanning the original binary data and dividing the scanned original binary data into units of a general cluster to obtain a plurality of general clusters;
A dictionary generating unit for generating a mapping dictionary defining a corresponding relationship between each value of the plurality of general clusters and the universal code; And
And a compression unit for referring to the mapping dictionary to generate compressed data including a plurality of universal codes from the original binary data,
wherein the general cluster represents a binary number including a "10" encountered while moving from the least significant bit toward the higher-order bits of the original binary data, together with the binary number between that "10" and the "10" encountered immediately before it, and
wherein the universal code includes a "10" at the most significant bits, zero or more consecutive "1"s, and zero or more "0"s disposed between the "10" at the most significant bits and the consecutive "1"s.

The apparatus according to claim 11,
Wherein the mapping dictionary defines a corresponding relationship between each value of the plurality of general clusters sorted in ascending order and a universal code sequentially arranged in ascending order.

The apparatus according to claim 12,
wherein, upon generation of the mapping dictionary,
the dictionary generating unit generates the mapping dictionary by defining the correspondence between each value of the plurality of general clusters and the universal code from the values of the plurality of general clusters sorted in ascending order and the values of the universal codes arranged sequentially in ascending order.

The apparatus according to claim 12,
wherein, in generating the mapping dictionary, the dictionary generating unit expresses each value of the plurality of general clusters by a combination of the number of "0"s and the number of "1"s, excluding the "10" at the most significant bits of each general cluster.

The apparatus according to claim 12,
Upon creation of the mapping dictionary,
Wherein the dictionary generating unit generates the mapping dictionary by arranging each value of the plurality of general clusters sorted in ascending order in a line in an upper bit direction or a lower bit direction.

The apparatus according to claim 11,
Wherein when the plurality of general clusters are acquired, the data scanning unit performs division by adding "10" to the most significant bit of the original binary data.

The apparatus according to claim 11,
wherein, upon generation of the mapping dictionary,
the dictionary generating unit generates a cluster group by combining N general clusters included in the original binary data in ascending order, generates a code group in which N universal codes are arranged in ascending order, and generates the mapping dictionary by defining a mapping relationship between the cluster group and the code group.

The apparatus according to claim 11,
Further comprising a transmitter for transmitting the combined data obtained by combining the compressed data and the mapping dictionary to a target device.

The apparatus according to claim 11,
Wherein the mapping dictionary defines a correspondence relationship between each value of the plurality of general clusters sorted in descending order of occurrence frequency of the general cluster and a universal code sequentially arranged in ascending order.

An apparatus for restoring binary data compressed by the binary data compression apparatus according to any one of claims 11 to 19,
the apparatus comprising a restoring unit that restores the binary data from the compressed data by referring to the mapping dictionary.