KR20210042439A - Binary data compression method and appratus thereof - Google Patents

Info

Publication number
KR20210042439A
Authority
KR
South Korea
Prior art keywords
data
bit
bits
utf
binary cluster
Prior art date
Application number
KR1020190124967A
Other languages
Korean (ko)
Inventor
김정훈
Original Assignee
김정훈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 김정훈
Priority to KR1020190124967A
Publication of KR20210042439A

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/3084: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/55: Compression Theory, e.g. compression of random number, repeated compression
    • H03M7/60: General implementation details not specific to a particular type of compression
    • H03M7/6017: Methods or arrangements to increase the throughput
    • H03M7/70: Type of the data to be coded, other than image and sound

Abstract

According to the present invention, rows are built in the form [compressed binary cluster] + [data that starts with 1, either as is or after suitable data modulation, and whose following bit count can be determined by header bit analysis] + [other data bits that pad the row to a fixed bit width], so that the 1-bit gain from each compressed binary cluster can be used as it is.

Description

Binary data compression method and apparatus thereof

Described in the detailed description for carrying out the invention.

For example, take an arbitrary binary data stream:

001110111000001111100111000101111....

Cutting the stream immediately after every occurrence of "01" yields segments called binary clusters:

001/1101/11000001/1111001/110001/01/111

The trailing data that never reaches a "01" (here, 111) is called surplus data.

Removing the lowest-order "1" from a binary cluster compresses it by 1 bit and leaves binary data that always ends in 0. This is called a compressed binary cluster.

00/110/1100000/111100/11000/0 // 111 (surplus data)
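The splitting and 1-bit removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function names `split_binary_clusters` and `compress_cluster` are hypothetical, and bits are held as a text string of "0" and "1" purely for readability.

```python
def split_binary_clusters(bits: str):
    """Cut the bit string right after every occurrence of '01'.

    Returns (clusters, surplus), where surplus is the tail that never
    reached a '01'. Names and string representation are illustrative.
    """
    clusters = []
    start = 0
    i = 1
    while i < len(bits):
        if bits[i - 1] == "0" and bits[i] == "1":
            clusters.append(bits[start:i + 1])
            start = i + 1
            i = start + 1  # a new cluster needs at least two bits
        else:
            i += 1
    return clusters, bits[start:]


def compress_cluster(cluster: str) -> str:
    """Drop the trailing '1'; what remains always ends in '0'."""
    return cluster[:-1]


bits = "001110111000001111100111000101111"
clusters, surplus = split_binary_clusters(bits)
compressed = [compress_cluster(c) for c in clusters]
print("/".join(clusters), "//", surplus)
print("/".join(compressed), "//", surplus)
```

Run on the sample stream, the sketch reproduces the clusters and the compressed clusters listed in the text, with 111 left over as surplus data.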

Consider the compressed binary clusters on their own. Stacked vertically, one per row, they look like this:

00

110

1100000

111100

11000

0

With the compressed binary clusters stacked this way, each row is joined with a binary data string that starts with 1 and whose remaining length can be determined from its header bits. The bit string of a UTF-8 character is one example:

A sequence starting with 0 has the form 0xxxxxxx, 8 bits in total.

A sequence starting with 110 has the form 110xxxxx10xxxxxx, 16 bits.

A sequence starting with 1110 has the form 1110xxxx10xxxxxx10xxxxxx, 24 bits.

A sequence starting with 11110 has the form 11110xxx10xxxxxx10xxxxxx10xxxxxx, 32 bits.

Analyzing the header bits therefore shows how many of the following bytes belong to the same data unit.

In this case, prepending 1 to the sequence that starts with 0 gives the 9-bit form 10xxxxxxx. All four cases remain distinguishable, every sequence now starts with 1, and each can be joined to a compressed binary cluster as follows.

00<UTF-8 bit string>

110<UTF-8 bit string>

1100000<UTF-8 bit string>

111100<UTF-8 bit string>

11000<UTF-8 bit string>

0<UTF-8 bit string>
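As a rough illustration of the UTF-8 case, the following Python sketch builds a unit that always starts with 1 and reads its length back from the header bits. The helper names `utf8_unit_bits` and `unit_length` are assumptions made for this example; only the bit patterns come from the description.

```python
def utf8_unit_bits(ch: str) -> str:
    """Return the UTF-8 bits of one character, forced to start with '1'.

    One-byte sequences (0xxxxxxx) get a '1' prepended, giving the 9-bit
    form 10xxxxxxx; multi-byte sequences already start with '1'.
    """
    bits = "".join(f"{byte:08b}" for byte in ch.encode("utf-8"))
    if bits[0] == "0":
        bits = "1" + bits
    return bits


def unit_length(bits: str) -> int:
    """Read the unit length in bits from the header at the front of bits."""
    if bits.startswith("10"):
        return 9   # modified one-byte form 10xxxxxxx
    if bits.startswith("110"):
        return 16
    if bits.startswith("1110"):
        return 24
    if bits.startswith("11110"):
        return 32
    raise ValueError("not a valid unit header")


for ch in ("A", "é", "한"):
    unit = utf8_unit_bits(ch)
    print(ch, unit, unit_length(unit))
```

Because a modified one-byte unit begins with 10 and every multi-byte lead byte begins with 11, the four cases stay distinguishable, as the text states.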

Next, other data bits are fetched sequentially, only as many as each row needs, and appended so that every row has the same number of bits. All rows then share the same bit width, while each row yields a bit gain of 0 to 1 bit. Once the rows are fixed-width bit strings, they can be concatenated into a single row and transmitted; the receiver splits the stream back into fixed-width bit strings and decompresses each one.

00<UTF-8 bit string><....other data bits>

110<UTF-8 bit string><....other data bits>

1100000<UTF-8 bit string><other data bits>

111100<UTF-8 bit string><.other data bits>

11000<UTF-8 bit string><..other data bits>

0<UTF-8 bit string><.....other data bits>
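The padding step can be sketched as follows. This is an illustration under stated assumptions: the function name `build_fixed_width_stream`, the row width of 20 bits, the use of the 9-bit unit for "A" in every row, and the filler bit pattern are all chosen for the example and do not come from the source.

```python
def build_fixed_width_stream(clusters, units, other_bits, width):
    """Pad each [cluster + unit] row with other data bits up to `width`.

    Returns (rows, used), where used is how many other data bits were
    consumed. Other data bits are taken strictly in order.
    """
    rows = []
    used = 0
    for cluster, unit in zip(clusters, units):
        need = width - len(cluster) - len(unit)
        if need < 0:
            raise ValueError("row is wider than the fixed width")
        if used + need > len(other_bits):
            raise ValueError("not enough other data bits")
        rows.append(cluster + unit + other_bits[used:used + need])
        used += need
    return rows, used


compressed = ["00", "110", "1100000", "111100", "11000", "0"]
units = ["101000001"] * len(compressed)   # 9-bit unit for "A" (assumed)
other = "0110100110010110" * 3            # arbitrary filler bits (assumed)
rows, used = build_fixed_width_stream(compressed, units, other, width=20)
stream = "".join(rows)                    # single row sent to the receiver
for row in rows:
    print(row)
```

Every row comes out 20 bits wide, so the receiver can cut the stream every 20 bits without any extra length field.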

In summary, the present invention builds rows of the form

[compressed binary cluster] + [data that starts with 1, either as is or after suitable data modulation, and whose following bit count can be determined by header bit analysis] + [other data bits that pad the row to a fixed bit width]

so that the 1-bit gain from each compressed binary cluster can be used as it is.

The middle component, [data that starts with 1, either as is or after suitable data modulation, and whose following bit count can be determined by header bit analysis], may be the output of an existing compression algorithm or any other data with the same property.

As another embodiment of the middle data, consider deflate, a very popular compression algorithm specified in RFC 1951.

Internally, a deflate stream has a multi-block structure and can be separated block by block. With a simple modulation based on RFC 1951, which defines deflate, each block can be transformed so that its header bits appear in the first byte and an end-of-block code appears at its end. (The end-of-block code is a Huffman code, so even when arbitrary bits follow it, it is identified unambiguously as the decoder works sequentially through the preceding data.) Because deflate packs data starting from the least significant bit (LSB) of each byte, the lowest bit of the first byte indicates whether the block is the last block (=1) or an intermediate block (=0), and the following 2 bits (10, 11, 01, 00) indicate the compression method, with one value reserved.

Figure pat00001

The data above is modulated slightly. The lowest bit, which indicates whether the block is the last one, is moved to the highest position. Most blocks are intermediate blocks, for which this bit is 0, so it is set to 1; for the last block it stays 1. The result is shown below. Other modulations are also possible, for example moving other bits along with it or reversing the bit order. What matters is that the first header bit of an intermediate block changes from 0 to 1, the header bit marking the end block stays 1, and this bit moves to the front. The last block can be identified either by including the total number of blocks in the compressed data or by checking how much data remains during decoding: if exactly one fixed-width bit string remains, it is evidently the last block.

Figure pat00002
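One possible reading of this header modulation is sketched below in Python. It assumes the block starts on a byte boundary and that the remaining seven bits simply shift down by one position; the text allows other variants. The function names `modulate_header` and `demodulate_header` are hypothetical. The standard `zlib` module is used only to obtain a real raw deflate stream (negative `wbits` requests a stream without the zlib wrapper).

```python
import zlib


def modulate_header(first_byte: int) -> int:
    """Move the last-block bit (bit 0) to the top and force it to 1."""
    return 0x80 | (first_byte >> 1)


def demodulate_header(mod_byte: int, is_final: bool) -> int:
    """Undo modulate_header; is_final must be known from elsewhere,
    e.g. from a block count or from the amount of data remaining."""
    return ((mod_byte & 0x7F) << 1) | int(is_final)


# Raw deflate stream of a short input, used only as sample data.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = compressor.compress(b"hello hello hello hello") + compressor.flush()

first = raw[0]
bfinal = first & 1            # last-block flag, lowest bit
btype = (first >> 1) & 0b11   # next two bits: block type
mod = modulate_header(first)
restored = demodulate_header(mod, bool(bfinal))
print(f"{first:08b} -> {mod:08b} -> {restored:08b}")
```

Since the modulated byte always starts with 1, the last-block flag is no longer carried in the byte itself, which is why the text relies on a block count or on the remaining data length to recognize the final block.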

Next, the binary clusters are converted into compressed binary clusters according to the rule above.

Figure pat00003

Each compressed binary cluster is joined to one deflate data block. As above, other data bits are then fetched sequentially, as many as needed, and appended to form fixed-width bit strings. The combined data is concatenated into a single row for transmission, and the receiver cuts it into fixed-width bit strings and decompresses each one.

As shown in the figure below, the first 1 encountered after a 0 marks the start of the header bits, and everything before it is the compressed binary cluster. Take 8 bits starting at this header bit, change the highest bit from 1 to 0, and move it back to the lowest position; this gives the start of the original deflate header bit sequence. Applying the deflate decoding rules then delimits the compressed data block naturally, up to the end-of-block code, and decompressing it restores the original. The other data bits are restored by concatenating them in order. Each compressed binary cluster is restored to the original binary cluster by appending 1 after its lowest bit.

Figure pat00004
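The receiver-side boundary rule (the first 1 after a 0 starts the header) can be sketched as follows. The names `split_row`, `restore_cluster` and `restore_header_byte` are hypothetical, the sample row is made up for the example, and the header restoration assumes the simple one-bit shift variant of the modulation.

```python
def split_row(row: str):
    """Split a fixed-width row into (compressed cluster, remainder).

    A compressed cluster is some 1s followed by at least one 0, so the
    first '1' that appears after a '0' is the start of the header bits.
    """
    first_zero = row.index("0")
    boundary = row.index("1", first_zero)
    return row[:boundary], row[boundary:]


def restore_cluster(compressed: str) -> str:
    """Append the '1' that was removed during compression."""
    return compressed + "1"


def restore_header_byte(rest: str) -> int:
    """Take 8 bits, clear the leading 1 and move it to the lowest position.

    The result is the header byte of an intermediate block; a final
    block needs its lowest bit set again by the caller.
    """
    mod = int(rest[:8], 2)
    return (mod & 0x7F) << 1


# Made-up row: cluster 1100000, modulated header 10000010, 5 further bits.
row = "1100000" + "10000010" + "01101"
cluster, rest = split_row(row)
print(cluster, restore_cluster(cluster), f"{restore_header_byte(rest):08b}")
```

The split works without any length field because a compressed cluster can never contain a 1 after its first 0.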

As described above, the last block can be identified either by including the total number of blocks in the compressed data or by checking how much data remains during decoding: if exactly one fixed-width bit string remains, it is evidently the last block.

Claims (1)

As an earlier application serving as the basis for a domestic priority claim, no separate claims are stated.
KR1020190124967A 2019-10-10 2019-10-10 Binary data compression method and appratus thereof KR20210042439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020190124967A KR20210042439A (en) 2019-10-10 2019-10-10 Binary data compression method and appratus thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020190124967A KR20210042439A (en) 2019-10-10 2019-10-10 Binary data compression method and appratus thereof

Publications (1)

Publication Number Publication Date
KR20210042439A (en) 2021-04-20

Family

ID=75743049

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020190124967A KR20210042439A (en) 2019-10-10 2019-10-10 Binary data compression method and appratus thereof

Country Status (1)

Country Link
KR (1) KR20210042439A (en)

Similar Documents

Publication Publication Date Title
US5260693A (en) Method and system for lossless and adaptive data compression and decompression
JP5498783B2 (en) Data compression method
US7990289B2 (en) Combinatorial coding/decoding for electrical computers and digital data processing systems
US20090019071A1 (en) Blocking for combinatorial coding/decoding for electrical computers and digital data processing systems
US7224293B2 (en) Data compression system and method
CN101667843B (en) Methods and devices for compressing and uncompressing data of embedded system
US20090015444A1 (en) Data compression for communication between two or more components in a system
WO2009009574A2 (en) Blocking for combinatorial coding/decoding for electrical computers and digital data processing systems
US8688621B2 (en) Systems and methods for information compression
KR20160123302A (en) Devices and methods of source-encoding and decoding of data
US6225922B1 (en) System and method for compressing data using adaptive field encoding
EP3444952A1 (en) Data compression apparatus, data decompression apparatus, data compression program, data decompression program, data compression method, and data decompression method
CN105052040A (en) System and method for multi-stream compression and decompression
US20130063287A1 (en) Decoding encoded data
CN106776663B (en) Audio file compression method and device
JP2007520112A (en) Quickly queryable data compression format for XML files
CN113312325B (en) Track data transmission method, device, equipment and storage medium
CN112290953B (en) Array encoding device and method, array decoding device and method for multi-channel data stream
KR20210042439A (en) Binary data compression method and appratus thereof
CN110719105B (en) Lossless compression and decompression method for test vector
CN113035278A (en) TPBWT-based sliding window compression method based on self-indexing structure
CN110288666B (en) Data compression method and device
CN115225724B (en) Data compression techniques using partitioning and irrelevant bit elimination
JP5070086B2 (en) Data compression apparatus and image reading apparatus
KR20190091586A (en) TCP/IP Packet data compression method and appratus based on binary compression method