KR101857385B1

KR101857385B1 - Method and Apparatus for checking error of Lempel-Ziv 77 lossless compressed data

Info

Publication number: KR101857385B1
Application number: KR1020170020412A
Authority: KR
Inventors: 권범; 이상훈; 공명식; 김민창; 김진우
Original assignee: 국방과학연구소
Priority date: 2017-02-15
Filing date: 2017-02-15
Publication date: 2018-05-11

Abstract

The present invention relates to a technique for checking errors in compressed data, and more particularly to a method and an apparatus for checking, before decoding, whether an error has occurred in data compressed by the Lempel-Ziv 77 (LZ77) algorithm, regardless of the number of error bits. The method comprises: receiving a character string to be compressed; setting a coding position; sequentially outputting output tuples; and determining whether or not the compressed string contains an error.

Description

[0001] The present invention relates to a method and an apparatus for checking errors in LZ77 lossless compressed data.

The present invention relates to a technique for checking errors in compressed data, and more particularly to an error checking method and apparatus that can determine, before decoding is performed, whether an error has occurred in data compressed with the LZ77 (Lempel-Ziv 77) algorithm, regardless of the number of error bits.

The present invention also relates to an error checking method and apparatus that indicate whether data compressed by an arbitrary algorithm was compressed by the LZ77 algorithm.

Data compression can be classified as lossy or lossless. In lossy compression, the decompressed data may not be identical to the original data. In lossless compression, by contrast, the original form of the data is preserved through compression and decompression. Lossless compression schemes are classified into dictionary coding and entropy coding types.

The most widely used dictionary coding algorithms are the Lempel-Ziv (LZ) algorithms created by Abraham Lempel and Jacob Ziv, and their variants. In particular, the LZ77 algorithm, created in 1977, checks whether the character string currently being compressed has already appeared. If it has, then instead of outputting the string itself, the algorithm outputs the position of a pointer to the earlier occurrence together with the matching length.

However, errors can occur in encoded data during transmission. One way to detect them is to use a parity bit, an extra bit added to a given bit string that records whether the number of 1s in it is odd or even.

This scheme is not widely used today because it cannot detect an even number of bit errors and because the parity bit itself may be corrupted during transmission.
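The limitation described above can be illustrated with a minimal sketch. It assumes even parity over a list of bits; the helper names `add_parity`, `parity_ok`, and `flip` are illustrative and do not come from the patent.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True when the received word still has an even number of 1s."""
    return sum(word) % 2 == 0


def flip(word, positions):
    """Return a copy of word with the bits at the given 0-based indices inverted."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(word)]


sent = add_parity([1, 0, 1, 1])
one_error = flip(sent, {2})
two_errors = flip(sent, {0, 2})

print(parity_ok(sent))        # True: no error
print(parity_ok(one_error))   # False: a single error is detected
print(parity_ok(two_errors))  # True: a double error goes unnoticed
```

The last line shows the weakness named in the text: two flipped bits restore the parity, so the check passes even though the word is corrupted.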

The error detection method most commonly used today is the so-called Hamming code, introduced by Hamming in the 1950s. Its name reflects the fact that 3 parity bits are added in order to transmit 4 bits of data.

This code can detect and correct every 1-bit error and can also detect 2-bit errors. However, the Hamming code cannot detect errors when 3 or more bits are corrupted.
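As a rough illustration of the behavior described above, the following sketch encodes 4 data bits with 3 parity bits and computes the syndrome of a received word. The bit layout (p1 p2 d1 p3 d2 d3 d4) and the function names are assumptions chosen for the example, not details taken from the patent.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit word laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_syndrome(w):
    """Return 0 if all parity checks pass, else the 1-based position they point to."""
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    return s1 + 2 * s2 + 4 * s3


def flip(word, positions):
    """Invert the bits at the given 1-based positions."""
    return [b ^ 1 if i + 1 in positions else b for i, b in enumerate(word)]


word = hamming74_encode([1, 0, 1, 1])
print(hamming74_syndrome(word))                   # 0: clean word
print(hamming74_syndrome(flip(word, {5})))        # 5: points at the flipped bit
print(hamming74_syndrome(flip(word, {1, 2, 3})))  # 0: this 3-bit error is missed
```

In the last case the three flipped bits turn one valid code word into another, so every parity check passes and the corruption is not noticed.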

1. Korean Patent Laid-Open Publication No. 2001-0084412 (Title: Data Compression Algorithm)
2. Korean Patent Laid-Open Publication No. 0-2011-0007865 (Title: Data Compression Method)

1. Chung, Sun-Cheol et al., "Performance Improvement of LZ77 Algorithm Using Strategic Tables and Genetic Algorithms", Journal of the Korea Information Science Society: Software and Applications, Vol. 31, No. 12 (December 2004), pp. 1628-1636

The present invention has been proposed to solve the problems of the background art described above. Its object is to provide an error checking method and apparatus capable of checking whether an error has occurred in data compressed with the LZ77 (Lempel-Ziv 77) algorithm, regardless of the number of error bits.

Another object of the present invention is to provide an error checking method and apparatus that indicate whether data compressed by an arbitrary algorithm was compressed by the LZ77 algorithm.

To achieve the objects stated above, the present invention provides an error checking method capable of checking whether an error has occurred in data compressed with the LZ77 (Lempel-Ziv 77) algorithm, regardless of the number of error bits.

The error checking method includes:

(a) receiving a string to be compressed;

(b) setting a coding position at the beginning of the string to be compressed;

(c) searching a search buffer, which holds the portion of the string to be compressed that has already been compressed relative to the coding position, for a string that matches from the beginning of a lookahead buffer, which holds the portion that has not yet been compressed relative to the coding position, and sequentially outputting output tuples until the lookahead buffer is empty; and

(d) comparing the sequentially output output tuples with preset error check conditions and, according to the comparison result, determining whether the string compressed by the LZ77 (Lempel-Ziv 77) algorithm contains an error or is error-free.

Here, the output tuple may include a start position (p) relative to the coding position, the length (len) of the string that matches between the search buffer and the lookahead buffer, and the first character (C) in the lookahead buffer that does not match.

Also, step (d) may include: (d-1) confirming, as a first condition of the error check conditions, that the start position (p) and the string length (len) of the first output tuple are each zero (0); (d-2) checking, as a second condition of the error check conditions, whether the value of the start position (p) is always greater than or equal to the value of the string length (len); and (d-3) checking, as a third condition of the error check conditions, whether the value of the start position (p) is always less than or equal to the size (W) of the search buffer.

Also, steps (d-2) to (d-3) may be repeated until the last output tuple of the LZ77 lossless compressed data is reached.

Also, an error may be determined if any one of steps (d-1) to (d-3) is not satisfied.
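The three conditions can be written as small predicates, as in the sketch below. The function names are illustrative, tuples are assumed to be Python triples `(p, len, C)`, and treating an empty tuple list as an error is an assumption of this sketch rather than something the text states.

```python
def first_tuple_ok(p, length):
    """(d-1): the first output tuple must have p == 0 and len == 0."""
    return p == 0 and length == 0


def pointer_covers_length(p, length):
    """(d-2): the start position p must be at least the match length len."""
    return p >= length


def pointer_within_window(p, W):
    """(d-3): the start position p must not exceed the search buffer size W."""
    return p <= W


def stream_has_error(tuples, W):
    """Return True as soon as any condition fails (empty input counts as an error here)."""
    if not tuples or not first_tuple_ok(tuples[0][0], tuples[0][1]):
        return True
    return any(
        not (pointer_covers_length(p, n) and pointer_within_window(p, W))
        for p, n, _ in tuples[1:]
    )


print(stream_has_error([(0, 0, "a"), (0, 0, "b"), (2, 1, "b")], W=4))  # False
print(stream_has_error([(0, 0, "a"), (1, 2, "b")], W=4))               # True
```

The second call fails condition (d-2) because len = 2 exceeds p = 1.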

Also, the start position (p) and the string length (len) may each be represented by an L-bit binary code, and the first character (C) may be represented by an 8-bit binary code.

Also, L may be defined by an equation in terms of the size (W) of the search buffer.

Also, the characters that can serve as the first character (C) may be those of the ASCII code.

Also, the length of the compressed string may be given by the equation M = N × (2L + 8) bits, where N is the total number of output tuples and M is the length of the compressed string.
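Because each output tuple holds p and len in L bits each and C in 8 bits, the size of a stream of tuples follows directly from the tuple count. The sketch below works through that arithmetic; the function name is illustrative, and the per-tuple width of 2L + 8 bits is derived from the field widths stated above.

```python
def output_stream_bits(num_tuples, L):
    """Total size of num_tuples tuples, each holding p (L bits), len (L bits), C (8 bits)."""
    return num_tuples * (2 * L + 8)


print(output_stream_bits(5, 4))  # 80: five tuples of 16 bits each
```

For L = 4, every tuple occupies 16 bits, so a stream whose length is not a multiple of 16 cannot be a whole number of such tuples.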

In another aspect, another embodiment of the present invention may provide an error checking apparatus for LZ77 (Lempel-Ziv 77) lossless compressed data, including: an input unit that receives a string to be compressed; a buffer unit having a search buffer that holds the portion of the input string that has already been compressed and a lookahead buffer that holds the portion that has not yet been compressed; an error check unit that sets a coding position at the beginning of the string to be compressed, searches the search buffer, relative to the coding position, for a string that matches from the beginning of the lookahead buffer, and sequentially outputs output tuples until the lookahead buffer is empty; and an error determination unit that compares the sequentially output output tuples with preset error check conditions and, according to the comparison result, determines whether the output stream contains an error or is error-free.

According to the present invention, whether an error has occurred in data compressed with the LZ77 (Lempel-Ziv 77) algorithm can be determined before decoding is performed.

Another effect of the present invention is that, for data compressed by an arbitrary algorithm, it can be determined whether the data was compressed by the LZ77 algorithm.

FIG. 1 is a conceptual diagram showing the encoding process of a general LZ77 (Lempel-Ziv 77) algorithm.
FIG. 2 is a configuration diagram of the output stream produced by the encoding process of a general LZ77 algorithm.
FIG. 3 is a flowchart showing an error checking process for LZ77 lossless compressed data according to an embodiment of the present invention.
FIG. 4 is a block diagram of an error checking apparatus 400 for LZ77 lossless compressed data according to an embodiment of the present invention.

The present invention is susceptible to various modifications and may have several embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and alternatives falling within its spirit and technical scope.

Like reference numerals are used for like elements in describing each drawing.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in this application, should not be interpreted in an idealized or overly formal sense.

Hereinafter, a method and an apparatus for checking errors in LZ77 lossless compressed data according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

For convenience of description, the following terms are explained first.

1) Input stream: the original string to be compressed

2) Output stream: the compressed result string

3) Character: the basic data unit of the original string to be compressed

4) Coding position: the position within the input stream of the point currently being compressed (it coincides with the beginning of the lookahead buffer)

5) Lookahead buffer: the string from the coding position to the last character of the input stream

6) Search buffer: the W (= size of search buffer) characters that have already been processed, that is, the W characters immediately before the coding position

7) Pointer (p): when a matching string is found between the search buffer and the lookahead buffer, the start position of that string, expressed as the number of characters by which it is separated from the coding position

8) Length (len): the length of the string that matches between the search buffer and the lookahead buffer

9) C: the first character in the lookahead buffer that does not match
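To make the definitions of the coding position, search buffer, and lookahead buffer concrete, the following sketch slices an input stream at a given coding position. The function name `buffers_at` and the sample string are illustrative assumptions.

```python
def buffers_at(input_stream, coding_position, W):
    """Split the input stream into (search buffer, lookahead buffer) at the coding position."""
    # Search buffer: at most W characters immediately before the coding position.
    search_buffer = input_stream[max(0, coding_position - W):coding_position]
    # Lookahead buffer: everything from the coding position to the end of the stream.
    lookahead_buffer = input_stream[coding_position:]
    return search_buffer, lookahead_buffer


print(buffers_at("abcabcabc", 3, 6))  # ('abc', 'abcabc')
print(buffers_at("abcabcabc", 7, 6))  # ('bcabca', 'bc')
```

In the second call the search buffer is already full, so it holds only the W = 6 characters closest to the coding position.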

FIG. 1 is a conceptual diagram showing the encoding process of a general LZ77 (Lempel-Ziv 77) algorithm. Referring to FIG. 1, the compression principle of the LZ77 algorithm is to find, in the search buffer, the longest string that matches the beginning of the lookahead buffer and, instead of outputting the string that was found, to output its position (p) and matching length (len), that is, (p, len), to the output stream.

Accordingly, the longer the string that is found, the higher the compression efficiency. If no character in the search buffer matches, the first non-matching character (C) in the lookahead buffer is stored in the search buffer so that C can be compressed when it appears again later.

Encoding with the LZ77 algorithm proceeds as follows. (1) Place the coding position at the start of the input stream to be compressed. (2) Find the longest string in the search buffer that matches from the beginning of the lookahead buffer. (3) Output the output tuple (p, len, C). (4) If the lookahead buffer is not empty, move the coding position and the search buffer forward by len + 1. Steps (2) to (4) are repeated until the lookahead buffer is empty.
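Steps (1) to (4) can be sketched as follows. This is an illustrative reading rather than the patent's implementation: it assumes that a match must lie entirely inside the search buffer (so that len never exceeds p), that a match is shortened when necessary so that one character always remains for C, and that ties go to the candidate farthest from the coding position. The function name `lz77_encode` is hypothetical.

```python
def lz77_encode(input_stream, W):
    """Return a list of (p, len, C) output tuples for input_stream with window size W."""
    tuples = []
    pos = 0  # (1) coding position starts at the beginning of the input stream
    end = len(input_stream)
    while pos < end:  # repeat until the lookahead buffer is empty
        best_p, best_len = 0, 0
        # (2) try every start position inside the search buffer
        for cand in range(max(0, pos - W), pos):
            length = 0
            while (cand + length < pos            # stay inside the search buffer
                   and pos + length < end - 1     # keep one character for C
                   and input_stream[cand + length] == input_stream[pos + length]):
                length += 1
            if length > best_len:
                best_p, best_len = pos - cand, length
        # (3) output the tuple; C is the first character that did not match
        tuples.append((best_p, best_len, input_stream[pos + best_len]))
        # (4) move the coding position forward by len + 1
        pos += best_len + 1
    return tuples


print(lz77_encode("abab", W=4))  # [(0, 0, 'a'), (0, 0, 'b'), (2, 1, 'b')]
```

Here the third tuple says that one character starting 2 positions back matches, followed by the literal `b`.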

As a result, the output stream produced by the LZ77 encoding process is the sequence of (p, len, C) tuples expressed in binary code. Here, p and len are each represented by an L-bit binary code, and C is represented by an 8-bit binary code. L depends on the size W of the search buffer.

For example, when W is 12, L is 4, so p and len are each represented by a 4-bit binary code. C is always represented by an 8-bit binary code regardless of the value of W. The characters that the 8-bit C can take are those of the ASCII code.
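The binary layout can be sketched as follows. The relation between L and W used here is an assumption: the sketch takes L to be the smallest number of bits that can hold every value from 0 to W, which agrees with the W = 12, L = 4 example, but the patent's own formula may differ. The names `field_width` and `pack_tuples` are illustrative.

```python
def field_width(W):
    """Assumed width L: the fewest bits that can hold every value from 0 to W."""
    return max(1, W.bit_length())


def pack_tuples(tuples, W):
    """Serialize (p, len, C) tuples as a string of '0'/'1' characters."""
    L = field_width(W)
    parts = []
    for p, length, c in tuples:
        parts.append(format(p, f"0{L}b"))       # p in L bits
        parts.append(format(length, f"0{L}b"))  # len in L bits
        parts.append(format(ord(c), "08b"))     # C as an 8-bit ASCII code
    return "".join(parts)


print(field_width(12))                 # 4
print(pack_tuples([(2, 1, "b")], 12))  # 0010000101100010
```

The packed tuple reads as `0010` (p = 2), `0001` (len = 1), and `01100010` (the ASCII code of `b`).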

FIG. 2 is a configuration diagram of the output stream produced by the encoding process of a general LZ77 algorithm. Referring to FIG. 2, for convenience of description, assume that the output stream encoded by the LZ77 algorithm consists of a total of N × (2L + 8) bits, where N is the number of output tuples. For ease of understanding, the column of the output tuples corresponding to p is denoted A_i, the column corresponding to len is denoted B_i, and the column corresponding to C is likewise indexed by i. Here, i is an index indicating the order of the columns and takes values from 1 to N.
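The column view of the output stream can be sketched by cutting a bit string into fixed-width tuples. The sketch assumes the stream is given as a string of '0' and '1' characters and that L is known to the reader; the function name `split_columns` is illustrative.

```python
def split_columns(bits, L):
    """Split a bit string into the columns A (p values), B (len values), and C (characters)."""
    width = 2 * L + 8  # bits per output tuple
    if len(bits) % width != 0:
        raise ValueError("stream length is not a whole number of tuples")
    A, B, C = [], [], []
    for k in range(0, len(bits), width):
        A.append(int(bits[k:k + L], 2))
        B.append(int(bits[k + L:k + 2 * L], 2))
        C.append(chr(int(bits[k + 2 * L:k + width], 2)))
    return A, B, C


stream = "0000000001100001" + "0010000101100010"
print(split_columns(stream, L=4))  # ([0, 2], [0, 1], ['a', 'b'])
```

The Python lists are 0-based, so the value written A_i in the text is `A[i - 1]` in this sketch.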

FIG. 3 is a flowchart showing an error checking process for LZ77 lossless compressed data according to an embodiment of the present invention. To check whether an error has occurred in LZ77 lossless compressed data, this embodiment relies on the fact that the output stream of the LZ77 algorithm has the following properties.

In the LZ77 encoding process, the values of p and len in the first output tuple are both 0. This is because the search buffer is empty at that point, so no string matches between the search buffer and the lookahead buffer. In terms of the variables defined above, A_i = 0 and B_i = 0 for i = 1.

Accordingly, it is checked whether A_i = 0 and B_i = 0 (steps S310 and S320).

If the check in step S320 shows that A_i = 0 and B_i = 0 does not hold, an error is detected in the output stream (step S321).

Otherwise, if A_i = 0 and B_i = 0 in step S320, the index i is incremented by 1 (step S330). The value of p is always greater than or equal to the value of len, because the length len of the string that matches between the search buffer and the lookahead buffer cannot exceed p, which denotes the start position of the matching string. In terms of the variables defined above, A_i ≥ B_i.

Accordingly, it is checked whether A_i ≥ B_i (step S340).

If the check in step S340 shows that A_i ≥ B_i does not hold, an error is detected in the output stream (step S321).

Otherwise, if A_i ≥ B_i in step S340, the process proceeds to the next step.

The value of p is always less than or equal to W (the size of the search buffer), because the length of the matching string cannot exceed the size of the search buffer, so p cannot exceed W. In terms of the variables defined above, A_i ≤ W.

Accordingly, it is checked whether A_i ≤ W (step S350).

If the check in step S350 shows that A_i ≤ W does not hold, an error is detected in the output stream (step S321).

Otherwise, if A_i ≤ W in step S350, the process proceeds to the next step.

In step S390, it is checked whether the index i, which indicates the order of the columns, has the value N, which denotes the last column. If i is not equal to N, the process returns to increment i (step S330) and repeats the checks of steps S340 and S350 for the next output tuple.

Finally, once the process has run through to the point where the lookahead buffer is empty, i = N, and it is determined that the output stream contains no error (step S391).
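One reading of the flow of FIG. 3 is sketched below, under the assumption that the checks of steps S340 and S350 loop over the remaining tuples until the last column is reached. The columns are assumed to be given as Python lists `A` (p values) and `B` (len values); the step labels in the comments follow the text, while the function name, the return values, and the handling of empty input are choices made for this sketch. The end-of-stream test is placed before the increment so that a single-tuple stream does not read past its end.

```python
def check_columns(A, B, W):
    """Walk the columns A_i (p) and B_i (len); return 'S391' (no error) or 'S321' (error)."""
    N = len(A)
    if N == 0 or len(B) != N:
        return "S321"  # assumption: an empty or ragged stream is reported as an error
    i = 1                               # S310: start at the first tuple
    if A[i - 1] != 0 or B[i - 1] != 0:  # S320: first tuple must be (0, 0)
        return "S321"
    while i != N:                       # S390: stop at the last column
        i += 1                          # S330: move to the next tuple
        if A[i - 1] < B[i - 1]:         # S340: require A_i >= B_i
            return "S321"
        if A[i - 1] > W:                # S350: require A_i <= W
            return "S321"
    return "S391"


print(check_columns([0, 0, 2], [0, 0, 1], W=4))  # S391
print(check_columns([0, 3], [0, 1], W=2))        # S321
```

The second call is rejected at step S350 because p = 3 exceeds the search buffer size W = 2.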

The conditions described above are properties of an output stream encoded by the LZ77 algorithm. If any one of the three conditions is not satisfied, it can be concluded that an error has occurred in the output stream.

The error checking flowchart for LZ77 lossless compressed data shown in FIG. 3 includes checking these three conditions, and following the flowchart makes it possible to check LZ77 lossless compressed data for errors. In addition, for an output stream obtained by an arbitrary algorithm, and on the assumption that no error has occurred, the present invention can determine whether the output stream was compressed by the LZ77 algorithm.

To elaborate on this determination: even for the same original data, the length of the compressed data differs depending on the algorithm used for compression. Therefore, data compressed by a compression algorithm other than LZ77 is not represented in the string structure of FIG. 2.

Even if the output stream produced by some other compression algorithm happens to have the same length as the output stream produced by the LZ77 compression algorithm, and so has the string structure of FIG. 2, performing the error check on it with the three conditions above will necessarily yield an error detection result under at least one of them.

In other words, apart from the purpose of error detection, and on the assumption that no error has occurred in the output stream data, if a given output stream satisfies all three error check conditions of the present invention through to the end of the string, it can be concluded that the data was compressed by the LZ77 compression algorithm. Conversely, if any one of the three conditions fails before the end of the string, it can be determined that the data was compressed by some compression algorithm other than LZ77.
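The discrimination described above can be sketched as a single pass over a candidate bit string. The sketch assumes the stream is a string of '0' and '1' characters and that W and L are supplied by the caller; it also treats a stream whose length is not a whole number of tuples as not matching the FIG. 2 structure. The function name `looks_like_lz77` is illustrative, and the result reflects only the three conditions in the text.

```python
def looks_like_lz77(bits, W, L):
    """Return True if bits has the FIG. 2 structure and every tuple passes the three conditions."""
    width = 2 * L + 8  # bits per output tuple
    if not bits or len(bits) % width != 0:
        return False  # not a whole number of tuples
    for k in range(0, len(bits), width):
        p = int(bits[k:k + L], 2)
        length = int(bits[k + L:k + 2 * L], 2)
        if k == 0 and (p != 0 or length != 0):
            return False  # condition 1: first tuple must be (0, 0)
        if p < length or p > W:
            return False  # conditions 2 and 3
    return True


print(looks_like_lz77("0000000001100001" "0010000101100010", W=12, L=4))  # True
print(looks_like_lz77("1" * 32, W=12, L=4))                               # False
print(looks_like_lz77("0000000001100001" "0", W=12, L=4))                 # False
```

The second stream fails the first condition, and the third is rejected because 17 bits cannot be divided into 16-bit tuples.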

FIG. 4 is a block diagram of an error checking apparatus 400 for LZ77 lossless compressed data according to an embodiment of the present invention. Referring to FIG. 4, the error checking apparatus 400 may include: an input unit 410 that receives a string to be compressed; a buffer unit 420 having a search buffer that holds the portion of the input string that has already been compressed and a lookahead buffer that holds the portion that has not yet been compressed; an error check unit 430 that sets a coding position at the beginning of the string to be compressed, finds a string that matches from the beginning of the lookahead buffer, and sequentially outputs output tuples until the lookahead buffer is empty; an error determination unit 440 that compares the sequentially output output tuples with preset error check conditions and, according to the comparison result, determines whether the output stream contains an error or is error-free; and an encoder 450 that encodes the output stream.

Relative to the coding position, the portion of the string to be compressed that has already been compressed is placed in the search buffer, and the portion that has not yet been compressed is placed in the lookahead buffer.

Terms such as "unit" and "-er" used in FIG. 4 refer to a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

In a hardware implementation, the units may be implemented as an application specific integrated circuit (ASIC), a digital signal processor (DSP), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a controller, a microprocessor, another electronic unit designed to perform the functions described above, or a combination of these. In a software implementation, they may be implemented as modules that perform the functions described above. The software may be stored in a memory unit and executed by a processor. The memory unit and the processor may employ various means well known to those skilled in the art.

400: Error checking apparatus
410: Input unit
420: Buffer unit
430: Error check unit
440: Error determination unit
450: Encoder

Claims

An error checking method for LZ77 (Lempel-Ziv) lossless compressed data, comprising:
(a) receiving a string to be compressed;
(b) setting a coding position at the beginning of the string to be compressed;
(c) searching a search buffer, which holds the portion of the string to be compressed that has already been compressed relative to the coding position, for a string that matches from the beginning of a lookahead buffer, which holds the portion that has not yet been compressed relative to the coding position, and sequentially outputting output tuples until the lookahead buffer is empty; and
(d) comparing the sequentially output output tuples with preset error check conditions and, according to the comparison result, determining whether the string compressed by the LZ77 (Lempel-Ziv) algorithm contains an error or is error-free,
wherein the output tuple includes a start position (p) relative to the coding position, a length (len) of the string that matches between the search buffer and the lookahead buffer, and a first character (C) in the lookahead buffer that does not match, and
wherein the start position (p) and the string length (len) are each represented by an L-bit binary code, and the first character (C) is represented by an 8-bit binary code.

The method according to claim 1,
wherein step (d) comprises:
(d-1) confirming, as a first condition of the error check conditions, that the start position (p) and the string length (len) of the first output tuple are each zero (0);
(d-2) checking, as a second condition of the error check conditions, whether the value of the start position (p) is always greater than or equal to the value of the string length (len); and
(d-3) checking, as a third condition of the error check conditions, whether the value of the start position (p) is always less than or equal to the size (W) of the search buffer.

The method according to claim 2,
wherein steps (d-2) to (d-3) are repeated until the last output tuple of the LZ77 lossless compressed data is reached.

The method according to claim 2,
wherein an error is determined if any one of steps (d-1) to (d-3) is not satisfied.

(Deleted)

The method according to claim 1,
wherein L is defined by an equation in terms of the size (W) of the search buffer.

The method according to claim 1,
wherein the characters that can serve as the first character (C) are those of the ASCII code.

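The method according to claim 1,
wherein the length of the compressed string is given by the equation M = N × (2L + 8) bits, where N is the total number of output tuples and M is the length of the compressed string.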

An error checking apparatus for LZ77 (Lempel-Ziv) lossless compressed data, comprising:
an input unit that receives a string to be compressed;
a buffer unit having a search buffer that holds the portion of the input string that has already been compressed and a lookahead buffer that holds the portion that has not yet been compressed;
an error check unit that sets a coding position at the beginning of the string to be compressed, searches the search buffer, relative to the coding position, for a string that matches from the beginning of the lookahead buffer, and sequentially outputs output tuples until the lookahead buffer is empty; and
an error determination unit that compares the sequentially output output tuples with preset error check conditions and, according to the comparison result, determines whether the string compressed by the LZ77 (Lempel-Ziv) algorithm contains an error or is error-free,
wherein the output tuple includes a start position (p) relative to the coding position, a length (len) of the string that matches between the search buffer and the lookahead buffer, and a first character (C) in the lookahead buffer that does not match, and
wherein the start position (p) and the string length (len) are each represented by an L-bit binary code, and the first character (C) is represented by an 8-bit binary code.