KR19980069645A

KR19980069645A - Data Compression Method

Info

Publication number: KR19980069645A
Application number: KR1019970006805A
Authority: KR
Inventors: 박보연
Original assignee: 김광호; 삼성전자 주식회사
Priority date: 1997-02-28
Filing date: 1997-02-28
Publication date: 1998-10-26

Abstract

The present invention relates to a method of compressing arbitrary data used in a digital computer system. The method, which reads data recorded in a data file, compresses it, and writes the result to a compressed data file, comprises: a first step of finding the four most frequent byte values in the data file; a second step of recording those four byte values in the compressed data file; a third step of reading data from the data file sequentially, six bytes at a time; a fourth step of generating an information field for each pattern and, for each byte of the read data, marking a predetermined bit in the corresponding information field if the byte matches that pattern, or appending the byte after the corresponding information field if it does not; a fifth step of comparing, across the patterns, the sizes of the data appended to the information fields and selecting the smallest, and, if several are tied for smallest, selecting the one whose information field contains the smallest identification number; a sixth step of recording the selected information field and the data appended to it in the compressed data file; and a seventh step of repeating the third through sixth steps until no data remains to be compressed in the data file.

According to the present invention, although the compression ratio is somewhat lower than that of conventional methods, satisfactory compression efficiency can be obtained when the method is applied to bitmap image data or other data with repetitive characteristics, and the compression process is simple and easy to implement.

Description

Data Compression Method

The present invention relates to a data compression method, and more particularly to a method of compressing arbitrary data used in a digital computer system.

Conventional data compression schemes, such as Huffman coding, achieve compression through statistical analysis or mathematical interpretation. These schemes generally guarantee excellent compression efficiency, but they are complex and difficult to implement, and even after implementation it is hard to prove that the implementation follows the algorithm exactly.

The present invention was devised to solve this problem. Its object is to provide a data compression method that simply compresses generally repetitive data, for example in an image data compression device, even at the cost of a somewhat lower compression ratio than conventional methods.

FIG. 1 is a flowchart showing the data compression process according to the present invention in time order.

FIG. 2 shows the structure of the information field used in the present invention.

FIG. 3 shows the arrangement of data in a compressed data file produced by the present invention.

To achieve the above object, the method according to the present invention of reading data recorded in a data file, compressing it, and recording it in a compressed data file comprises: a first step of finding the four most frequent byte values in the data file and designating them as a first pattern, a second pattern, a third pattern, and a fourth pattern, respectively; a second step of recording the first, second, third, and fourth patterns in the compressed data file; a third step of reading data from the data file sequentially, six bytes at a time; a fourth step of generating, for each pattern, a corresponding information field and, for each byte of the read data, marking a predetermined bit in the corresponding information field if the byte matches that pattern, or appending the byte after the corresponding information field if it does not; a fifth step of comparing, across the patterns, the sizes of the data appended to the corresponding information fields and selecting the smallest, and, if several are tied for smallest, selecting the one whose information field contains the smallest identification number; a sixth step of recording the selected information field and the data appended to it in the compressed data file; and a seventh step of repeating the third through sixth steps until no data remains to be compressed in the data file.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

Before compression begins, the source data file is scanned byte by byte, and the four most frequent byte values are extracted and designated as the first pattern (aa), the second pattern (bb), the third pattern (cc), and the fourth pattern (dd), respectively (step 100).
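As an illustration of step 100, the following minimal Python sketch counts byte frequencies and returns the four most frequent values. The function name `find_patterns` is hypothetical. The specification does not say how ties in frequency are ordered or what to do when the file contains fewer than four distinct byte values, so the tie-break (smaller byte value first) and the zero padding are assumptions of this sketch.

```python
from collections import Counter


def find_patterns(data: bytes) -> bytes:
    """Return four byte values, most frequent first (step 100)."""
    counts = Counter(data)
    # Assumption: equal frequencies are ordered by the smaller byte value.
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    # Assumption: pad with 0x00 if fewer than four distinct values exist.
    return bytes((ranked + [0] * 4)[:4])


patterns = find_patterns(b"aaaabbbccd")
```

Here `patterns[0]` plays the role of the first pattern (aa), `patterns[1]` the second pattern (bb), and so on.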

The first, second, third, and fourth patterns are then recorded in the compressed data file, which is the output file for the compressed data (step 110).

Next, data is read from the data file sequentially, six bytes at a time (step 120).

FIG. 2 shows the structure of the information field corresponding to each pattern. As shown in FIG. 2, the information field comprises an identification field (200) in the upper two bits, which distinguishes the corresponding pattern, and a 6-bit data compression field, which records whether each of the six bytes read in step 120 matches the corresponding pattern.
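The one-byte layout of FIG. 2 can be sketched in Python as a pair of pack and unpack helpers. The names `pack_info_field` and `unpack_info_field` are hypothetical; only the 2-bit plus 6-bit split comes from the description.

```python
def pack_info_field(pattern_id: int, match_bits: int) -> int:
    """Combine a 2-bit identification field and a 6-bit data compression field."""
    if not 0 <= pattern_id <= 0b11:
        raise ValueError("pattern_id must fit in 2 bits")
    if not 0 <= match_bits <= 0b111111:
        raise ValueError("match_bits must fit in 6 bits")
    # Identification field occupies the upper two bits of the byte.
    return (pattern_id << 6) | match_bits


def unpack_info_field(field: int) -> tuple[int, int]:
    """Split an information-field byte into (pattern_id, match_bits)."""
    return field >> 6, field & 0b111111


field = pack_info_field(0b10, 0b101010)
```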

Returning to FIG. 1, the data read in step 120 is first compared with the first pattern (aa). That is, the data read in step 120 is compared with the first pattern one byte at a time. If a byte matches, the corresponding bit of the data compression field in the information field for the first pattern is set to 1. If it does not match, that bit is set to 0 and the byte is appended after the information field (step 130).
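Step 130 can be sketched in Python as follows. The function name `encode_block` is hypothetical. The description does not state which bit of the data compression field corresponds to which byte, so this sketch assumes the first byte of the block maps to the most significant of the six bits.

```python
def encode_block(block: bytes, pattern: int, pattern_id: int) -> bytes:
    """Encode one 6-byte block against a single pattern (step 130)."""
    if len(block) != 6:
        raise ValueError("block must be exactly 6 bytes")
    match_bits = 0
    literals = bytearray()
    for byte in block:
        # Assumption: the first byte maps to the highest of the 6 bits.
        match_bits <<= 1
        if byte == pattern:
            match_bits |= 1          # match: set the bit, store nothing
        else:
            literals.append(byte)    # mismatch: bit stays 0, keep the byte
    info_field = (pattern_id << 6) | match_bits
    return bytes([info_field]) + bytes(literals)


# Four of the six bytes equal the pattern 0xAA, so only two literals remain.
encoded = encode_block(b"\xaa\xaa\x01\xaa\x02\xaa", pattern=0xAA, pattern_id=0)
```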

In the same way, the data read in step 120 is compared in turn with the second, third, and fourth patterns, and compression is attempted for each pattern (steps 132, 134, and 136). The same compression is thus attempted four times.

The identification field in the information field corresponding to each pattern holds binary 00 for the first pattern, binary 01 for the second pattern, binary 10 for the third pattern, and binary 11 for the fourth pattern.

Next, the sizes of the data appended to the information fields are compared across the patterns and the smallest is selected. If several are tied for smallest, the one whose information field contains the smallest identification number is selected as the compressed data (step 140).
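The selection rule of step 140 amounts to a minimum over (appended size, identification number). A minimal Python sketch, assuming the four candidates are held in a list indexed by identification number (the name `select_candidate` and the sample byte strings are hypothetical):

```python
def select_candidate(candidates: list[bytes]) -> bytes:
    """Pick the shortest candidate; ties go to the lowest identification number."""
    best_id = min(range(len(candidates)), key=lambda i: (len(candidates[i]), i))
    return candidates[best_id]


# Candidates for patterns 00, 01, 10, 11: an info byte followed by literals.
candidates = [b"\x00\x01\x02", b"\x40\x09", b"\x80\x07", b"\xc0\x01\x02\x03"]
chosen = select_candidate(candidates)  # patterns 01 and 10 tie; 01 is chosen
```

Because every candidate carries exactly one information-field byte, comparing total lengths is equivalent to comparing the sizes of the appended data.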

The selected information field and the data appended to it are then recorded in the compressed data file (step 150).

The third through sixth steps are repeated for as long as data remains to be compressed in the data file (step 160).
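Steps 100 through 160 can be put together in one short Python sketch. This is an illustration under stated assumptions, not the patented implementation: the function name `compress` is hypothetical, frequency ties are ordered by byte value, the first byte of a block maps to the highest of the six match bits, and the input length is required to be a multiple of six because the description does not say how a shorter final block is handled.

```python
from collections import Counter

BLOCK = 6  # bytes read per iteration (step 120)


def compress(data: bytes) -> bytes:
    if len(data) % BLOCK:
        # Assumption: a partial final block is not covered by the description.
        raise ValueError("input length must be a multiple of 6")
    counts = Counter(data)
    ranked = sorted(counts, key=lambda b: (-counts[b], b))    # step 100
    patterns = (ranked + [0] * 4)[:4]                         # assumed zero padding
    out = bytearray(patterns)                                 # step 110: header
    for start in range(0, len(data), BLOCK):                  # steps 120 and 160
        block = data[start:start + BLOCK]
        best = None
        for pattern_id, pattern in enumerate(patterns):       # steps 130 to 136
            bits = 0
            literals = bytearray()
            for byte in block:
                bits <<= 1
                if byte == pattern:
                    bits |= 1
                else:
                    literals.append(byte)
            candidate = bytes([(pattern_id << 6) | bits]) + bytes(literals)
            # Strict "<" keeps the lowest identification number on ties (step 140).
            if best is None or len(candidate) < len(best):
                best = candidate
        out += best                                           # step 150
    return bytes(out)


packed = compress(b"\xaa" * 6 + b"\xaa\xbb\xaa\xbb\xaa\xcc")
```

In this example the twelve input bytes become a four-byte header plus five bytes of compressed data.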

FIG. 3 shows the arrangement of data in a compressed data file produced by the present invention. The header of the compressed data file holds, in order, the four most frequent byte values extracted in step 100, followed by the compressed data.
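From this layout the output size can be bounded: the header is four bytes, and each six-byte block becomes one information-field byte plus one byte per mismatch, so between one and seven bytes. The following Python sketch computes both bounds. The function name `compressed_size_bounds` is hypothetical, and the input length is assumed to be a multiple of six.

```python
def compressed_size_bounds(n_bytes: int) -> tuple[int, int]:
    """Return (best, worst) compressed size in bytes for an n_bytes input."""
    if n_bytes % 6:
        raise ValueError("input length must be a multiple of 6")
    blocks = n_bytes // 6
    header = 4                   # the four patterns
    best = header + blocks * 1   # every byte of every block matches
    worst = header + blocks * 7  # no byte matches: info byte + 6 literals
    return best, worst


best, worst = compressed_size_bounds(600)
```

In the worst case the output is larger than the input, which is consistent with the stated focus on repetitive data such as bitmap images.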

The process of restoring the compressed data is briefly described below.

First, the four most frequent byte values are read in order from the header of the compressed data file. Next, an information field is read, and as many bytes of appended data are read as there are bits set to 0 in its data compression field. Each bit of the data compression field is then examined in turn: if the bit is 1, the output byte is the one of the four most frequent byte values that corresponds to the identification field of the information field; if the bit is 0, the output byte is taken, in order, from the data appended to the information field. The original data is restored in this way. Repeating this process for as long as data remains in the compressed data file restores all of the original data.
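The restoration procedure can be sketched in Python as follows. The function name `decompress` is hypothetical, and the sketch assumes that the first byte of each block corresponds to the most significant of the six data compression bits and that every block holds a full six bytes; neither detail is fixed by the description.

```python
def decompress(blob: bytes) -> bytes:
    if len(blob) < 4:
        raise ValueError("missing 4-byte pattern header")
    patterns = blob[:4]                      # header: the four patterns
    out = bytearray()
    pos = 4
    while pos < len(blob):
        field = blob[pos]
        pos += 1
        pattern = patterns[field >> 6]       # upper 2 bits: identification field
        bits = field & 0b111111              # lower 6 bits: data compression field
        n_literals = 6 - bin(bits).count("1")
        chunk = blob[pos:pos + n_literals]   # one appended byte per 0 bit
        if len(chunk) != n_literals:
            raise ValueError("truncated compressed data")
        pos += n_literals
        literals = iter(chunk)
        # Assumption: the highest of the 6 bits describes the first byte.
        for shift in range(5, -1, -1):
            if (bits >> shift) & 1:
                out.append(pattern)
            else:
                out.append(next(literals))
    return bytes(out)


restored = decompress(bytes.fromhex("aabbcc003f2abbbbcc"))
```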

Claims

A method of reading data recorded in a data file, compressing it, and recording it in a compressed data file, the method comprising:

a first step of finding the four most frequent byte values in the data file and designating them as a first pattern, a second pattern, a third pattern, and a fourth pattern, respectively;

a second step of recording the first pattern, second pattern, third pattern, and fourth pattern in the compressed data file;

a third step of reading data from the data file sequentially, six bytes at a time;

a fourth step of generating, for each of the patterns, a corresponding information field and, for each byte of the read data, marking a predetermined bit in the corresponding information field if the byte matches that pattern, or appending the byte after the corresponding information field if it does not;

a fifth step of comparing, across the patterns, the sizes of the data appended to the corresponding information fields and selecting the smallest, and, if several are tied for smallest, selecting the one whose information field contains the smallest identification number;

a sixth step of recording the selected information field and the data appended to it in the compressed data file; and

a seventh step of repeating the third through sixth steps until no data remains to be compressed in the data file.

The method of claim 1, wherein the information field comprises:

an identification field in the upper two bits for distinguishing the corresponding pattern; and

a 6-bit data compression field for recording whether each of the six bytes of data read in the third step matches the corresponding pattern.