KR101705461B1

KR101705461B1 - Method and apparatus for encoding and decoding strings

Info

Publication number: KR101705461B1
Application number: KR1020150122016A
Authority: KR
Inventors: 장지훈; 이승은
Original assignee: 서울과학기술대학교 산학협력단
Priority date: 2015-08-28
Filing date: 2015-08-28
Publication date: 2017-02-09

Abstract

The present invention provides a dictionary-based compression method that can be implemented in hardware. The method comprises: searching each of a plurality of dictionaries for a substring corresponding to a window of an entire string; compressing the substring when it is found in the plurality of dictionaries; and registering the substring in one of the plurality of dictionaries when it is not found in any of them.

Description

METHOD AND APPARATUS FOR ENCODING AND DECODING STRINGS

The following embodiments relate to data compression and decompression methods, and more particularly to methods of compressing and decompressing data using dictionary-based data management.

Data compression is the process of encoding data so that the resulting representation has fewer bits than the original representation, that is, of storing data so that it occupies less space than usual. Compression enables communication devices to transmit or store the same amount of data in fewer bits. A compression operation involves a predetermined encoding algorithm that takes the original data and produces compressed data.

Data compression is widely used in backup utilities, spreadsheet applications, and database management systems. Some types of data, such as bit-mapped graphics, can be compressed to a fraction of their original size.

Data compression is broadly divided into two types: lossless compression and lossy compression. Lossless compression is reversible, so the original data can be reconstructed. A lossy compression scheme, by contrast, may lose some data but can achieve a higher compression ratio.

A lossless data compression scheme can be applied to data such as substrings and executable programs. Data compression can save a large amount of storage space. It is achieved using a data compression algorithm, and several distinct algorithms may be used depending on the type of compression to be performed. Dictionary-based compression (also called dictionary-type compression) is a form of lossless compression.

Data can be compressed using various algorithms such as Huffman coding, arithmetic coding, dictionary-based/substitutional algorithms, and dynamically generated dictionaries. Such dictionaries can improve the data compression ratio for complex data types, frequently changing data, and/or data values with no obvious boundaries.

A dictionary compression algorithm implemented in software usually uses a hash function to obtain a memory address in the dictionary. Common hash functions return a 32-bit or 16-bit hash value, which is used as the memory address. On a typical PC, an MMU, a cache, or the like may make it unnecessary to provide memory covering the full range of the converted address. When a dictionary compression algorithm implemented in software is built in hardware, however, the system in principle needs memory covering the full range of the converted address. The dictionary of a software-based compressor may require a single 4 GB memory space.

Even when 4 GB of memory space is needed, a system can run with less than 4 GB of physical memory by using virtual memory through an MMU or the like. In that case, however, additional processing may be required.

Therefore, when a compression system is implemented in hardware, applying a software dictionary compression algorithm as-is, without any separate handling of the memory space, requires a very large memory. Reducing the required memory size, in turn, requires complicated separate processing.

One embodiment of the present invention provides a method of improving compression throughput on hardware without separate processing, by providing a configuration that searches a plurality of dictionaries for a string, a configuration that compresses the string when it is found, and a configuration that registers the string in a dictionary when it is not found.

One embodiment of the present invention provides a method of decompressing, on hardware, data compressed by a dictionary-based compression algorithm, by providing a configuration that searches for a compressed portion, a configuration that extracts the data corresponding to the compressed portion from a dictionary, and a configuration that replaces the compressed portion with the extracted data.

A string compression method according to an embodiment of the present invention may include: searching each of a plurality of dictionaries for a substring corresponding to a window in an entire string; compressing the substring when it is found in the plurality of dictionaries; and registering the substring in one of the plurality of dictionaries when it is not found in any of them.

In the string compression method, the searching step may search each of the plurality of dictionaries for a plurality of substrings extracted while moving the window over the entire string.

In the string compression method, the searching step may search each of the plurality of dictionaries for the substring starting from each character in the window.

In the string compression method, the searching step may search for the substring in a plurality of dictionaries whose number corresponds to the number of characters included in the window.

In the string compression method, the searching step may search each of the plurality of dictionaries for the substring based on an address value obtained by converting the starting character of the substring.

In the string compression method, the registering step may register the substring, together with the position of the substring relative to the entire string, in the dictionary among the plurality of dictionaries that corresponds to the position of each character in the window, based on an address value obtained by converting the starting character of the substring.

In the string compression method, the registering step may further include, when another substring and its position are already registered at the address value obtained by converting the starting character of the substring in the dictionary corresponding to the position of each character in the window, determining which of the substring and the other substring to register by comparing how frequently each appears in the entire string.

In the string compression method, the compressing step may, when a string matching at least a minimum length of the substring is found among the plurality of dictionaries, output the matching length of the substring and the position of the substring relative to the entire string in place of the matching string.

A string decompression method according to an embodiment of the present invention may include: searching a compressed string for a portion that was compressed into the matching length of a substring and the position of the substring relative to the entire string; recovering a string of the compressed length from the string corresponding to the position of the compressed portion; and replacing the compressed portion with the recovered string.
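The three decompression steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the token format (a one-character literal, or a `(position, length)` tuple whose position is an absolute index into the already recovered output) and the name `decompress` are hypothetical.

```python
def decompress(tokens):
    """Rebuild the original string from literals and (position, length) tokens.

    Assumption: `position` is an absolute index into the recovered output.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):          # a compressed portion was found
            position, length = tok
            for i in range(length):         # recover `length` characters
                out.append(out[position + i])
        else:                               # an uncompressed literal character
            out.append(tok)
    return "".join(out)


tokens = list("abcd") + [(0, 4)]
print(decompress(tokens))  # prints abcdabcd
```

Copying one character at a time keeps the sketch correct even when the referenced span overlaps the text being recovered.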

According to one embodiment of the present invention, compression throughput can be improved on hardware without separate processing, by providing a configuration that searches a plurality of dictionaries for a string simultaneously, against the strings held by each dictionary, a configuration that compresses the string when it is found, and a configuration that registers the string in a dictionary when it is not found.

According to one embodiment of the present invention, data compressed by a dictionary-based compression algorithm can be decompressed on hardware, by providing a configuration that searches for a compressed portion, a configuration that extracts the data corresponding to the compressed portion from a dictionary, and a configuration that replaces the compressed portion with the extracted data.

FIG. 1 shows an apparatus that implements a dictionary-based compression algorithm in hardware according to one embodiment.
FIG. 2 shows how a plurality of dictionaries are implemented in memory according to one embodiment.
FIG. 3 shows a window and a second (look-ahead) window moving over the data to be compressed according to one embodiment.
FIG. 4 shows the correspondence between a window and a plurality of dictionaries, the structure of the string memory and the offset memory that make up each dictionary, and the relationship between the ASCII code table and addresses, according to one embodiment.
FIG. 5 shows an algorithm for performing compression according to one embodiment.
FIG. 6 shows a search step according to one embodiment.
FIG. 7 shows a registration step according to one embodiment.
FIG. 8 shows a compression step according to one embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows an apparatus that implements a dictionary-based compression algorithm in hardware according to one embodiment.

The memory 130 may provide storage space for a plurality of dictionaries. Each dictionary may consist of a pair of a string memory and an offset memory. The string memory may store a substring of the entire string to be compressed, and the offset memory may store the position of that substring within the entire string.

The processor 110 may perform a search step, a registration step, and a compression step on the substrings that start at each character included in a window, while moving the window over the entire string received from the input/output unit 120. Here, the window may mean the range over which the search, registration, and compression steps are currently to be performed. The processor 110 may search the dictionaries stored in the memory 130 for a substring. If the substring is found in a dictionary, the processor 110 may perform compression, and the compressed result may be output through the input/output unit 120. If the substring is not found in a dictionary, the processor 110 may register the substring in a dictionary without performing compression.

The input/output unit 120 may receive the entire string to be compressed and transmit it to the processor 110. The input/output unit 120 may concatenate the uncompressed portions and the compressed portions of the entire string and output the result.

FIG. 2 shows how a plurality of dictionaries are implemented in the memory 130 according to one embodiment.

The memory 130 may provide space for storing a plurality of dictionaries. Each dictionary may consist of a string memory 210 and an offset memory 220. The string memory 210 may be the storage space needed to register a substring that starts from a character included in the window of the entire string to be compressed, when that substring is not found in the plurality of dictionaries. The string memory 210 may have a depth and a width. Here, depth may refer to the address of the string memory.

The offset memory may provide the space needed to store the position of a substring within the entire string. That is, when a substring is not found in the dictionaries and is therefore registered in the string memory 210 of a particular dictionary, the position of the substring may be stored in the offset memory of the same dictionary. The offset memory may have a depth and a width. Here, depth may refer to the address of the string memory.
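As a rough software model of one dictionary, the paired memories can be pictured as two arrays indexed by the same address. The sketch below is an assumption-laden illustration: the class name `Dictionary`, its methods, and the depth of 256 addresses (taken from the one-byte addressing described for FIG. 4) are hypothetical choices, not names from the patent.

```python
from dataclasses import dataclass, field

DEPTH = 256  # assumed: one address per possible one-byte character


@dataclass
class Dictionary:
    width: int  # maximum substring length held at each address
    string_mem: list = field(default_factory=lambda: [None] * DEPTH)
    offset_mem: list = field(default_factory=lambda: [None] * DEPTH)

    def register(self, address, substring, position):
        # The substring and its position are stored at the same address
        # in the two memories of the same dictionary.
        assert len(substring) <= self.width
        self.string_mem[address] = substring
        self.offset_mem[address] = position

    def lookup(self, address):
        return self.string_mem[address], self.offset_mem[address]


d = Dictionary(width=8)
d.register(ord("a"), "abcdefgh", 0)
print(d.lookup(ord("a")))  # prints ('abcdefgh', 0)
```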

FIG. 3 shows windows moving over the data to be compressed according to one embodiment.

The first window 301 may contain the portion of the entire string on which compression is currently being attempted. The second window 302 may contain the portion on which compression will be attempted in the next interval, and may be located immediately after the first window 301. The second window 302 may be the same size as the first window 301. Throughout this specification, the plain term "window" may refer to the first window.

Unlike a compression algorithm implemented in software, compression implemented in hardware must store in the dictionary a substring of the same length as the first window 301, and therefore needs to know which characters come next. The second window 302 may be needed for this reason.

The processor 110 may perform the search, registration, and compression steps simultaneously on all of the substrings that start from each character in the first window 301. In this respect, the processor 110 may perform parallel processing. When the processor 110 has completed this process for every substring starting from each character in the first window 301, the first window 301 may move to the position of the second window 302, and the second window 302 may move to the next interval. The first window 301 may keep moving until it no longer contains any characters. When the first window 301 contains no characters, the compression process ends.
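The window movement can be modelled with a short generator. This is a sketch under assumptions: the function name `windows` and the use of Python string slices are illustrative only, and the second window is simply the next `n` characters after the first.

```python
def windows(text, n):
    """Yield (start, first_window, second_window) until no characters remain."""
    pos = 0
    while pos < len(text):                      # stop when the first window is empty
        first = text[pos:pos + n]
        second = text[pos + n:pos + 2 * n]      # look-ahead window, same size
        yield pos, first, second
        pos += n                                # first window moves onto the second


for start, first, second in windows("abcdefghij", 4):
    print(start, repr(first), repr(second))
```

Near the end of the input both windows may be shorter than `n`, which the slices handle without special cases.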

FIG. 4 shows the correspondence between the first window 301 and a plurality of dictionaries, the structures of the string memory 401 and the offset memory 402 that make up each dictionary, and the relationship between the ASCII code table 403 and addresses, according to one embodiment.

The position of each character in the first window 301 may correspond to one of the plurality of dictionaries. The number of characters included in the first window 301 may be the same as the number of dictionaries. For example, if the window contains N characters, there may be N dictionaries. The correspondence between the first window 301 and the dictionaries matters in the registration step. For example, in the registration step, the substring starting from the fifth character in the window 301 may be registered in the fifth dictionary. In another embodiment, the number of dictionaries may be larger or smaller than the number of characters in the window. In that case, the processor 110 may associate the dictionaries with the character positions in the window according to a preset criterion.

The string memory 401 may have a depth and a width. Depth may refer to the address of the string memory. In one embodiment, when the value obtained by converting the first character of a substring is used as the address, one alphabetic character is 1 byte, so the depth may span addresses 0 to 255, that is, 2^8 addresses. The function that converts a character into an address may be an ASCII code conversion. When the maximum compression length of a string is set to the number of characters that the first window 301 can contain, the width may equal that number. For example, if the first window 301 can contain N characters, the width of the string memory 401 may also be N. Therefore, when the processor 110 compresses a substring to the maximum extent, it may compress a substring of length N, the number of characters in the first window 301, against a string of length N stored in the string memory 401.

The string memory 401 may store strings whose length equals the number of characters in the first window 301 on which compression is currently performed. A substring that starts from a character other than the first one in the first window 301 is shorter than the first window 301, so the processor 110 may take characters from the second window 302 to make up the missing length before storing it. That is, the processor 110 may store substrings whose length equals the width of the string memory 401.
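The padding from the second window can be illustrated by slicing across the window boundary. The helper name `window_substrings` is hypothetical, and the sketch assumes the first window holds `n` characters starting at index `start`.

```python
def window_substrings(text, start, n):
    """Return the width-n substring that begins at each character of the window.

    Characters beyond the first window are borrowed from the second window.
    """
    return [text[start + i:start + i + n]
            for i in range(n)
            if start + i < len(text)]


print(window_substrings("abcdefgh", 0, 4))
# prints ['abcd', 'bcde', 'cdef', 'defg']
```

In the example, the first window is "abcd"; the substrings starting at "b", "c", and "d" borrow one, two, and three characters from the second window "efgh".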

A dictionary-based compression algorithm implemented in software uses a hash function for address conversion. If the buckets of the hash table are addressed with 32 bits, a memory space of 2^32 entries is needed. Implementing this directly in hardware would require a very large memory, so an embodiment of the present invention implemented in hardware may use a simpler hash function than a compression algorithm implemented in software.
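The difference in scale can be checked with simple arithmetic. The entry sizes and the choice of eight dictionaries below are assumptions made purely for illustration; the source only states the 32-bit and one-byte address ranges.

```python
def table_bytes(address_bits, entry_bytes):
    """Memory needed for a table with one entry per possible address."""
    return (2 ** address_bits) * entry_bytes


# 32-bit hash used directly as an address, assuming 1 byte per entry:
print(table_bytes(32, 1))        # 4294967296 bytes, i.e. 4 GiB

# One-byte address, assuming 8 dictionaries whose string memories are 8 bytes wide:
print(8 * table_bytes(8, 8))     # 16384 bytes
```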

The processor 110 may implement a simple hash function using the ASCII code table 403. The processor 110 may use the number that the ASCII code table 403 gives for the first character of each substring to be compressed as the address value. Because the one-byte code takes a value from 0 to 255, the number derived from the ASCII code table 403 may correspond to the depth of the string memory 401 or the offset memory 402 in FIG. 4.
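In software terms, this address conversion is just the character code of the first character. The function name `address_of` is hypothetical, and rejecting characters above 255 is an assumption of this sketch rather than something the source specifies.

```python
def address_of(substring):
    """Use the code of the first character as the dictionary address (0..255)."""
    code = ord(substring[0])
    if code > 255:
        # Assumption: only one-byte characters are expected as input.
        raise ValueError("expected a one-byte character")
    return code


print(address_of("Apple"))  # prints 65
```

Every substring that starts with the same character maps to the same address, which is why a registration policy for colliding entries is needed.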

The offset memory 402 may have a depth and a width. The depth of the offset memory 402 may refer to the address in the same way as the depth of the string memory 401. Because the width stores the position of a substring, it may vary with the length of the entire string. If the entire string has length 256, the offset memory 402 must be able to store 256 distinct positions, so the width may be 1 byte. That is, the width K may be 8 (bits).
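The relationship between the length of the entire string and the offset width can be computed directly. The helper name `offset_width_bits` is hypothetical, and the sketch assumes positions are numbered from 0.

```python
def offset_width_bits(total_length):
    """Bits needed to store any position 0 .. total_length - 1."""
    return max(1, (total_length - 1).bit_length())


print(offset_width_bits(256))    # prints 8
print(offset_width_bits(65536))  # prints 16
```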

FIG. 5 shows an algorithm for performing compression according to one embodiment.

The processor 110 may carry out the whole process on the substrings that start at each character in the window 301, while moving the window 301 over the entire string. The window 301 may refer to the first window 301. In step 510, the processor 110 may search the plurality of dictionaries for each of the substrings that start from each character currently in the window 301. The search of the plurality of dictionaries may be performed simultaneously for all of these substrings. The processor 110 may therefore perform parallel processing in units of the window 301. In doing so, the processor 110 may compare each substring with the strings stored in the string memories of the plurality of dictionaries, based on the address value converted from each character.

A dictionary-based compression algorithm implemented in software reads values from the input buffer one byte at a time and compares them directly while sliding the window 301 one byte at a time, so it processes the data serially. According to an embodiment of the present invention, however, a dictionary-based compression algorithm implemented in hardware can perform parallel processing, because the search of the plurality of dictionaries proceeds simultaneously for all of the substrings that start from each character in the window 301. For example, if the window 301 is 8 bytes, the processor 110 compares the 8 substrings that start from the 8 characters against the plurality of dictionaries at the same time, and thus performs parallel processing.
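The search of one window can be modelled as below. The loops run sequentially here, whereas the hardware described above would perform the comparisons at the same time. All names are hypothetical, and each dictionary is modelled, as an assumption, by a Python dict mapping an address to a `(stored_string, position)` pair.

```python
def match_length(a, b):
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search_window(substrings, dictionaries):
    """For each substring, return (best match length, position of that match)."""
    results = []
    for sub in substrings:                 # parallel in hardware
        addr = ord(sub[0])                 # address from the first character
        best = (0, None)
        for d in dictionaries:             # every dictionary is consulted
            entry = d.get(addr)
            if entry is None:
                continue
            stored, pos = entry
            m = match_length(sub, stored)
            if m > best[0]:
                best = (m, pos)
        results.append(best)
    return results


dictionaries = [{ord("a"): ("abcd", 0)}, {}]
print(search_window(["abcx", "bcxy"], dictionaries))  # prints [(3, 0), (0, None)]
```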

The length of each substring may be the same as the length of the window 301 (that is, the first window). Any missing characters may be taken from the second window 302.

In step 520, the processor 110 may determine the match length between the substring to be compressed and the string found in the dictionary (string memory 210). The processor 110 may perform compression when the match length is greater than or equal to a preset minimum length. In one embodiment, the position (offset) and the match length together typically occupy 3 bytes, so at least 4 bytes must match to gain 1 byte. Compression may therefore be performed only when 4 or more bytes match; here, the minimum length may be 4 bytes.
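The minimum-length rule follows from the size of the output token. The constants below restate the figures given in this embodiment (a 3-byte token and a 4-byte minimum), while the function names are hypothetical.

```python
TOKEN_BYTES = 3   # offset plus match length, as stated for this embodiment
MIN_MATCH = 4     # shortest match that still saves space


def bytes_saved(match_len):
    """Bytes gained by replacing `match_len` matched bytes with one token."""
    return match_len - TOKEN_BYTES


def should_compress(match_len):
    return match_len >= MIN_MATCH


print(should_compress(3), bytes_saved(3))  # prints False 0
print(should_compress(4), bytes_saved(4))  # prints True 1
```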

In step 530, the processor 110 may compress the substring. When the substring to be compressed is registered in a dictionary (string memory 210), the processor 110 may replace the substring with the position (offset) at which that string appeared and the match length.

The position (offset) of the substring to be compressed may be obtained by subtracting the value read from the offset memory 220 from the position of the window 301. In other words, instead of the (absolute) position at which the same string (a string matching at least the minimum length) appeared, the processor 110 may obtain the position of the substring relative to the same string registered in the dictionary. Because a relative position is generally smaller than an absolute position, outputting the relative position as the compression result can improve the compression effect.
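A small numeric example shows why the relative position tends to be cheaper to store. The function name and the sample positions are hypothetical; the subtraction itself is the one described above.

```python
def relative_offset(window_position, stored_position):
    """Distance back from the current window to the registered string."""
    return window_position - stored_position


stored = 990      # absolute position read from the offset memory (example value)
current = 1000    # position of the current window (example value)
rel = relative_offset(current, stored)

print(rel)                                     # prints 10
print(stored.bit_length(), rel.bit_length())   # prints 10 4
```

In this example the absolute position needs 10 bits, while the relative position fits in 4 bits.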

When the match length is less than the minimum length, the processor 110 may determine whether the substring was not found in the dictionary at all. In other words, the processor 110 may determine whether the match length between the substring and the string registered in the dictionary (string memory 210) is 0. If the match length is 0, the substring is not registered in any dictionary, so the processor 110 may register it in a dictionary so that later substrings can be compressed against it. If the dictionary and address at which it is to be registered are already occupied by a registered string, the string with the higher frequency of occurrence may be registered.

In another embodiment, the processor 110 may determine whether the match length between the substring and the string registered in the dictionary (string memory 210) is less than the minimum length. If the match length is less than the minimum length, the substring may be registered in a dictionary. In step 540, if the dictionary and address at which it is to be registered are already occupied, the processor 110 may compare the frequency of occurrence of the substring with that of the already registered string that contains the matching portion, and register the more frequent string in the dictionary.
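The frequency comparison can be sketched as follows. How the frequency is actually measured is not specified in this passage, so counting non-overlapping occurrences over the entire string with `str.count` is an assumption of this sketch, as is the name `choose_entry` and the rule that ties keep the existing entry.

```python
def choose_entry(text, existing, candidate):
    """Decide which string keeps the contested dictionary slot."""
    if existing is None:                  # slot is free: register the candidate
        return candidate
    # Assumption: frequency = non-overlapping occurrences in the entire string.
    if text.count(candidate) > text.count(existing):
        return candidate
    return existing                       # ties keep the registered string


text = "abcabcabcaxy"
print(choose_entry(text, "axy", "abc"))   # prints abc
print(choose_entry(text, "abc", "axy"))   # prints abc
```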

In step 550, the processor 110 may find, among the plurality of dictionaries, the dictionary corresponding to the position of a character in the window 301. That is, when the processor 110 registers a partial string starting from a specific character in the window 301, the processor 110 determines the ordinal position of the first character of the partial string (the specific character) among the characters in the window, and then registers the partial string in the dictionary with the same ordinal number. In another embodiment, the processor 110 may determine, according to a preset criterion, in which of the plurality of dictionaries to register the partial string. In another embodiment, once a dictionary has been selected, the processor 110 may determine, according to a preset criterion, at which address of that dictionary's string memory to register the partial string. The processor 110 may also find the address at which to register the partial string based on an address value obtained by converting the start character of the partial string. Based on the address found in the selected dictionary, the processor 110 may register the partial string in the string memory 210 and register the position of the partial string within the entire string in the offset memory 220.

When search, registration, and compression have been completed for the partial strings starting from each character in the window 301, the window 301 may move to the next section in step 560. In step 570, the processor 110 may determine whether any characters remain in the window 301. If no characters remain, the processor 110 may determine that compression of the entire string is complete. If characters remain, the processor 110 may repeat the whole process from step 510.
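The outer loop of steps 510 to 570 can be sketched as a generator of window positions. This is a simplified model, not the hardware control logic; the function name `window_positions` and the mapping of comments to step numbers are assumptions made for illustration.

```python
def window_positions(total_length: int, window_size: int):
    """Yield the start position of the window for each successive section."""
    position = 0
    while position < total_length:   # step 570: characters still remain
        yield position               # search, compress, register happen here
        position += window_size      # step 560: move to the next section


# A 24-character string processed with an 8-character window.
for start in window_positions(24, 8):
    print("window starts at", start)
```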

FIG. 6 shows a search step according to one embodiment.

The entire string 610 may be divided into three sections. The first window 301 may currently be located in section 1. The partial strings 620, each starting from a character in the first window 301 and having the same length as the first window 301, are the search targets. Because the processor 110 searches in units of strings whose length equals the number of characters in the window, missing characters can be supplemented with characters from the second window 302. For example, for HABCDEYU, only H lies within the first window 301, so the processor 110 supplements ABCDEYU from the second window 302.
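The extraction of the search targets can be sketched as follows. The sketch assumes an 8-character window and uses only the first two sections of the example string (ABCDEFGH followed by ABCDEYUI); the function name `window_substrings` is hypothetical.

```python
def window_substrings(text: str, window_start: int, window_size: int) -> list[str]:
    """Return one partial string per character in the window.

    Each partial string has the length of the window; characters missing
    from the current window are taken from the section that follows it.
    """
    substrings = []
    for i in range(window_size):
        start = window_start + i
        if start >= len(text):
            break  # no character left at this window position
        substrings.append(text[start:start + window_size])
    return substrings


text = "ABCDEFGH" + "ABCDEYUI"  # sections 1 and 2 of the example
for s in window_substrings(text, 0, 8):
    print(s)
```

The last partial string printed is HABCDEYU, consisting of H from the first window and ABCDEYU supplemented from the second.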

The string memory 630 and the offset memory 640 may have a width equal to the number of characters in the first window 301. In the search step, each partial string 620 may be compared simultaneously with the strings stored at the corresponding address of every string memory 630. In other words, because all partial strings 620 are searched at the same time, the processor 110 can perform parallel processing. Here, the corresponding address is the value obtained by converting the first character of each partial string into an address. For example, since A is converted to 97 in the ASCII code table 403 of FIG. 4, the processor 110 compares ABCDEFGH with the string stored at address 97 of the string memory 630.
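A software model of this lookup might look like the sketch below. Several points are assumptions: each string memory is modeled as a Python dict keyed by address, the comparisons run in a sequential loop rather than in parallel as in the hardware, and the address conversion is passed in as `to_address`. The default `ord` yields 65 for A under standard ASCII, whereas the description's example uses 97 according to the table of FIG. 4, so a different converter may be needed to reproduce the figure's addresses.

```python
def match_length(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search(substring: str, string_memories: list[dict], to_address=ord):
    """Compare substring with the entry at its address in every dictionary.

    Returns (dictionary index, match length) of the longest match,
    or (None, 0) when nothing matches.
    """
    address = to_address(substring[0])
    best_index, best_length = None, 0
    for index, memory in enumerate(string_memories):
        stored = memory.get(address)
        if stored is None:
            continue  # nothing registered at this address yet
        length = match_length(substring, stored)
        if length > best_length:
            best_index, best_length = index, length
    return best_index, best_length


# Eight dictionaries, one per character position in the window.
memories = [dict() for _ in range(8)]
memories[0][ord("A")] = "ABCDEFGH"
print(search("ABCDEYUI", memories))
```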

The processor 110 may determine the match length between the partial string 620 and the string stored in the string memory 630. If the match length is greater than or equal to the minimum length, the compression step 530 of FIG. 5 may proceed. If the match length is less than the minimum length and nothing matches at all, that is, the string was not found and is not registered in the dictionary, the registration step 550 of FIG. 5 may proceed.

If there is a match but its length is less than the minimum length, the processor 110 may output the string as is without compressing it. This is because compressing a partial string shorter than the minimum length yields no gain.
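The three outcomes described above (compress, register, or output unchanged) can be summarized in a small decision function. The function name, the returned labels, and the default minimum length of 4 are assumptions made for this sketch.

```python
def next_action(match_len: int, min_length: int = 4) -> str:
    """Decide what to do with a partial string given its match length."""
    if match_len >= min_length:
        return "compress"   # compression step 530
    if match_len == 0:
        return "register"   # registration step 550
    return "literal"        # short match: output as is, no gain from compressing


for n in (5, 0, 2):
    print(n, next_action(n))
```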

FIG. 7 illustrates a registration step according to one embodiment.

In FIG. 7, the first window 301 is located in section 1. The processor 110 may determine whether the partial string was not found in the dictionary and, if it was not found (or the match length is 0), perform the registration step. The number of characters in the first window 301 may equal the number of dictionaries, and the position of each character in the first window 301 may correspond to one of the plurality of dictionaries.

The processor 110 may find, among the plurality of dictionaries, the dictionary corresponding to the position of a character in the first window 301. The processor 110 may also find the address based on the address value obtained by converting the start character of the partial string. Based on the address found in the selected dictionary, the processor 110 may register the partial string in the string memory 630 and register the position of the partial string within the entire string in the offset memory 710.

For example, when the processor 110 registers BCDEFGHA in FIG. 7, B is the second character in the first window in section 1, so it corresponds to dictionary 1, the second dictionary. The processor 110 may register the partial string BCDEFGHA at the dictionary address corresponding to 98, the result of converting B, the first character of the partial string, using the ASCII code table.
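The registration of BCDEFGHA can be sketched as follows. The dict-based memories and the function name `register` are assumptions. The address converter is a parameter because the sketch's default, Python's `ord`, gives 66 for B under standard ASCII, while the example above uses 98 from the table of FIG. 4.

```python
def register(substring: str, index_in_window: int, position: int,
             string_memories: list[dict], offset_memories: list[dict],
             to_address=ord):
    """Register substring in the dictionary matching its window position.

    index_in_window: 0-based position of the first character in the window,
                     which selects the dictionary.
    position:        position of the substring in the entire string.
    """
    address = to_address(substring[0])
    string_memories[index_in_window][address] = substring
    offset_memories[index_in_window][address] = position
    return index_in_window, address


string_memories = [dict() for _ in range(8)]
offset_memories = [dict() for _ in range(8)]

# B is the second character of the window, so it goes to dictionary 1.
dictionary, address = register("BCDEFGHA", 1, 1, string_memories, offset_memories)
print(dictionary, address, string_memories[1][address], offset_memories[1][address])
```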

In the dictionary corresponding to the position of each character in the first window 301, if another partial string and its position are already registered in the string memory 630 and the offset memory 710 at the address value obtained by converting the start character of the partial string, an overwrite problem may arise. In this case, the processor 110 may compare how often the already registered partial string and the partial string to be registered appear in the entire string, and decide which one to register.
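One possible reading of this frequency comparison is sketched below. How the frequency is measured is not specified in the description; the sketch assumes a simple count of non-overlapping occurrences via `str.count` and keeps the existing entry on a tie. The function name and the sample string are hypothetical.

```python
def choose_entry(text: str, registered: str, candidate: str) -> str:
    """Return the string that should occupy the contested address.

    The candidate replaces the registered string only if it occurs
    more often in the entire string (ties keep the registered entry).
    """
    if text.count(candidate) > text.count(registered):
        return candidate
    return registered


# Both strings start with A, so they compete for the same address.
text = "ABCDABCDABXY"
print(choose_entry(text, registered="ABXY", candidate="ABCD"))
```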

FIG. 8 illustrates a compression step according to one embodiment.

The first window 301 may currently be located in section 2, after ABCDEFGH was registered at address 97 of dictionary 0 in section 1. ABCDEFGH, the registered partial string that contains ABCDE, the portion of the partial string ABCDEYUI that is actually compressed, is registered at address 97 of dictionary 0. When the minimum length for compression is set to 4 bytes, the match length (5 bytes) between the partial string ABCDEYUI and the registered partial string ABCDEFGH is at least the minimum length (4 bytes), so the processor 110 may compress ABCDE.

The processor 110 may replace the string to be compressed (ABCDE) with the relative position (offset) at which that string appeared and the match length. The offset can be obtained by subtracting the value read from the offset memory 640 from the position of the first window 301. Here, the value read from the offset memory 640 is the position in the entire string of the partial string registered at the same address of the string memory.

The position of the first window 301 may mean the position of the first character in the first window 301. In FIG. 8, if positions are assigned in the order ABCDEFGH with A at position 0, the position of the first window 301 is 8. The value read from the offset memory 640 is 0. Therefore, the relative position (offset) at which the string (ABCDE) appeared is 8 - 0 = 8.

If none of the partial strings starting from each character of section 1 is found in the dictionary, so that none is compressed, the result of processing up to the partial string starting from A in section 2 may be the result 820. In the result 820, ABCDEFGH is the string of section 1, output as is because it was not compressed, and 85 is the result of compressing ABCDE in ABCDEYUI, where 8 is the offset and 5 is the match length (of ABCDE).
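The FIG. 8 walk-through can be traced with the short script below. It is a sketch under several assumptions: only dictionary 0 is modeled, its memories are Python dicts keyed by `ord` of the first character (the figure uses address 97 for A), and the compressed pair is represented as the tuple `(offset, length)` rather than the digits 85 shown in the result 820.

```python
MIN_LENGTH = 4                      # minimum length at which compression pays off
text = "ABCDEFGH" + "ABCDEYUI"      # sections 1 and 2 of the example

# State after section 1: ABCDEFGH registered in dictionary 0 at position 0.
string_memory_0 = {ord("A"): text[0:8]}
offset_memory_0 = {ord("A"): 0}

# The window now sits at position 8 and the partial string is ABCDEYUI.
window_position = 8
substring = text[8:16]
address = ord(substring[0])
registered = string_memory_0[address]

# Count the leading characters shared with the registered string.
length = 0
while (length < len(substring) and length < len(registered)
       and substring[length] == registered[length]):
    length += 1

output = [text[0:8]]                # section 1 is emitted uncompressed
if length >= MIN_LENGTH:
    offset = window_position - offset_memory_0[address]
    output.append((offset, length)) # replaces ABCDE

print(output)
```

The script prints the uncompressed first section followed by the pair with offset 8 and length 5, matching the values in the description.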

According to another embodiment of the present invention, a method may be provided for decompressing a string compressed by a dictionary-based compression algorithm implemented in hardware.

The processor may locate the compressed portion in the compressed string using the match length of the partial string and the position of the partial string within the entire string. It may then restore a string of the compressed length from the string corresponding to the position of the compressed portion. The processor may complete decompression by replacing the compressed portion with the restored string.
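A minimal decompression sketch follows. The description does not define the format of the compressed stream, so the token list used here (plain strings for uncompressed text, `(offset, length)` tuples for compressed portions) is an assumption, as is the function name `decompress`. The offset is interpreted as a distance back from the position where the compressed portion begins.

```python
def decompress(tokens: list) -> str:
    """Rebuild the original string from literals and (offset, length) pairs."""
    out: list[str] = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)               # literal text is copied as is
        else:
            offset, length = token
            start = len(out) - offset       # where the matching string began
            for i in range(length):
                out.append(out[start + i])  # restore one character at a time
    return "".join(out)


print(decompress(["ABCDEFGH", (8, 5)]))
```

For the example tokens, the pair (8, 5) is replaced by the five characters found eight positions back, restoring ABCDE after ABCDEFGH.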

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as being used singly, but those of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

301: First window
302: Second window
401: String memory
402: Offset memory
403: ASCII code table

Claims

A string compression method comprising:
searching each of a plurality of dictionaries for a partial string corresponding to a window in an entire string;
when a matching portion between the partial string and at least one string registered in the plurality of dictionaries is found, compressing the matching portion within the partial string using the length of the matching portion and the position of the matching portion with respect to the entire string; and
registering the partial string in any one of the plurality of dictionaries when no match of at least a predetermined minimum length is found for the partial string in each of the plurality of dictionaries.

The method according to claim 1,
wherein the searching comprises:
searching the plurality of dictionaries for each of a plurality of partial strings extracted while moving the window within the entire string.

The method according to claim 1,
wherein the searching comprises:
searching each of the plurality of dictionaries for a partial string starting from the character at each position in the window.

The method according to claim 1,
wherein the searching comprises:
searching for the partial string in the plurality of dictionaries, the number of which corresponds to the number of characters included in the window.

The method according to claim 1,
wherein the searching comprises:
searching each of the plurality of dictionaries based on an address value obtained by converting the start character of the partial string.

6. The method of claim 5,
wherein the searching comprises:
searching each of the plurality of dictionaries based on an address value obtained by converting the start character of the partial string using an ASCII code table.

The method according to claim 1,
wherein the registering comprises:
registering the partial string and the position of the partial string with respect to the entire string, based on an address value obtained by converting the start character of the partial string, in the dictionary, among the plurality of dictionaries, that corresponds to the position of that character in the window.

8. The method of claim 7,
wherein the registering comprises:
when another partial string and the position of the other partial string are already registered at the address value obtained by converting the start character of the partial string, in the dictionary corresponding to the position of the character in the window, determining which of the partial string and the other partial string is to be registered.

A string decompression method comprising:
locating a compressed portion in a compressed partial string using the length of a matching portion between a plurality of pre-registered strings and a partial string before compression, and the position of the matching portion with respect to an entire string;
restoring a string of the length of the matching portion using the plurality of pre-registered strings corresponding to the position; and
replacing the compressed portion with the restored string,
wherein the compressed partial string is generated by:
searching each of a plurality of dictionaries for the partial string before compression, which corresponds to a window in the entire string; when a matching portion between the partial string before compression and at least one string registered in the plurality of dictionaries is found, compressing the matching portion within the partial string before compression using the length of the matching portion and the position of the matching portion with respect to the entire string; and, when no match of at least a predetermined minimum length is found for the partial string before compression in each of the plurality of dictionaries, registering the partial string before compression in any one of the plurality of dictionaries.