KR20040087503A

KR20040087503A - Data compression method for multi-byte character language

Info

Publication number: KR20040087503A
Application number: KR1020030021924A
Authority: KR
Inventors: 조균연
Original assignee: 주식회사 팬택
Priority date: 2003-04-08
Filing date: 2003-04-08
Publication date: 2004-10-14
Also published as: CN1536768A; KR100494876B1; CN100474781C

Abstract

PURPOSE: A method for compressing 2-byte character data is provided that saves storage space by compressing a 2-byte text message and storing the compressed data. CONSTITUTION: The maximum string length, the number of codewords, and the initial dictionary entry number are initialized, frequently occurring characters are stored in a basic dictionary table, and a variable indicating the next codeword to be registered is initialized (S101). Additional compressible codewords are stored in a supplementary dictionary table that includes the basic dictionary table, and the variable indicating the next codeword to be registered is re-initialized (S102). Message data is input sequentially, and if the input data is among the compressible codewords, the corresponding code is looked up in the dictionary table and output (S103). The dictionary is checked for the corresponding code (S104). It is determined whether the end of the data has been reached (S105). If so, a flush process is performed (S106).

Description

DATA COMPRESSION METHOD FOR MULTI-BYTE CHARACTER LANGUAGE

The present invention relates to a method for compressing 2-byte character data, and more particularly to a 2-byte character data compression method that applies a 2-byte character compression algorithm in a mobile communication terminal to save the storage space used by Short Message Service (SMS) and Enhanced Messaging Service (EMS) messages.

In general, users exchange various kinds of information using the message transmission and reception functions (SMS, EMS) of a mobile communication terminal. Most mobile communication terminals perform little or no compression on these messages, and the few terminals that do compress them use compression algorithms suited to the English alphabet.

However, such compression algorithms leave a great deal of redundancy in languages such as Korean or Chinese, so their compression efficiency is relatively low and more memory is required. As a result, storage space is not saved effectively.

The present invention, devised to solve the above problem, aims to provide a 2-byte character data compression method that saves storage space by compressing and storing messages in units of 2-byte characters (Korean, Chinese) in the message processing module of a terminal.

FIG. 1 is a flowchart showing a 2-byte character data compression method according to an embodiment of the present invention.

FIG. 2 is a flowchart detailing, for the 2-byte character data compression method according to an embodiment of the present invention, the step of finding and outputting the corresponding code through a mapping process in the dictionary table (the compression step).

FIG. 3 is a flowchart detailing, for the 2-byte character data compression method according to an embodiment of the present invention, the dictionary creation/management step that manages the code dictionary.

To achieve the above object, the present invention comprises: generating a plurality of compressible codewords based on frequency, storing them in a basic dictionary table, and initializing a variable indicating the next codeword to be registered; an input step of identifying whether incoming message data is a 2-byte character and accepting it; comparing whether the input data is among the compressible codewords and, if so, finding and outputting the corresponding code through a mapping process in the dictionary table, and registering the corresponding code in the dictionary if it is not already there; determining whether the end of the data has been reached and, if input has not finished, returning to the step in which message data is input sequentially; and performing a flush process when the data has ended. The encoded code for a compressible codeword is output in log₂(C1+1)-1 bits if the codeword is smaller than the limit below which a bit can be saved, and in log₂(C1+1) bits if the codeword is larger than the limit, where C1 is the number of codewords currently allocated.

Hereinafter, the most preferred embodiments of the present invention are described with reference to the accompanying drawings, in enough detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea.

FIG. 1 is a flowchart showing a 2-byte character data compression method according to an embodiment of the present invention, which is described as follows.

First, the maximum string length (N7), the number of codewords (N2), the initial dictionary entry number (N5), and so on are initialized, frequently occurring characters are stored in the basic dictionary table, and the variable (C1) indicating the next codeword to be registered is initialized (S101). The codewords for character compression are organized as shown in the table below. To obtain the codewords needed for character compression, the frequency of occurrence of the 2,350 precomposed (Wansung) Hangul characters is measured in files of mixed Korean and English text and sorted, and the 470 most frequently used characters, 20% of the set, are registered as codewords. These 470 characters account for more than 85% of all occurrences. Accordingly, the initial value of the variable (C1) may be 471.

Codeword range    Contents
0-255             ASCII
256-725           Korean code (470 characters)
726-1023          10-bit encoding
1024-2047         11-bit encoding
2048-4095         12-bit encoding
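The frequency-based selection of basic codewords described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the input is assumed to be EUC-KR style bytes in which precomposed Hangul occupies first bytes B0 to C8 and second bytes A1 to FE (the ranges given in the description of FIG. 2), and codes are assumed to be handed out from 256 upward in frequency order, following the table.

```python
from collections import Counter

FIRST_HANGUL_CODE = 256   # codes 0-255 are left for single bytes (see the table)
BASIC_TABLE_SIZE = 470    # number of frequent characters registered in the text


def is_precomposed_hangul(b1: int, b2: int) -> bool:
    """Byte ranges for precomposed Hangul as given in the description of FIG. 2."""
    return 0xB0 <= b1 <= 0xC8 and 0xA1 <= b2 <= 0xFE


def build_basic_dictionary(corpus: bytes, size: int = BASIC_TABLE_SIZE) -> dict:
    """Map the `size` most frequent 2-byte characters to codes 256, 257, ..."""
    counts = Counter()
    i = 0
    while i < len(corpus):
        if corpus[i] < 0x80:          # single-byte (ASCII) character
            i += 1
            continue
        pair = corpus[i:i + 2]        # assume every non-ASCII byte starts a 2-byte character
        if len(pair) == 2 and is_precomposed_hangul(pair[0], pair[1]):
            counts[pair] += 1
        i += 2
    ranked = [char for char, _ in counts.most_common(size)]
    return {char: FIRST_HANGUL_CODE + rank for rank, char in enumerate(ranked)}
```

A real table would be built once from a large mixed Korean and English corpus and then stored in the terminal; the sketch only shows the counting and ranking step.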

Then, with reference to the initialized variables, additional compressible codewords are stored in a supplementary dictionary table that includes the basic dictionary table, and the variable (C1) indicating the next codeword to be registered is re-initialized (S102). The number of bits of the encoded code for a compressible codeword is given by the following equation.
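The equation itself is not reproduced in this text. Judging from the definitions of C1 and lim and the worked example for C1 = 750 that follow, it can plausibly be written as below. The ceiling brackets, the explicit formula for lim, and the treatment of a codeword equal to lim as the short case are assumptions inferred from that example (which places codewords 0 to 273 in 9 bits), not statements taken from the source.

```latex
\mathrm{bits}(c) =
\begin{cases}
  \lceil \log_2(\mathrm{C1}+1) \rceil - 1, & c \le \mathit{lim} \\
  \lceil \log_2(\mathrm{C1}+1) \rceil,     & c > \mathit{lim}
\end{cases}
\qquad
\mathit{lim} = 2^{\lceil \log_2(\mathrm{C1}+1) \rceil} - \mathrm{C1} - 1
```

For C1 = 750 this gives a long width of 10 bits and lim = 1024 - 750 - 1 = 273.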

Here, C1 is the number of codewords currently allocated and lim is the limit below which a bit can be saved. Thus, when a codeword is converted to a bit string, it is output in log₂(C1+1)-1 bits if it is smaller than the limit (lim), and in log₂(C1+1) bits if it is larger than the limit.

For example, when C1 is 750, lim = (1024-750-1) = 273, so during compression codewords from 0 to 273 are encoded and output in 9 bits, while codewords from 274 to 749 have 274 added to them and are encoded and output in 10 bits.

During decompression, 9 bits are read first. If the value read is less than 274, it is taken as the codeword; otherwise, 10 bits are read instead and that value minus 274 is taken as the codeword. Table 2 below shows the dictionary table structure of the present invention under this scheme.

Compressible codeword    Coded code    Decimal value
0                        000000000     0
1                        000000001     1
2                        000000010     2
...                      ...           ...
273                      100010001     273
274                      1000100100    548 (274+274)
275                      1000100101    549 (274+275)
...                      ...           ...
749                      1111111111    1023 (274+749)
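The variable-width coding in Table 2 can be reproduced with a short sketch. The function names are hypothetical, bit strings are represented as Python text for readability, and the width is taken as the ceiling of log₂(C1+1), which is an assumption consistent with the C1 = 750 example rather than something the text states outright.

```python
def code_width(c1: int) -> int:
    """Long code width, ceil(log2(c1 + 1)); equal to c1.bit_length() for c1 >= 1."""
    return c1.bit_length()


def limit(c1: int) -> int:
    """Largest codeword that still fits the short form (273 when c1 is 750)."""
    return (1 << code_width(c1)) - c1 - 1


def encode(codeword: int, c1: int) -> str:
    """Return the bit string for one codeword; assumes c1 >= 2."""
    width, lim = code_width(c1), limit(c1)
    if codeword <= lim:
        return format(codeword, f"0{width - 1}b")        # short form
    return format(codeword + lim + 1, f"0{width}b")      # long form, offset by lim + 1


def decode(bits: str, pos: int, c1: int):
    """Read one codeword starting at bits[pos]; return (codeword, next position)."""
    width, lim = code_width(c1), limit(c1)
    short = int(bits[pos:pos + width - 1], 2)
    if short <= lim:
        return short, pos + width - 1
    return int(bits[pos:pos + width], 2) - (lim + 1), pos + width
```

Because every long code starts with a 9-bit prefix of at least 274, the decoder can tell the two forms apart after reading 9 bits, which is what makes the mixed-width stream unambiguous.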

Then, message data is input sequentially. The input data is compared against the compressible codewords, and if it is among them, the corresponding code is found through a mapping process in the dictionary table and output (S103). Next, the dictionary is checked for the corresponding code, and if the code is not in the dictionary, a dictionary creation step registers it in the dictionary (S104).

Then, it is determined whether the end of the data has been reached. If it has not, the process returns to the step in which message data is input sequentially (S105).

If the data has ended, a flush process is performed (S106). The flush process works as follows: memory stores data in units of 8 or 16 bits, but compressed data has a variable number of bits, so when the last data to be stored does not fill an 8-bit or 16-bit unit exactly, the bits left over at the end are filled with 0.
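A minimal sketch of such a flush step is shown below, assuming the compressed output has been accumulated as a text string of 0 and 1 characters; the function name and this representation are hypothetical and chosen only for illustration.

```python
def flush(bits: str, unit: int = 8) -> bytes:
    """Pad a bit string with trailing zeros to a multiple of `unit` bits and pack it.

    `unit` is expected to be 8 or 16, matching the storage units named in the text.
    """
    padding = (-len(bits)) % unit          # zeros needed to fill the last unit
    padded = bits + "0" * padding
    if not padded:
        return b""
    return int(padded, 2).to_bytes(len(padded) // 8, "big")
```

For instance, a single 9-bit code leaves 7 unused bits in the second byte under 8-bit storage, and those 7 bits are written as zeros.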

FIG. 2 is a flowchart detailing, for the 2-byte character data compression method according to an embodiment of the present invention, the step of finding and outputting the corresponding code through a mapping process in the dictionary table, that is, the step of outputting the compressed code. It is described as follows.

First, the first byte of the input data is read (S201).

Then, it is determined whether the first byte is in the first allocation range (S202). For precomposed Hangul, the first byte is assigned 25 values, from hexadecimal B0 to C8, so the first allocation range may be hexadecimal B0 to C8.

If the first byte is in the first allocation range, the second byte of the input data is read (S203).

If the first byte is not in the first allocation range, the data is not a precomposed Hangul character, so the corresponding ASCII code is determined (S207).

Then, it is determined whether the second byte is in the second allocation range (S204). For precomposed Hangul, the second byte is assigned 94 values, from hexadecimal A1 to FE, so the second allocation range may be hexadecimal A1 to FE.

If the second byte is in the second allocation range, it is determined whether the input data is in the dictionary table (S205).

If the second byte is not in the second allocation range, the data is not a precomposed Hangul character, so the corresponding ASCII code is determined (S207).

If the input data is in the dictionary table, the corresponding code value is determined (S206).

If the input data is not in the dictionary table, it is not a frequently occurring Hangul character, so the corresponding ASCII code is determined (S207).
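The decision flow of steps S201 to S207 might be sketched as follows. The function name and the dictionary layout (2-byte sequences mapped to integer codes) are hypothetical, and the sketch assumes that "determining the ASCII code" in S207 means emitting the current byte on its own as a code in the 0 to 255 range; the text does not spell this out.

```python
def next_code(data: bytes, pos: int, dictionary: dict):
    """Return (code, bytes consumed) for the symbol starting at data[pos]."""
    first = data[pos]                                      # S201: read first byte
    if 0xB0 <= first <= 0xC8 and pos + 1 < len(data):      # S202: first allocation range
        second = data[pos + 1]                             # S203: read second byte
        if 0xA1 <= second <= 0xFE:                         # S204: second allocation range
            code = dictionary.get(data[pos:pos + 2])       # S205: dictionary lookup
            if code is not None:
                return code, 2                             # S206: dictionary code
    return first, 1                                        # S207: single-byte fallback (assumed)
```

Under this assumption, a Hangul character that is absent from the dictionary costs two single-byte codes, so the benefit comes entirely from the frequent characters that do have dictionary entries.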

FIG. 3 is a flowchart detailing, for the 2-byte character data compression method according to an embodiment of the present invention, the dictionary management step, which checks whether the corresponding code exists in the dictionary, registers it if it does not, and removes infrequently used codes from among those registered. It is described as follows.

First, it is determined whether the string for the codeword exceeds the maximum string length (N7). If it does, the dictionary management step ends (S301).

If the string does not exceed the maximum string length (N7), it is determined whether the string exists in the dictionary table. If it does, the dictionary management step ends (S302).

If the string does not exist in the dictionary table, it is assigned to the codeword indicated by the variable (C1) (S303).

Then, the variable (C1) is incremented so that it can be assigned to the codeword of the next string to be created (S304).

Then, it is determined whether the incremented variable (C1) is greater than or equal to the number of codewords (N2) (S305).

If the incremented variable (C1) is greater than or equal to the number of codewords (N2), the dictionary entry number (N5) is assigned to the variable (C1); if it is less than N2, N5 is not assigned (S306).

Then, it is determined whether the node assigned to the incremented variable (C1) is a leaf node, that is, a node pointing to the last character of a string, or an unused node (C1==NULL). If the node is neither a leaf node in the dictionary tree nor an unused node, the process returns to the step in which the variable (C1) is incremented for assignment to the codeword of the next string to be created (S307).

If the node assigned to the incremented variable (C1) is a leaf node or an unused node, the entry for the variable (C1) is removed from the dictionary tree so that a new string codeword can be assigned (S308).
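Steps S301 to S308 can be read as a dictionary that recycles codewords once it is full. The sketch below is one possible interpretation, not the patented implementation: it replaces the dictionary tree with two flat Python dicts, treats a string as a leaf when no other registered string extends it, and uses hypothetical function and parameter names throughout.

```python
def is_leaf(string: str, by_string: dict) -> bool:
    """A string counts as a leaf when no other registered string extends it."""
    return not any(s != string and s.startswith(string) for s in by_string)


def manage_dictionary(string, by_string, by_code, c1, n7, n2, n5):
    """Register `string` if permitted and return the updated value of C1."""
    if len(string) > n7:                 # S301: string too long, nothing to do
        return c1
    if string in by_string:              # S302: already registered
        return c1
    by_string[string] = c1               # S303: assign the string to codeword C1
    by_code[c1] = string
    while True:
        c1 += 1                          # S304: advance to the next codeword
        if c1 >= n2:                     # S305: ran past the last codeword
            c1 = n5                      # S306: wrap to the first dictionary entry
        old = by_code.get(c1)
        if old is None:                  # S307: unused slot, ready for reuse
            return c1
        if is_leaf(old, by_string):      # S307: leaf entry may be recycled
            del by_string[old]           # S308: remove it from the dictionary
            del by_code[c1]
            return c1
```

With N5 = 4, N2 = 7 and N7 = 3, registering "ab", "abc" and "xy" in turn fills slots 4, 5 and 6; the third call wraps around, skips "ab" because "abc" extends it, and frees slot 5 by removing "abc".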

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings, since a person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from its technical idea.

The present invention has the advantage of saving storage space by compressing and storing messages of 2-byte characters (Korean, Chinese, etc.) in the message processing module of a terminal. Specifically, when the method of the present invention is applied to text documents that mix English and Korean, the average compression ratio improves by roughly 22% compared with the existing method.

Claims

A 2-byte character data compression method comprising:

generating a plurality of compressible codewords based on frequency, storing them in a basic dictionary table, and initializing a variable indicating the next codeword to be registered;

an input step of identifying whether incoming message data is a 2-byte character and accepting it;

comparing whether the input data is among the compressible codewords and, if so, finding and outputting the corresponding code through a mapping process in the dictionary table, and registering the corresponding code in the dictionary if it is not already there;

determining whether the end of the data has been reached and, if input has not finished, returning to the step in which message data is input sequentially; and

performing a flush process when the data has ended,

wherein the encoded code for a compressible codeword is output in log₂(C1+1)-1 bits if the compressible codeword is smaller than the limit below which a bit can be saved, and in log₂(C1+1) bits if the codeword is larger than the limit, where C1 is the number of codewords currently allocated.

The method of claim 1, further comprising:

storing additional compressible codewords in a supplementary dictionary table that includes the basic dictionary table, with reference to the initialized variable, and re-initializing the variable indicating the next codeword to be registered.

The method of claim 1,

wherein, to obtain the compressible codewords, the frequency of occurrence of the 2,350 precomposed Hangul characters is measured in files of mixed Korean and English text and sorted, and the 455 to 485 most frequently used characters are registered as codewords.

The method of claim 1,

wherein, for Chinese, which like Hangul is written with characters that combine two or more bytes, character frequencies are measured and only frequently occurring characters are registered as basic codes in the dictionary.

The method of claim 1,

wherein finding and outputting the corresponding code through the mapping process in the dictionary table comprises:

reading a first byte of the input data;

determining whether the first byte is in a first allocation range;

if the first byte is in the first allocation range, reading a second byte of the input data;

if the first byte is not in the first allocation range, determining the corresponding ASCII code, since the data is not a precomposed Hangul character;

determining whether the second byte is in a second allocation range;

if the second byte is in the second allocation range, determining whether the input data is in the dictionary table;

if the second byte is not in the second allocation range, determining the corresponding ASCII code, since the data is not a precomposed Hangul character;

if the input data is in the dictionary table, determining the corresponding code value; and

if the input data is not in the dictionary table, determining the corresponding ASCII code, since the data is not a frequently occurring Hangul character.

The method of claim 5,

wherein the first allocation range is from hexadecimal B0 to C8.

The method of claim 5,

wherein the second allocation range is from hexadecimal A1 to FE.