CN100474781C - Compression method of two-byte character data - Google Patents


Info

Publication number
CN100474781C
Authority
CN
China
Prior art keywords
byte
data
code
compressible
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2003101242211A
Other languages
Chinese (zh)
Other versions
CN1536768A (en)
Inventor
赵畇衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pan Thai Co ltd
Original Assignee
Pantech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pantech Co Ltd filed Critical Pantech Co Ltd
Publication of CN1536768A publication Critical patent/CN1536768A/en
Application granted granted Critical
Publication of CN100474781C publication Critical patent/CN100474781C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/40Circuits

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a method for compressing 2-byte character data in which the information processing module of a terminal compresses messages in units of 2-byte characters (Korean, Chinese) before storing them, thereby reducing storage space. The method comprises: a step of generating a plurality of compressible code words according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next code word to be registered; an input step of receiving the input message data and identifying whether it is a 2-byte character; a step of checking whether the input data is among the compressible code words and, if so, searching the dictionary table through a mapping process for the matching code and outputting it, and registering the matching code in the dictionary if it is not yet there; a step of judging whether the end of the data has been reached and, if input is not complete, returning to the input step to read the message data in sequence; and a step of performing a flush process when the end of the data has been reached. When the matching code obtained by encoding a compressible code word is smaller than the threshold up to which a code word can be shortened by one bit, it is output in log2(C1+1)-1 bits; when it is larger than the threshold, it is output in log2(C1+1) bits, where C1 is the number of code words currently assigned.

Figure 200310124221

Description

Compression method for 2-byte character data

Technical field

The present invention relates to a method for compressing 2-byte character data and, more specifically, to a method that uses a 2-byte character compression algorithm to reduce the storage space taken up by SMS (Short Message Service) and EMS (Enhanced Messaging Service) messages in a mobile communication terminal.

Background art

In general, users exchange a wide variety of messages through the message sending and receiving functions (SMS, EMS) of mobile communication terminals. Most mobile communication terminals hardly compress these messages at all, and those that do compress them partially use only compression algorithms suited to the English alphabet.

When such an algorithm is applied to languages such as Korean and Chinese, however, compression efficiency is comparatively low because text in these languages is largely redundant, more memory is required, and the storage space cannot be reduced effectively.

[Patent Document 1] Japanese Patent Laid-Open No. 2-255977 (Japanese Patent Publication No. 1990-255977)

[Patent Document 2] Japanese Patent Laid-Open No. 9-069785 (Japanese Patent Publication No. 1997-069785)

Summary of the invention

The present invention overcomes the above shortcomings. Its object is to provide a method for compressing 2-byte character data in which the information processing module of a terminal compresses and stores messages in units of 2-byte characters (Korean, Chinese), thereby reducing storage space.

To achieve this object, the method for compressing 2-byte character data of the present invention comprises: a step of generating a plurality of compressible code words according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next code word to be registered; a step of storing, with reference to the initialized variable, additional compressible code words in an additional dictionary table that includes the basic dictionary table, and reinitializing the variable that indicates the next code word to be registered; an input step of receiving the input message data and identifying whether it is a 2-byte character; a step of checking whether the input data is among the compressible code words and, if so, searching the dictionary table through a mapping process for the matching code and outputting it, and registering the matching code in the dictionary if it is not yet there; a step of judging whether the end of the data has been reached and, if input is not complete, returning to the input step to read the message data in sequence; and a step of performing a flush process when the end of the data has been reached. Memory stores data in units of 8 or 16 bits, but the compressed data has a variable number of bits, so the flush process fills the remaining bits with 0 when the data stored last does not fill a full 8-bit or 16-bit unit. When the matching code obtained by encoding a compressible code word is smaller than the threshold up to which a code word can be shortened by one bit, it is output in log2(C1+1)-1 bits; when it is larger than the threshold, it is output in log2(C1+1) bits, where C1 is the number of code words currently assigned.

The beneficial effect of the present invention is that storage space can be reduced by compressing messages made up of 2-byte characters (Korean, Chinese, etc.) before storing them in the information processing module of a terminal. Specifically, when a text file containing a mixture of English and Korean characters is compressed with the method of the present invention, the average compression ratio improves by roughly 22% compared with existing compression methods.

Brief description of the drawings

FIG. 1 is an operation flowchart of the method for compressing 2-byte character data according to one embodiment of the present invention.

FIG. 2 is an operation flowchart detailing the step (compression step) of searching the dictionary table through a mapping process for the matching code and outputting it, in the method for compressing 2-byte character data according to one embodiment of the present invention.

FIG. 3 is an operation flowchart detailing the dictionary generation/management step that manages the dictionary of matching codes, in the method for compressing 2-byte character data according to one embodiment of the present invention.

Detailed description

For convenience, the method for compressing 2-byte character data of the present invention is described using Korean as an example, but it applies equally to other languages written with 2-byte codes, such as Chinese and Japanese. Although this embodiment describes only the compression of Korean, it will be apparent to those skilled in the art that the invention is not limited to Korean.

Embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is an operation flowchart of the method for compressing 2-byte character data according to one embodiment of the present invention; the related details are described below.

First, the maximum string length (N7), the number of code words (N2), the initial dictionary entry number (N5) and so on are initialized, the characters with a high frequency of occurrence are stored in the basic dictionary table, and the variable C1 that indicates the next code word to be registered is initialized (S101). The code words used for character compression are laid out as shown in Table 1 below. To find the code words needed for character compression, the occurrence frequencies of the 2,350 precomposed Korean characters were measured in files of mixed Korean and English text and ranked, and the 470 most frequently used characters, 20% of the total, were registered as code words. These 470 characters together account for more than 85% of all occurrences. The initial value of the variable C1 can therefore be 471.

Table 1:

  Code word range   Assignment
  0~255             ASCII (American Standard Code for Information Interchange)
  256~725           Korean character codes (470 characters)
  726~1023          10-bit encoding
  1024~2047         11-bit encoding
  2048~4095         12-bit encoding
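
The frequency-based construction of the basic dictionary can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample-driven counting, and the choice of 256 as the first code (taken from Table 1) are assumptions, and the byte ranges B0 to C8 and A1 to FE are the ones the description gives for precomposed Korean characters.

```python
from collections import Counter


def is_two_byte_char(first: int, second: int) -> bool:
    """True if the byte pair lies in the precomposed Korean ranges from the text."""
    return 0xB0 <= first <= 0xC8 and 0xA1 <= second <= 0xFE


def build_basic_dictionary(sample: bytes, size: int = 470, first_code: int = 256) -> dict:
    """Map the `size` most frequent 2-byte characters in `sample` to consecutive codes.

    Hypothetical helper: codes start at `first_code`, following Table 1.
    """
    counts = Counter()
    i = 0
    while i < len(sample):
        if i + 1 < len(sample) and is_two_byte_char(sample[i], sample[i + 1]):
            counts[sample[i:i + 2]] += 1
            i += 2
        else:
            i += 1  # 1-byte character; codes 0-255 already cover it
    ranked = [char for char, _ in counts.most_common(size)]
    return {char: first_code + rank for rank, char in enumerate(ranked)}
```

Called on a mixed Korean and English sample with the default `size` of 470, this would fill codes 256 to 725 of Table 1, provided the sample contains at least 470 distinct 2-byte characters.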

Next, with reference to the initialized variable, additional compressible code words are stored in the additional dictionary table, which includes the basic dictionary table, and the variable C1 that indicates the next code word to be registered is reinitialized (S102). The number of bits of the matching code that encodes a compressible code word is determined by the following formulas, in which the logarithm is taken to base 2 and rounded up to an integer.

Formula 1: $(C1 + lim) \le 2^{\lceil \log_2(C1+1) \rceil} - 1$

Formula 2: $lim = C3 - C1 - 1$

Formula 3: $C3 = 2^{\lceil \log_2(C1+1) \rceil}$

Here C1 is the number of code words currently assigned, and lim is the threshold up to which a code word can be shortened by one bit. When a code word is converted to a bit string, it is output in log2(C1+1)-1 bits if it does not exceed the threshold (lim), and in log2(C1+1) bits if it exceeds the threshold.

For example, when C1 is 750, lim = (1024 - 750 - 1) = 273. During compression, a code word between 0 and 273 is therefore output as a 9-bit code, while a code word between 274 and 749 has 274 added to it and is output as a 10-bit code.

During decompression, 9 bits are read first. If the value read is smaller than 274, it is taken as the code word. If it is 274 or greater, the code is read again as 10 bits and the value minus 274 is taken as the code word. Table 2 below shows the dictionary table structure of the present invention built in this way.

Table 2:

  Compressible code word   Encoded code   Decimal
  0                        000000000      0
  1                        000000001      1
  2                        000000010      2
  ...                      ...            ...
  273                      100010001      273
  274                      1000100100     548 (274+274)
  275                      1000100101     549 (274+275)
  ...                      ...            ...
  749                      1111111111     1023 (274+749)
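
The variable-length rule behind Table 2 can be sketched in a few lines. The function names and the use of '0'/'1' strings as a bit stream are illustrative assumptions; the arithmetic follows the C1 = 750 worked example, in which code words 0 to 273 take 9 bits and code words 274 to 749 are offset by 274 and take 10 bits.

```python
def width_and_threshold(c1: int) -> tuple:
    """Return (full_bits, lim) for c1 assigned code words (c1 >= 2)."""
    full_bits = c1.bit_length()       # equals log2(c1 + 1) rounded up
    lim = (1 << full_bits) - c1 - 1   # lim = C3 - C1 - 1 with C3 = 2**full_bits
    return full_bits, lim


def encode(code: int, c1: int) -> str:
    """Encode one code word (0 <= code < c1) as a string of '0'/'1' characters."""
    full_bits, lim = width_and_threshold(c1)
    if code <= lim:
        return format(code, f"0{full_bits - 1}b")      # short form, one bit fewer
    return format(code + lim + 1, f"0{full_bits}b")    # long form, offset by lim + 1


def decode(bits: str, pos: int, c1: int) -> tuple:
    """Decode one code word starting at bits[pos]; return (code, next_pos)."""
    full_bits, lim = width_and_threshold(c1)
    short = int(bits[pos:pos + full_bits - 1], 2)
    if short <= lim:
        return short, pos + full_bits - 1
    value = int(bits[pos:pos + full_bits], 2)          # re-read with one more bit
    return value - (lim + 1), pos + full_bits
```

The scheme is decodable without length markers because every long code, once offset, has a leading `full_bits - 1` bits greater than `lim`, so the first read alone tells the decoder which form it is looking at.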

The message data is then input in sequence. Each input item is checked against the compressible code words; if it is among them, the dictionary table is searched through the mapping process and the matching code is output (S103). It is then checked whether the matching code exists in the dictionary, and if it does not, the dictionary generation step registers it in the dictionary (S104).

Next, it is judged whether the end of the data has been reached; if not, the process returns to the step of inputting the message data in sequence (S105).

If the end of the data has been reached, the flush process is performed (S106). Memory stores data in units of 8 or 16 bits, but the compressed data has a variable number of bits, so when the data stored last does not fill a full 8-bit or 16-bit unit, the flush process fills the remaining bits with 0.
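
A minimal sketch of the flush process, assuming the compressed output has been collected as a string of '0'/'1' characters (a representation chosen here only for readability; the function name is hypothetical):

```python
def flush(bits: str, unit: int = 8) -> bytes:
    """Pad a '0'/'1' string with zeros to a multiple of `unit` bits and pack it into bytes."""
    if unit not in (8, 16):
        raise ValueError("unit must be 8 or 16")
    padding = -len(bits) % unit            # number of zero bits needed to fill the last unit
    padded = bits + "0" * padding
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

For instance, a 19-bit stream made of one 9-bit and one 10-bit code is padded with five zero bits to 24 bits when `unit` is 8, and with thirteen zero bits to 32 bits when `unit` is 16.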

FIG. 2 is an operation flowchart detailing the step (compression step) of searching the dictionary table through the mapping process for the matching code and outputting it, in the method for compressing 2-byte character data according to one embodiment of the present invention; it is described below.

First, the first byte of the input data is read (S201).

It is then judged whether the first byte lies within the first assigned range (S202). For precomposed Korean characters the first byte takes one of the 25 values from hexadecimal B0 to C8, so the first assigned range can be hexadecimal B0 to C8.

If the first byte lies within the first assigned range, the second byte of the input data is read (S203).

If the first byte does not lie within the first assigned range, the input is not a precomposed Korean character and is therefore determined to be an ASCII (American Standard Code for Information Interchange) character (S207).

It is then judged whether the second byte lies within the second assigned range (S204). For precomposed Korean characters the second byte takes one of the 94 values from hexadecimal A1 to FE, so the second assigned range can be hexadecimal A1 to FE.

If the second byte lies within the second assigned range, it is judged whether the input data is contained in the dictionary table (S205).

If the second byte does not lie within the second assigned range, the input is not a precomposed Korean character and is therefore determined to be an ASCII character (S207).

If the input data is contained in the dictionary table, its matching code value is determined (S206).

If the input data is not contained in the dictionary table, it is not a frequently occurring Korean character and is therefore determined to be an ASCII character (S207).
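
The decision flow of steps S201 to S207 can be sketched as a single scanning loop. The function name and the dictionary layout (2-byte sequence mapped to an integer code) are assumptions, as is the handling of the fallback case: the text only says the input is treated as an ASCII character, and this sketch reads that as emitting the current byte as its own code in the 0 to 255 range.

```python
def map_to_codes(data: bytes, dictionary: dict) -> list:
    """Turn input bytes into a list of code words following S201-S207."""
    codes = []
    i = 0
    while i < len(data):
        first = data[i]                                      # S201
        if 0xB0 <= first <= 0xC8 and i + 1 < len(data):      # S202
            second = data[i + 1]                             # S203
            if 0xA1 <= second <= 0xFE:                       # S204
                code = dictionary.get(data[i:i + 2])         # S205
                if code is not None:
                    codes.append(code)                       # S206: matching code found
                    i += 2
                    continue
        codes.append(first)                                  # S207: treated as a 1-byte code
        i += 1
    return codes
```

With a dictionary containing only `b"\xb0\xa1"` mapped to 256, the input `b"A\xb0\xa1"` yields `[65, 256]`, while the unregistered pair `b"\xb0\xa2"` falls through to two 1-byte codes.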

FIG. 3 is an operation flowchart detailing the dictionary management step, which checks whether the matching code exists in the dictionary, registers it if it does not, and removes infrequently used codes from the dictionary, in the method for compressing 2-byte character data according to one embodiment of the present invention; it is described below.

First, it is judged whether the string length of the code word exceeds the maximum string length (N7); if it does, the dictionary management step ends (S301).

If the string length does not exceed the maximum string length (N7), it is judged whether the string exists in the dictionary table; if it does, the dictionary management step ends (S302).

If the string does not exist in the dictionary table, it is assigned to the new variable C1 (S303).

Next, the value of C1 is incremented so that it can be assigned to the code word of the string generated next (S304).

It is then judged whether the incremented C1 is greater than the number of code words (N2) (S305).

If the incremented C1 is greater than the number of code words (N2), the dictionary entry number (N5) is assigned to C1; if it is smaller than N2, N5 is not assigned (S306).

It is then judged whether the node assigned to the incremented C1 is a leaf node, that is, a node representing the last character of a string, or an unused node (C1 == NULL). If it is neither, the process returns to the step of incrementing C1 for the code word of the string generated next (S307).

If the node assigned to the incremented C1 is a leaf node or an unused node, C1 is removed from the dictionary entries in preparation for assigning the code word of a new string (S308).
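
The S301 to S308 loop can be modeled with a small class. This is a toy sketch under several assumptions rather than the patented data structure: the class and attribute names are hypothetical, N2 is read as the largest usable code, strings are plain Python `str`, and a "leaf" is approximated as an entry that no other registered string extends.

```python
class DynamicDictionary:
    """Toy model of the S301-S308 dictionary management loop."""

    def __init__(self, n2: int, n5: int, n7: int):
        self.n2 = n2          # largest usable code (assumed reading of N2)
        self.n5 = n5          # first code of the dynamic region (N5)
        self.n7 = n7          # maximum string length (N7)
        self.c1 = n5          # code to give to the next registered string
        self.code_of = {}     # string -> code
        self.string_of = {}   # code -> string

    def _is_leaf(self, code: int) -> bool:
        """A leaf has no other registered string that extends it."""
        s = self.string_of[code]
        return not any(t != s and t.startswith(s) for t in self.code_of)

    def register(self, string: str):
        """Register `string`; return its code, or None if nothing was registered."""
        if len(string) > self.n7:                  # S301: too long
            return None
        if string in self.code_of:                 # S302: already present
            return None
        code = self.c1                             # S303: assign the string to C1
        self.code_of[string] = code
        self.string_of[code] = string
        while True:
            self.c1 += 1                           # S304
            if self.c1 > self.n2:                  # S305/S306: wrap around to N5
                self.c1 = self.n5
            if self.c1 not in self.string_of or self._is_leaf(self.c1):  # S307
                break
        old = self.string_of.pop(self.c1, None)    # S308: free the slot for reuse
        if old is not None:
            del self.code_of[old]
        return code
```

Once every code between N5 and N2 is in use, each new registration evicts the next leaf entry in code order, which keeps prefixes of longer strings in place while the slots of strings that nothing else depends on are recycled.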

The present invention is not limited to the scope disclosed in the above embodiment. Various improvements and modifications can be made within the technical subject matter of the present invention, and such improvements and modifications also fall within its technical scope and are protected by it.

Claims (8)

1. A method for compressing 2-byte character data, characterized by comprising:
a step of generating a plurality of compressible code words according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next code word to be registered;
a step of storing, with reference to the initialized variable, additional compressible code words in an additional dictionary table that includes the basic dictionary table, and reinitializing the variable that indicates the next code word to be registered;
an input step of receiving the input message data and identifying whether it is a 2-byte character;
a step of checking whether the input data is among the compressible code words and, if so, searching the dictionary table through a mapping process for the matching code and outputting it, and registering the matching code in the dictionary if it is not yet there;
a step of judging whether the end of the data has been reached and, if input is not complete, returning to the input step to read the message data in sequence; and
a step of performing a flush process when the end of the data has been reached, the flush process being a process in which, because memory stores data in units of 8 or 16 bits while the compressed data has a variable number of bits, the remaining bits are filled with 0 when the data stored last does not fill a full 8-bit or 16-bit unit;
wherein, when the matching code obtained by encoding the compressible code word is smaller than the threshold up to which the code word can be shortened by one bit, it is output in log2(C1+1)-1 bits, and when it is larger than the threshold, it is output in log2(C1+1) bits, C1 being the number of code words currently assigned.

2. The method for compressing 2-byte character data according to claim 1, characterized in that:
to find the compressible code words, the occurrence frequencies of the precomposed 2-byte characters are measured in files of mixed 2-byte and 1-byte characters, ranked and analyzed, and the frequently used characters among them are registered as code words.

3. The method for compressing 2-byte character data according to claim 1, characterized in that:
frequencies are measured starting from characters represented by combinations of 2 or more bytes, and only the frequently used characters are registered in the dictionary as basic code words.

4. The method for compressing 2-byte character data according to claim 2, characterized in that:
the 2-byte characters are Chinese, and the 1-byte characters are English characters.

5. The method for compressing 2-byte character data according to claim 2, characterized in that:
the 2-byte characters are Korean, and the 1-byte characters are English characters.

6. The method for compressing 2-byte character data according to claim 1, characterized in that the step of searching the dictionary table through a mapping process for the matching code and outputting it comprises:
a step of reading the first byte of the input data;
a step of judging whether the first byte lies within a first assigned range;
a step of reading the second byte of the input data when the first byte lies within the first assigned range;
a step of determining, when the first byte does not lie within the first assigned range, that the input is an ASCII (American Standard Code for Information Interchange) character because it is not a precomposed Korean character;
a step of judging whether the second byte lies within a second assigned range;
a step of judging, when the second byte lies within the second assigned range, whether the input data is contained in the dictionary table;
a step of determining, when the second byte does not lie within the second assigned range, that the input is an ASCII character because it is not a precomposed Korean character;
a step of determining the matching code value when the input data is contained in the dictionary table; and
a step of determining, when the input data is not contained in the dictionary table, that the input is an ASCII character because it is not a frequently occurring Korean character.

7. The method for compressing 2-byte character data according to claim 4, characterized in that:
the first assigned range is from hexadecimal B0 to C8.

8. The method for compressing 2-byte character data according to claim 4, characterized in that:
the second assigned range is from hexadecimal A1 to FE.
CNB2003101242211A 2003-04-08 2003-12-31 Compression method of two-byte character data Expired - Fee Related CN100474781C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR10-2003-0021924A KR100494876B1 (en) 2003-04-08 2003-04-08 Data compression method for multi-byte character language
KR1020030021924 2003-04-08
KR10-2003-0021924 2003-04-08

Publications (2)

Publication Number Publication Date
CN1536768A CN1536768A (en) 2004-10-13
CN100474781C true CN100474781C (en) 2009-04-01

Family

ID=34374057

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101242211A Expired - Fee Related CN100474781C (en) 2003-04-08 2003-12-31 Compression method of two-byte character data

Country Status (2)

Country Link
KR (1) KR100494876B1 (en)
CN (1) CN100474781C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100755533B1 (en) * 2005-07-25 2007-09-06 주식회사 팬택 Character set generation method and apparatus
KR101386169B1 (en) * 2007-08-09 2014-04-17 삼성전자주식회사 Apparatus and method for compression and restoration SMS
CN101751451B (en) * 2008-12-11 2012-04-25 高德软件有限公司 A Chinese data compression and decompression method and related equipment
EP2781072B1 (en) 2011-11-15 2015-10-21 Citrix Systems Inc. Systems and methods for compressing short text by dictionaries in a network
US9300322B2 (en) * 2014-06-20 2016-03-29 Oracle International Corporation Encoding of plain ASCII data streams
US9779071B2 (en) * 2015-07-13 2017-10-03 Fujitsu Limited Non-transitory computer-readable recording medium, encoding method, encoding apparatus, decoding method, and decoding apparatus
CN112416315B (en) * 2020-06-16 2024-05-14 上海哔哩哔哩科技有限公司 Compression method of CSS code, electronic device and storage medium
CN114880523B (en) * 2022-04-27 2025-05-16 深圳市优必选科技股份有限公司 String processing method, device, electronic device and storage medium
KR102633001B1 (en) * 2023-03-27 2024-02-05 주식회사 무브먼츠 Method for implementing underground facilities as ar in an offline environment using combined data precessing of qr code and nfc

Also Published As

Publication number Publication date
KR20040087503A (en) 2004-10-14
CN1536768A (en) 2004-10-13
KR100494876B1 (en) 2005-06-14

Similar Documents

Publication Publication Date Title
US5635932A (en) Lempel-ziv compression with expulsion of dictionary buffer matches
US6906644B2 (en) Encoding and decoding apparatus with matching length means for symbol strings
US5663721A (en) Method and apparatus using code values and length fields for compressing computer data
US5001478A (en) Method of encoding compressed data
CN100495318C (en) Integer data compression method, device and decompression method, device
US9223765B1 (en) Encoding and decoding data using context model grouping
US6100824A (en) System and method for data compression
US7460033B2 (en) Method for creating an in-memory physical dictionary for data compression
KR100353171B1 (en) Method and apparatus for performing adaptive data compression
JPH07104971A (en) Compression method using small-sized dictionary applied to network packet
CN100474781C (en) Compression method of two-byte character data
US7864085B2 (en) Data compression method and apparatus
US9236881B2 (en) Compression of bitmaps and values
JPS6356726B2 (en)
CN100578943C (en) An optimized Huffman decoding method and device
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
US20200084596A1 (en) Efficient short message compression
US5010344A (en) Method of decoding compressed data
US6240213B1 (en) Data compression system having a string matching module
CN104077272B (en) A kind of method and apparatus of dictionary compression
US20090083267A1 (en) Method and System for Compressing Data
KR100399495B1 (en) Method to convert unicode text to mixed codepages
CN112506876B (en) Lossless compression query method supporting SQL query
CN110287147B (en) Character string sorting method and device
Zobel et al. Compact in-memory models for compression of large text databases

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1070189

Country of ref document: HK

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1070189

Country of ref document: HK

C41 Transfer of patent application or patent right or utility model
C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Seoul, South Korea

Patentee after: Pantech property management Co.

Address before: Seoul, South Korea

Patentee before: PANTECH Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20161026

Address after: Seoul, South Korea

Patentee after: PANTECH CO.,LTD.

Address before: Seoul, South Korea

Patentee before: Pantech property management Co.

TR01 Transfer of patent right

Effective date of registration: 20200609

Address after: Seoul, South Korea

Patentee after: Pan Thai Co.,Ltd.

Address before: Seoul, South Korea

Patentee before: Pantech Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090401