CN100474781C - Compression method of two-byte character data - Google Patents

Compression method of two-byte character data

Info

Publication number
CN100474781C
Authority
CN
China
Prior art keywords
byte
character
data
code word
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2003101242211A
Other languages
Chinese (zh)
Other versions
CN1536768A (en)
Inventor
赵畇衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pan Thai Co ltd
Original Assignee
Pantech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pantech Co Ltd
Publication of CN1536768A
Application granted
Publication of CN100474781C
Anticipated expiration
Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/40 Circuits

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method for compressing 2-byte character data is provided that saves storage space by compressing a 2-byte text message and storing the compressed data. The maximum number of character strings, the number of code words, and the initial dictionary entry number are initialized, high-frequency characters are stored in a basic dictionary table, and a variable indicating the next code word to be registered is initialized (S101). Additional compressible code words are stored in a supplementary dictionary table that includes the basic dictionary table, and the variable indicating the next code word to be registered is re-initialized (S102). Message data is input sequentially, and if the input data is among the compressible code words, the corresponding code is looked up in the dictionary table and output (S103). It is then checked whether the corresponding code is in the dictionary (S104) and whether the end of the data has been reached (S105). If so, a flushing process is performed (S106).

Description

Method for compressing 2-byte character data
Technical field
The present invention relates to a method for compressing 2-byte character data and, more particularly, to a method that uses a 2-byte character compression algorithm to reduce the storage space taken by SMS (Short Message Service) and EMS (Enhanced Messaging Service) messages in a mobile communication terminal.
Background art
In general, users exchange all kinds of information through the message sending and receiving functions (SMS, EMS) of mobile communication terminals. Most mobile communication terminals hardly compress these messages at all, and the terminals that do perform some compression merely use a compression algorithm suited to the English alphabet.
However, text in languages such as Korean and Chinese is for the most part highly redundant, so when such an algorithm is applied to it the compression efficiency is comparatively low and more memory is needed. The storage space therefore cannot be reduced effectively.
[Patent Document 1] Japanese Laid-Open Patent Publication No. Hei 2-255977 (Japanese Patent Publication No. 1990-255977)
[Patent Document 2] Japanese Laid-Open Patent Publication No. Hei 9-069785 (Japanese Patent Publication No. 1997-069785)
Summary of the invention
The present invention overcomes the above deficiency. Its object is to provide a method for compressing 2-byte character data in which the message processing module of a terminal compresses and stores messages in units of 2-byte characters (Korean characters, Chinese characters), thereby reducing the storage space.
To achieve this object, the method for compressing 2-byte character data of the present invention is characterized by comprising: a step of generating a plurality of compressible code words according to frequency, storing them in a basic dictionary table, and initializing a variable that indicates the next code word to be registered; a step of, with reference to the initialized variable, storing additional compressible code words in a supplementary dictionary table that includes the basic dictionary table, and re-initializing the variable that indicates the next code word to be registered; an input step of identifying whether the input message data is a 2-byte character and receiving it; a step of comparing whether the input data is among the compressible code words and, if it is, searching the dictionary table for the corresponding code through a mapping process and outputting it, and, if the corresponding code is not in the dictionary, registering it in the dictionary; a step of judging whether the end of the data has been reached and, if the data has not been fully input, returning to the input step in which message data is input in sequence; and a step of performing a flush process when the end of the data has been reached, the flush process being the process of filling the remaining bits with 0 when, in a storage scheme that stores data in units of 8 or 16 bits, the compressed data has a variable bit length and the last stored data does not fill 8 or 16 bits; wherein, when the code obtained by encoding a compressible code word is smaller than the threshold below which the code word can be shortened by one bit, it is output in log2(C1+1)-1 bits, and when the code is larger than the threshold, it is output in log2(C1+1) bits, C1 being the number of code words currently assigned.
The advantageous effect of the present invention is that the message processing module of the terminal compresses and stores messages made up of 2-byte characters (Korean characters, Chinese characters, and so on), which reduces the storage space. Specifically, when the method of the present invention is used to compress text that mixes English and Korean characters, the average compression ratio improves by about 22% compared with existing compression methods.
Description of drawings
Fig. 1 is an operational flowchart of the method for compressing 2-byte character data in one embodiment of the present invention.
Fig. 2 is an operational flowchart detailing the step (compression step) in which the corresponding code is searched for in the dictionary table through a mapping process and output, in the method for compressing 2-byte character data of one embodiment of the present invention.
Fig. 3 is an operational flowchart detailing the dictionary generation/management step that manages the dictionary, in the method for compressing 2-byte character data of one embodiment of the present invention.
Embodiment
For convenience of description, the method for compressing 2-byte character data of the present invention is described taking Korean as an example. It is, however, equally applicable to other languages written with 2-byte characters, such as Chinese and Japanese. Accordingly, although this embodiment describes only the compression of Korean, it will be readily apparent to those skilled in the art that the invention is not limited to Korean.
Embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is an operational flowchart of the method for compressing 2-byte character data in one embodiment of the present invention, which is described below.
First, the maximum number of character strings (N7), the number of code words (N2), the initial dictionary entry number (N5), and so on are initialized, the high-frequency characters are collected in the basic dictionary table, and the variable C1 indicating the next code word to be registered is initialized (S101). The code words used for character compression are organized as shown in the following table. To find the code words needed for character compression, the frequencies of occurrence of the 2,350 Wansung-type (precomposed) Korean characters are measured in files that mix Korean characters and English, then sorted and examined, and the 470 most frequently used characters (20% of the 2,350) are registered as code words. These 470 characters together account for 85% or more of all occurrences. The initial value of the variable C1 can therefore be 471.
Table 1:
Code range    Assignment
0~255         ASCII
256~725       Korean character codes (470 characters)
726~1023      10-bit codes
1024~2047     11-bit codes
2048~4095     12-bit codes
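The frequency-based selection described for step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_basic_dictionary`, the parameter names, and the choice of code 256 as the first Korean code (taken from Table 1) are assumptions, and the input is assumed to be EUC-KR encoded bytes, in which the 2,350 precomposed Korean characters occupy first bytes B0 to C8 and second bytes A1 to FE.

```python
from collections import Counter


def build_basic_dictionary(corpus: bytes, top_n: int = 470, first_code: int = 256) -> dict:
    """Map the top_n most frequent 2-byte Korean characters to code words.

    Hypothetical helper: codes 0..255 are left for single bytes, and the
    selected characters receive first_code, first_code + 1, ... in order
    of decreasing frequency.
    """
    counts = Counter()
    i = 0
    while i < len(corpus):
        is_pair = (
            i + 1 < len(corpus)
            and 0xB0 <= corpus[i] <= 0xC8       # first-byte range
            and 0xA1 <= corpus[i + 1] <= 0xFE   # second-byte range
        )
        if is_pair:
            counts[corpus[i:i + 2]] += 1
            i += 2
        else:
            i += 1  # anything else is skipped one byte at a time (simplification)
    ranked = [char for char, _ in counts.most_common(top_n)]
    return {char: first_code + k for k, char in enumerate(ranked)}
```

Calling `build_basic_dictionary(text.encode("euc-kr"))` on a representative Korean and English corpus would yield the 470-entry table that occupies codes 256 to 725 in Table 1.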
Next, with reference to the initialized variable, the additional compressible code words are stored in a supplementary dictionary table that includes the basic dictionary table, and the variable C1 indicating the next code word to be registered is re-initialized (S102). The number of bits of the code that encodes a compressible code word is determined by the following formulas.
Formula 1: $(C1 + lim) \le 2^{\lceil \log_2 (C1+1) \rceil} - 1$
Formula 2: $lim = C3 - C1 - 1$
Formula 3: $C3 = 2^{\lceil \log_2 (C1+1) \rceil}$
Here C1 is the number of code words currently assigned, and lim is the threshold up to which a code word can be written with one bit fewer. The logarithm is rounded up to a whole number of bits, as the example with C1 = 750 shows. Thus, when a code word is converted to a bit string, it is output in $\lceil \log_2 (C1+1) \rceil - 1$ bits if it does not exceed the threshold (lim), and in $\lceil \log_2 (C1+1) \rceil$ bits if it is larger than the threshold.
For example, when C1 is 750, lim = (1024 - 750 - 1) = 273. A code word between 0 and 273 is therefore output as a 9-bit code at compression time, and a code word between 274 and 749 has 274 added to it and is output as a 10-bit code.
When decompressing, the code word is first read as 9 bits. If the value read is smaller than 274, it is taken as the code word. Otherwise (274 or greater), the code word is read again as 10 bits, and that value minus 274 is taken as the code word. Table 2 below shows the dictionary table structure of the present invention in this form.
Table 2:
Compressible code word    Encoded code (binary)    Decimal value
0                         000000000                0
1                         000000001                1
2                         000000010                2
.                         .                        .
.                         .                        .
273                       100010001                273
274                       1000100100               548 (274+274)
275                       1000100101               549 (274+275)
.                         .                        .
.                         .                        .
749                       1111111111               1023 (274+749)
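The variable-length scheme of Formulas 1 to 3 and Table 2 can be sketched as below. The function names and the use of bit strings are illustrative assumptions, and the sketch follows the worked example, in which code words up to and including lim take the shorter form. It relies on the fact that, for C1 >= 1, `c1.bit_length()` equals log2(C1+1) rounded up.

```python
def code_params(c1: int):
    """Return (bits, lim) for c1 assigned code words, following Formulas 2 and 3."""
    if c1 < 2:
        raise ValueError("sketch assumes at least two assigned code words")
    bits = c1.bit_length()   # log2(c1 + 1) rounded up
    c3 = 1 << bits           # Formula 3
    lim = c3 - c1 - 1        # Formula 2
    return bits, lim


def encode_code(code: int, c1: int) -> str:
    """Encode one code word as a bit string of bits - 1 or bits characters."""
    bits, lim = code_params(c1)
    if not 0 <= code < c1:
        raise ValueError("code word out of range")
    if code <= lim:
        return format(code, f"0{bits - 1}b")    # short form
    return format(code + lim + 1, f"0{bits}b")  # long form, offset by lim + 1


def decode_code(bitstring: str, pos: int, c1: int):
    """Decode one code word starting at pos; return (code, next position)."""
    bits, lim = code_params(c1)
    value = int(bitstring[pos:pos + bits - 1], 2)
    if value <= lim:
        return value, pos + bits - 1
    value = int(bitstring[pos:pos + bits], 2)   # re-read with one more bit
    return value - (lim + 1), pos + bits
```

With `c1 = 750` the offset `lim + 1` is 274, so `encode_code(274, 750)` reproduces the 10-bit row for code word 274 in Table 2. Because every 10-bit code starts with a 9-bit prefix of 274 or more, the decoder can tell the two lengths apart without a separate length marker.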
Thereafter, the message data is input in sequence. The input data is compared against the compressible code words and, if it is among them, the corresponding code is searched for in the dictionary table through a mapping process and output (S103). It is then confirmed whether this code is present in the dictionary, and if it is not, the dictionary generation step that registers it in the dictionary is performed (S104).
Afterwards, it is judged whether the end of the data has been reached, and if not, the process returns to the step in which message data is input in sequence (S105).
If the end of the data has been reached, the flush process is performed (S106). The flush process is needed because the storage scheme stores data in units of 8 or 16 bits, whereas the compressed data has a variable bit length: when the last stored data does not fill 8 or 16 bits, the remaining bits are filled with 0.
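The flush process (S106) amounts to zero-padding the final storage unit. A minimal sketch, assuming the compressed output is held as a string of '0' and '1' characters and that `unit` is the storage width of 8 or 16 bits; the function name `flush_bits` is hypothetical:

```python
def flush_bits(bitstring: str, unit: int = 8) -> bytes:
    """Pad the bit string with 0s to a multiple of unit bits and pack it into bytes."""
    if unit not in (8, 16):
        raise ValueError("unit must be 8 or 16")
    padding = (-len(bitstring)) % unit          # bits missing from the last unit
    padded = bitstring + "0" * padding
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

For instance, a 9-bit code followed by a 10-bit code gives 19 bits, which are padded with five 0 bits to fill three bytes when `unit` is 8, or with thirteen 0 bits to fill two 16-bit units.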
Fig. 2 is an operational flowchart detailing the step (compression step) in which the corresponding code is searched for in the dictionary table through a mapping process and output, in the method for compressing 2-byte character data of one embodiment of the present invention. It is described below.
First, the first byte of the input data is read (S201).
Then it is judged whether this first byte lies within the first specified range (S202). For Wansung-type (precomposed) Korean characters, the first byte takes one of 25 values from hexadecimal B0 to C8, so the first specified range can be hexadecimal B0 to C8.
If the first byte lies within the first specified range, the second byte of the input data is read (S203).
If, on the other hand, the first byte does not lie within the first specified range, the data is not a Wansung-type Korean character and is therefore determined to be an ASCII character (S207).
Then it is judged whether the second byte lies within the second specified range (S204). For Wansung-type Korean characters, the second byte takes one of 94 values from hexadecimal A1 to FE, so the second specified range can be hexadecimal A1 to FE.
If the second byte lies within the second specified range, it is judged whether the input data is included in the dictionary table (S205).
If, on the other hand, the second byte does not lie within the second specified range, the data is not a Wansung-type Korean character and is therefore determined to be an ASCII character (S207).
If the input data is included in the dictionary table, the corresponding code value is determined (S206).
If, on the other hand, the input data is not included in the dictionary table, it is not a high-frequency Korean character and is therefore determined to be an ASCII character (S207).
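The decision flow of Fig. 2 can be sketched as a single scan over the input bytes. The byte ranges are those given above, which match the precomposed Korean block of KS X 1001 as encoded in EUC-KR. The function name `map_to_codes` is hypothetical, and one point is an assumption rather than something the text states: a byte that reaches S207 is emitted as its own value (a code from 0 to 255) and scanning resumes at the following byte.

```python
def map_to_codes(data: bytes, dictionary: dict) -> list:
    """Map input bytes to code words following steps S201 to S207.

    dictionary maps 2-byte sequences (bytes of length 2) to code words.
    """
    codes = []
    i = 0
    while i < len(data):
        first = data[i]                                      # S201
        if 0xB0 <= first <= 0xC8 and i + 1 < len(data):      # S202
            second = data[i + 1]                             # S203
            pair = data[i:i + 2]
            if 0xA1 <= second <= 0xFE and pair in dictionary:  # S204, S205
                codes.append(dictionary[pair])               # S206
                i += 2
                continue
        codes.append(first)                                  # S207 (assumed: one byte, one code)
        i += 1
    return codes
```

A 2-byte character that is valid but absent from the dictionary therefore costs two single-byte codes, which is why the choice of the high-frequency characters in step S101 dominates the achievable compression.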
Fig. 3 is an operational flowchart detailing the dictionary management step in the method for compressing 2-byte character data of one embodiment of the present invention. This step checks whether the corresponding code is present in the dictionary, registers it if it is not, and removes codes registered in the dictionary that are not often used. It is described below.
First, it is judged whether the string length of the code word exceeds the maximum number of character strings (N7); if it does, the dictionary management step is terminated (S301).
If the string length of the code word does not exceed the maximum number of character strings (N7), it is judged whether the string is present in the dictionary table; if it is, the dictionary management step is terminated (S302).
If it is not present in the dictionary table, the string is assigned to the variable C1 (S303).
Then the variable C1 is incremented so that it can be assigned to the code word of the next string to be generated (S304).
Next, it is judged whether the incremented variable C1 is greater than the number of code words (N2) (S305).
If the incremented variable C1 is greater than the number of code words (N2), the initial dictionary entry number (N5) is assigned to C1; if it is less than the number of code words (N2), the initial dictionary entry number (N5) is not assigned (S306).
Next, it is judged whether the node assigned to the incremented variable C1 is a leaf node, that is, a node representing the last character of a string, or an unused node (C1 == NULL). If the node is neither a leaf node nor an unused node, the process returns to the step in which C1 is incremented for the code word of the next string to be generated (S307).
If the node assigned to the incremented variable C1 is a leaf node or an unused node, the entry for C1 is removed from the dictionary entries in preparation for assigning the code word of a new string (S308).
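The slot-recycling behaviour of steps S301 to S308 can be modelled with two lookup tables. This is a simplified sketch under several assumptions: the class and attribute names are hypothetical, N2 is read as the largest code value in use, N5 as the first code available for new strings, and a "leaf" is taken to mean a registered string that no other registered string extends.

```python
class RecyclingDictionary:
    """Toy model of the Fig. 3 dictionary management step."""

    def __init__(self, n2: int, n5: int, n7: int):
        self.n2 = n2          # N2: largest code value (assumed meaning)
        self.n5 = n5          # N5: first code for newly registered strings
        self.n7 = n7          # N7: maximum string length
        self.c1 = n5          # C1: next code word to assign
        self.by_string = {}   # string -> code
        self.by_code = {}     # code -> string

    def _is_leaf(self, string: str) -> bool:
        # A leaf is an entry that no other registered string extends.
        return not any(s != string and s.startswith(string) for s in self.by_string)

    def register(self, string: str):
        if len(string) > self.n7:          # S301: too long, stop
            return None
        if string in self.by_string:       # S302: already present, stop
            return None
        code = self.c1                     # S303: assign the string to C1
        self.by_string[string] = code
        self.by_code[code] = string
        while True:
            self.c1 += 1                   # S304: advance C1
            if self.c1 > self.n2:          # S305: past the last code word?
                self.c1 = self.n5          # S306: wrap to the initial entry
            occupant = self.by_code.get(self.c1)
            if occupant is None:           # S307: unused node, slot is ready
                break
            if self._is_leaf(occupant):    # S307: leaf node can be recycled
                del self.by_code[self.c1]  # S308: remove it from the dictionary
                del self.by_string[occupant]
                break
        return code
```

Skipping non-leaf entries keeps every registered string's prefix in the dictionary, so removing an entry never orphans a longer string that was built on it.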
The present invention is not limited to the scope disclosed in the above embodiment. Various improvements and changes can be made within the technical subject matter of the present invention, and such improvements and changes also fall within its technical scope and are protected by the present invention.

Claims (8)

1. A method for compressing 2-byte character data, characterized by comprising:
a step of generating a plurality of compressible code words according to frequency, storing them in a basic dictionary table, and initializing a variable that indicates the next code word to be registered;
a step of, with reference to the initialized variable, storing additional compressible code words in a supplementary dictionary table that includes said basic dictionary table, and re-initializing the variable that indicates the next code word to be registered;
an input step of identifying whether the input message data is a 2-byte character and receiving it;
a step of comparing whether the input data is among said compressible code words and, if it is, searching said dictionary table for the corresponding code through a mapping process and outputting it, and, if said corresponding code is not in the dictionary, registering it in the dictionary;
a step of judging whether the end of the data has been reached and, if the data has not been fully input, returning to the input step in which message data is input in sequence; and
a step of performing a flush process when the end of the data has been reached, said flush process being the process of filling the remaining bits with 0 when, in a storage scheme that stores data in units of 8 or 16 bits, the compressed data has a variable bit length and the last stored data does not fill 8 or 16 bits;
wherein, when the code obtained by encoding said compressible code word is smaller than the threshold below which said compressible code word can be shortened by one bit, it is output in log2(C1+1)-1 bits, and when the code is larger than the threshold, it is output in log2(C1+1) bits, said C1 being the number of code words currently assigned.
2. The method for compressing 2-byte character data according to claim 1, characterized in that:
to find said compressible code words, the frequencies of occurrence of said 2-byte characters of the Wansung (precomposed) type are measured in files that mix 2-byte characters and 1-byte characters, then sorted and analyzed, and the characters that are often used are registered as code words.
3. The method for compressing 2-byte character data according to claim 1, characterized in that:
frequencies are measured for characters represented by a combination of 2 or more bytes, and only the characters that are often used are registered in the dictionary as basic code words.
4. The method for compressing 2-byte character data according to claim 2, characterized in that:
said 2-byte characters are Chinese characters, and said 1-byte characters are English characters.
5. The method for compressing 2-byte character data according to claim 2, characterized in that:
said 2-byte characters are Korean characters, and said 1-byte characters are English characters.
6. The method for compressing 2-byte character data according to claim 1, characterized in that:
the step of searching said dictionary table for the corresponding code through a mapping process and outputting it comprises:
a step of reading the first byte of the input data;
a step of judging whether said first byte lies within a first specified range;
a step of reading the second byte of the input data when said first byte lies within the first specified range;
a step of determining that the data is an ASCII character when said first byte does not lie within said first specified range, because it is not a Wansung-type Korean character;
a step of judging whether said second byte lies within a second specified range;
a step of judging whether the input data is included in said dictionary table when said second byte lies within said second specified range;
a step of determining that the data is an ASCII character when said second byte does not lie within said second specified range, because it is not a Wansung-type Korean character;
a step of determining the corresponding code value when the input data is included in said dictionary table; and
a step of determining that the data is an ASCII character when the input data is not included in said dictionary table, because it is not a high-frequency Korean character.
7. The method for compressing 2-byte character data according to claim 4, characterized in that:
said first specified range is from hexadecimal B0 to C8.
8. The method for compressing 2-byte character data according to claim 4, characterized in that:
said second specified range is from hexadecimal A1 to FE.
CNB2003101242211A 2003-04-08 2003-12-31 Compression method of two-byte character data Expired - Fee Related CN100474781C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR10-2003-0021924A KR100494876B1 (en) 2003-04-08 2003-04-08 Data compression method for multi-byte character language
KR1020030021924 2003-04-08
KR10-2003-0021924 2003-04-08

Publications (2)

Publication Number Publication Date
CN1536768A CN1536768A (en) 2004-10-13
CN100474781C true CN100474781C (en) 2009-04-01

Family

ID=34374057

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101242211A Expired - Fee Related CN100474781C (en) 2003-04-08 2003-12-31 Compression method of two-byte character data

Country Status (2)

Country Link
KR (1) KR100494876B1 (en)
CN (1) CN100474781C (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100755533B1 (en) * 2005-07-25 2007-09-06 주식회사 팬택 Method and apparatus of generating character set
KR101386169B1 (en) * 2007-08-09 2014-04-17 삼성전자주식회사 Apparatus and method for compression and restoration SMS
CN101751451B (en) * 2008-12-11 2012-04-25 高德软件有限公司 Chinese data compression method and Chinese data decompression method and related devices
WO2013074658A1 (en) * 2011-11-15 2013-05-23 Citrix Systems, Inc. Systems and methods for compressing short text by dictionaries in a network
US9300322B2 (en) * 2014-06-20 2016-03-29 Oracle International Corporation Encoding of plain ASCII data streams
CN106354699B (en) * 2015-07-13 2021-05-18 富士通株式会社 Encoding method, encoding device, decoding method, and decoding device
CN112416315B (en) * 2020-06-16 2024-05-14 上海哔哩哔哩科技有限公司 Compression method of CSS code, electronic device and storage medium
KR102633001B1 (en) * 2023-03-27 2024-02-05 주식회사 무브먼츠 Method for implementing underground facilities as ar in an offline environment using combined data precessing of qr code and nfc

Also Published As

Publication number Publication date
KR20040087503A (en) 2004-10-14
KR100494876B1 (en) 2005-06-14
CN1536768A (en) 2004-10-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1070189

Country of ref document: HK

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1070189

Country of ref document: HK

C41 Transfer of patent application or patent right or utility model
C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Seoul, South Korea

Patentee after: Pantech property management Co.

Address before: Seoul, South Korea

Patentee before: PANTECH Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20161026

Address after: Seoul, South Korea

Patentee after: PANTECH CO.,LTD.

Address before: Seoul, South Korea

Patentee before: Pantech property management Co.

TR01 Transfer of patent right

Effective date of registration: 20200609

Address after: Seoul, South Korea

Patentee after: Pan Thai Co.,Ltd.

Address before: Seoul, South Korea

Patentee before: Pantech Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090401