WO2011002499A1 - Dynamic pattern elimination based compression method for text-based signaling protocols - Google Patents
- Publication number
- WO2011002499A1 (application PCT/US2010/001845)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pattern
- patterns
- dynamic pattern
- found
- text message
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
This disclosure describes a dynamic pattern elimination compression method that eliminates redundant patterns whose content is not known a priori. Candidate dynamic patterns are first identified and marked. The entire message is then searched for markers to detect duplicate occurrences: when a marker is found, the method checks whether the pattern has occurred before; if it has not, a unique variable is assigned to the pattern; if it has, the pattern is replaced with the variable previously assigned to it. If a pattern is found only once, the variable assigned to it is removed.
Description
TITLE
DYNAMIC PATTERN ELIMINATION BASED COMPRESSION METHOD FOR TEXT-BASED SIGNALING PROTOCOLS
INVENTORS
SHIH-CHUN CHANG
SIDDHARDHA GARIGE
SREEKANT NATO
HAI VU
FIELD OF THE INVENTION
This invention addresses the need to transport high bit-rate text to multiple users over wired and wireless means. Specifically, this disclosure describes a dynamic pattern elimination compression method to eliminate redundant patterns, the content of which is not known a priori.
BACKGROUND OF THE INVENTION
Any text-based protocol has predefined keywords with special purposes that the communicating parties agree on. A trivial way to reduce the size of messages is to replace those long, predefined keywords with shorter forms. However, there may still be text patterns that are repeated or redundant in a message.
The existing technologies of text-based compression can be categorized into two groups. One is dictionary-based, and the other uses a standard compression algorithm such as Huffman codes. Dictionary-based techniques usually use static dictionaries that are created before transmission of a message and/or dynamic dictionaries that are included in the message. Those techniques include US Serial No. 6,976,081, US Serial No. 5,999,949, US Serial No. 7,412,541, US Serial No. 6,807,173, and US Serial No. 6,883,035. Replacing longer words with a shorter form is a simple example of using a static dictionary at both the compressor and the decompressor. This disclosure proposes a method, Dynamic Pattern Elimination, to eliminate redundant patterns the content of which is not known a priori. The proposed method identifies the redundant patterns on the fly and does not require any dictionary.
BRIEF SUMMARY OF THE INVENTION
This invention addresses the need to transport high bit-rate text to multiple users over wired and wireless means. Specifically, this disclosure describes a dynamic pattern elimination compression method to eliminate redundant patterns, the content of which is not known a priori.
For a fuller understanding of the nature and objects of the invention, reference should be made to the following detailed description taken in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
For a fuller understanding of the nature and objects of the invention, reference should be made to the accompanying drawings, in which:
FIGURE 1 is an example of a partial SIP message;
FIGURE 2 is an example of a partial SIP message with markers;
FIGURE 3 is an example of a compressed SIP message; and
FIGURE 4 is a table describing the mapping between variables and patterns.
DETAILED DESCRIPTION OF THE INVENTION
This disclosure describes a method to achieve a higher compression ratio than by just replacing known longer patterns with shorter forms. The preferred embodiment is specifically designed for a wireless environment as a wireless link is prone to errors. With a smaller message size, one has a higher probability of successful transmission as well as reduced latency over the wireless link.
The basic idea is to identify duplicate patterns that cannot be known beforehand. However, those patterns and their locations may be predicted. Therefore, one uses a regular expression to identify the candidate patterns in the first stage, and removes duplicate patterns in the next stage. In this disclosure the SIP signaling protocol is used as the preferred embodiment to illustrate the compression method.
In order to remove duplicate dynamic patterns, one first needs to identify them. This is done by inserting a marker before a candidate pattern so that it can be analyzed later. Note that the representation of markers is chosen such that they would not appear in normal SIP messages. Examples and the notations shown in this document are for preferred embodiment purposes only, and other notations can be easily substituted by those skilled in the art. After analyzing characteristics of SIP messages, the inventors of this application found that the IP address and user name patterns have a higher probability of being repeated at several points within a message. For example, below are regular expressions to identify and insert markers for IP address and user name:
IP address - s/([: ;\"@])([0-9\.]+)([: ;\">]|\r)/\1^\2~\3/g
User name - s/([:\"])([a-zA-Z0-9\.]+)([\"@])/\1^\2~\3/g
Note that additional identifications of dynamic patterns could be added later as discussed below. Figures 1 and 2 show an example of a partial SIP message before and after markers are inserted.
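The marker-insertion stage can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the markers are `^` before and `~` after each candidate pattern, and the function name `insert_markers` and the sample messages are hypothetical.

```python
import re

# Candidate-pattern expressions, transcribed from the sed-style rules above.
# Group 1 and group 3 are the delimiters around the candidate (group 2).
IP_ADDRESS = re.compile(r'([: ;"@])([0-9.]+)([: ;">]|\r)')
USER_NAME = re.compile(r'([:"])([a-zA-Z0-9.]+)(["@])')

# Assumed marker notation: '^' opens a candidate pattern, '~' closes it.
MARKED = r"\1^\2~\3"


def insert_markers(message: str) -> str:
    """Wrap every candidate IP address and user name in markers."""
    message = IP_ADDRESS.sub(MARKED, message)
    return USER_NAME.sub(MARKED, message)


if __name__ == "__main__":
    print(repr(insert_markers('From: <sip:alice@10.0.0.1>\r\n')))
```

Because the marking stage only wraps text, further expressions can be appended to the function without changing the later stages.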
After identifying the candidate dynamic patterns, one checks to see if there are any duplicate occurrences within the entire message using the following steps.
1. Search for markers
2. If a marker is found, check if the pattern occurred before.
3. If not, assign a unique variable to the pattern, otherwise, replace the pattern with the variable that was assigned for this pattern.
4. If a pattern is found only once, remove the variable assigned to it.
An example of a compressed message is shown in Figure 3 and a mapping table between variables and patterns is shown in Figure 4.
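The four steps can be sketched in Python under some loudly stated assumptions. The text does not spell out how variables are written in the compressed message, so the encoding here is hypothetical: a repeated pattern keeps its markers at its first occurrence (`^pattern~`), which implicitly defines variable 0, 1, 2, ... in order of appearance, and each later occurrence becomes `~N~`. The sketch also counts occurrences up front instead of assigning a variable and removing it afterwards, which gives the same result as steps 3 and 4.

```python
import re
from collections import Counter

# A marked candidate: '^' + pattern + '~' (assumed marker notation).
MARKED = re.compile(r"\^([^~^]*)~")


def compress(marked_message: str) -> str:
    """Replace repeated marked patterns with variable references."""
    counts = Counter(MARKED.findall(marked_message))
    variables = {}  # pattern -> variable index (hypothetical encoding)

    def replace(match):
        pattern = match.group(1)
        if counts[pattern] == 1:
            # Step 4: a pattern seen only once gets no variable; drop markers.
            return pattern
        if pattern not in variables:
            # Step 3, first occurrence: assign the next variable and keep
            # the marked pattern so the decompressor can rebuild the table.
            variables[pattern] = len(variables)
            return match.group(0)
        # Step 3, repeat occurrence: emit the variable instead of the pattern.
        return f"~{variables[pattern]}~"

    return MARKED.sub(replace, marked_message)


if __name__ == "__main__":
    sample = ("From: ^alice~@^10.0.0.1~\r\n"
              "To: ^bob~@^10.0.0.1~\r\n"
              "Contact: ^alice~@^10.0.0.1~\r\n")
    print(repr(compress(sample)))
```

In the sample, `alice` and `10.0.0.1` become variables 0 and 1, while `bob` occurs once and is restored to plain text.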
At the decompressor, one only needs to find the markers and restore each pattern corresponding to a marker. A special marker, ^ in the example above, is used to indicate the beginning of a pattern and the corresponding variable. By doing so, the decompressor is able to reconstruct the mapping between variables and patterns. If the decompressor finds the variable in the message, it can replace it with the pattern it found. As the purpose of a marker is to identify possible duplicate patterns, identification of further dynamic patterns can be added later without breaking compatibility, because the additional markers are inserted by the compressor, and the decompressor can still decompress a message with additional markers.
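A decompressor along these lines can be sketched in Python. The wire format is an assumption made for illustration, since the text does not define it: `^pattern~` marks the first occurrence of a repeated pattern and implicitly defines the next variable (0, 1, 2, ...), and `~N~` refers back to variable N. The name `decompress` is hypothetical.

```python
import re

# Either a definition '^pattern~' or a reference '~N~' (assumed encoding).
TOKEN = re.compile(r"\^([^~^]*)~|~(\d+)~")


def decompress(compressed: str) -> str:
    """Rebuild the variable table from markers and expand references."""
    table = []  # index = variable number, value = pattern

    def restore(match):
        if match.group(1) is not None:
            # Definition: remember the pattern and emit it without markers.
            table.append(match.group(1))
            return match.group(1)
        # Reference: look up the pattern defined earlier in the message.
        return table[int(match.group(2))]

    return TOKEN.sub(restore, compressed)


if __name__ == "__main__":
    wire = "From: ^alice~@^10.0.0.1~\r\nTo: bob@~1~\r\nContact: ~0~@~1~\r\n"
    print(repr(decompress(wire)))
```

The decompressor never needs to know which regular expressions produced the markers, which is why a compressor that marks additional pattern types stays compatible with it.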
This application discloses a general approach to eliminate duplicate patterns in text-based protocols. Regular expressions are used to identify candidate patterns to be removed. Then one examines the message for special markers and variables to compress and decompress the message. The advantages of this method include:
a) Detection of the duplicate patterns on the fly without knowing the actual patterns.
b) Forward compatibility. Additional regular expressions can be added to identify more patterns while remaining compatible with prior versions of the implementation.
c) It's a generic solution for text-based protocols.
Since certain changes may be made in the above-described dynamic compression method for text-based signaling protocols without departing from the scope of the invention herein involved, it is intended that all matter contained in the description thereof, or shown in the accompanying figures, shall be interpreted as illustrative and not in a limiting sense.
Claims
1. A method for compressing and decompressing a text message with regularly occurring dynamic patterns where such dynamic patterns are not known in advance comprising:
compressing a text message by first inserting markers that are represented by characters not normally used in the text message where any candidate dynamic patterns are found;
then searching the text message for markers and when a marker is found determine if the candidate dynamic pattern has occurred before;
then if said candidate dynamic pattern has not occurred before assign said candidate dynamic pattern with a unique variable not normally used in the text message;
then if said candidate dynamic pattern has occurred before replace said candidate dynamic pattern with said unique variable that was assigned to said candidate dynamic pattern before;
then if said candidate dynamic pattern is found only once remove said unique variable assigned to it; and,
decompressing said text message by searching said text message and replacing each said unique variable found with the corresponding candidate dynamic pattern said unique variable replaced.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2766910A CA2766910A1 (en) | 2009-07-01 | 2010-06-28 | Dynamic pattern elimination based compression method for text-based signaling protocols |
EP10794485.2A EP2449706A4 (en) | 2009-07-01 | 2010-06-28 | Dynamic pattern elimination based compression method for text-based signaling protocols |
MX2012000005A MX2012000005A (en) | 2009-07-01 | 2010-06-28 | Dynamic pattern elimination based compression method for tex-based signaling protocols. |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US26995109P | 2009-07-01 | 2009-07-01 | |
US61/269,951 | 2009-07-01 | ||
US12/803,380 US8090394B2 (en) | 2009-07-01 | 2010-06-25 | Dynamic pattern elimination based compression method for text-based signaling protocols |
US12/803,380 | 2010-06-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2011002499A1 (en) | 2011-01-06 |
WO2011002499A8 (en) | 2012-01-19 |
Family
ID=43411341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2010/001845 WO2011002499A1 (en) | 2009-07-01 | 2010-06-28 | Dynamic pattern elimination based compression method for tex-based signaling protocols |
Country Status (5)
Country | Link |
---|---|
US (1) | US8090394B2 (en) |
EP (1) | EP2449706A4 (en) |
CA (1) | CA2766910A1 (en) |
MX (1) | MX2012000005A (en) |
WO (1) | WO2011002499A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2559209A4 (en) * | 2010-04-12 | 2014-07-09 | Flight Focus Pte Ltd | Use of a meta language for processing of aviation related messages |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030091189A1 (en) * | 1993-11-18 | 2003-05-15 | Rhoads Geoffrey B. | Arrangement for embedding subliminal data in imaging |
US20070028088A1 (en) * | 2005-08-01 | 2007-02-01 | Coskun Bayrak | Polymorphic encryption method and system |
US20070136492A1 (en) * | 2005-12-08 | 2007-06-14 | Good Technology, Inc. | Method and system for compressing/decompressing data for communication with wireless devices |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8085171B2 (en) * | 2006-03-24 | 2011-12-27 | University Of Mississippi | High-speed data compression based on set associative cache mapping techniques |
US8326605B2 (en) * | 2008-04-24 | 2012-12-04 | International Business Machines Incorporation | Dictionary for textual data compression and decompression |
-
2010
- 2010-06-25 US US12/803,380 patent/US8090394B2/en not_active Expired - Fee Related
- 2010-06-28 EP EP10794485.2A patent/EP2449706A4/en not_active Withdrawn
- 2010-06-28 MX MX2012000005A patent/MX2012000005A/en active IP Right Grant
- 2010-06-28 CA CA2766910A patent/CA2766910A1/en not_active Abandoned
- 2010-06-28 WO PCT/US2010/001845 patent/WO2011002499A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See also references of EP2449706A4 * |
Also Published As
Publication number | Publication date |
---|---|
US20110003604A1 (en) | 2011-01-06 |
EP2449706A1 (en) | 2012-05-09 |
EP2449706A4 (en) | 2013-09-04 |
MX2012000005A (en) | 2012-04-11 |
WO2011002499A8 (en) | 2012-01-19 |
CA2766910A1 (en) | 2011-01-06 |
US8090394B2 (en) | 2012-01-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9680500B2 (en) | Staged data compression, including block level long range compression, for data streams in a communications system | |
US8954392B2 (en) | Efficient de-duping using deep packet inspection | |
US20110145313A1 (en) | Method and system for data transport compression based on an encoding dictionary patch | |
US20130307709A1 (en) | Efficient techniques for aligned fixed-length compression | |
US7026962B1 (en) | Text compression method and apparatus | |
AU2016200550A1 (en) | Encoding program, decompression program, compression method, decompression method, compression device and decompression device | |
US20130262486A1 (en) | Encoding and Decoding of Small Amounts of Text | |
US10015285B2 (en) | System and method for multi-stream compression and decompression | |
US9509333B2 (en) | Compression device, compression method, decompression device, decompression method, information processing system, and recording medium | |
US7375660B1 (en) | Huffman decoding method | |
US10897270B2 (en) | Dynamic dictionary-based data symbol encoding | |
US8417730B2 (en) | Block compression algorithm | |
US8909813B2 (en) | Efficient processing of compressed communication traffic | |
US20050281469A1 (en) | Efficient method and system for reducing update requirements for a compressed binary image | |
WO2004012338A3 (en) | Lossless data compression | |
US8090394B2 (en) | Dynamic pattern elimination based compression method for text-based signaling protocols | |
JP2007537642A (en) | Method and apparatus for compression and decompression of structured block unit of XML data | |
Platoš et al. | Compression of small text files | |
KR100494876B1 (en) | Data compression method for multi-byte character language | |
EP2779467B1 (en) | Staged data compression, including block-level long-range compression, for data streams in a communications system | |
JP2013187904A (en) | Apparatus and method for decoding | |
CN102567294A (en) | Text data processing method and text data processing device | |
KR101791877B1 (en) | Method and apparatus for compressing utf-8 code character | |
KR101028904B1 (en) | Apparatus and method for processing data | |
US7870160B2 (en) | Block compression algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10794485; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 2766910; Country of ref document: CA |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: MX/A/2012/000005; Country of ref document: MX |
| WWE | Wipo information: entry into national phase | Ref document number: 2010794485; Country of ref document: EP |