WO2014114506A1 - Method for compressing source data using symmetries and device for carrying out the method
- Publication number: WO2014114506A1 (application PCT/EP2014/050381)
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, lexicon, compression, source data, compressed
Classifications
- H: ELECTRICITY
- H03: ELECTRONIC CIRCUITRY
- H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3086: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
- H03M7/3088: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
Definitions
- The invention relates to a method for compressing source data into compressed data using symmetries, and to a device that performs such a method.
- Compression techniques can compress source data losslessly, so that it can be recovered completely from the compressed data, or lossily, which allows much higher compression ratios.
- Lossless compression replaces redundant strings in the source data with shorter strings.
- The data to be compressed is compared against already known characters stored in a lexicon.
- The lexicon can form one part of the data window, the search buffer; the look-ahead buffer forms the second part.
- The lexicon can consist of a single string or of multiple entries in a table or array. It is often not static but is generated from the source data itself. A lexicon thus contains one or more entries against which the data in the current data window is checked for matches.
- Examples of such methods are Lempel-Ziv methods such as LZ77, LZ78 or LZW84. Since the lexicon required for decompression can be recovered from the compressed data itself, neither a separate transfer of the lexicon nor a pre-stored decoding table is required; at most, as in LZW84, the character set itself must be initialized.
- A disadvantage of the known methods and devices is that the compression ratio is not optimal for a given dictionary size. Compressed data with a higher compression ratio is advantageous because it requires less storage space and thus reduces costs.
- The object of the invention is therefore to provide a method that achieves better compression of source data.
- A further object of the invention is to provide a device that can perform such a method.
- The object is achieved by a method for compressing source data into compressed data using symmetries in the source data. A lexicon is generated from the source data or from data sequences of the source data. Compression is performed by replacing the source data or data sequences with data references to the lexicon, and the data references carry indicators that specify a matching rule for the elements of the lexicon.
- The data references that replace the source data or data sequences are stored as (n+1)-tuples, whereas known methods store n-tuples.
- The object is further achieved by a device that carries out such a method.
- The device may, for example, be a computing unit such as a computer that executes the method in software, or a device that implements the method in hardware.
- The method according to the invention achieves a higher compression ratio because the elements of the lexicon can be referenced in different ways.
- The type of referencing is given by the value of an indicator, which determines how the data is matched against the lexicon.
- The indicator may assume exactly two states. As a single bit, it takes up little additional storage space and doubles the number of referencing options. For example, lexicons consisting of one or more character strings can be searched for matches in either the forward or the backward direction.
- The indicator can specify the direction in which part of the source data is compared with a lexicon entry; it thus corresponds to a reading direction (forward or backward). Depending on the symmetry in the data, however, it may also be advantageous for the indicator merely to specify that the comparison follows a predetermined pattern, or to specify a reading direction that starts from the middle of the lexicon entry.
- The indicator may also take more than two different values.
- It can then specify reading directions in several dimensions.
- Source data with directional symmetries, which occur in many signals, audio data and images, is well suited for compression with the method according to the invention.
- The method is also applicable to coding DNA sequences, which often contain palindromic sequences in double-stranded form. Their internal symmetry is formed by the single strands, whose base sequences are horizontally mirror-inverted.
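The multi-valued indicator mentioned above, which specifies reading directions in several dimensions, can be illustrated with a two-dimensional lexicon entry such as an image block. The Python sketch below assumes a hypothetical 2-bit indicator. Bit 0 reads each row right-to-left, and bit 1 reads the rows bottom-to-top. The names `read_block` and `find_indicator` and this bit assignment are assumptions for illustration and are not taken from the patent.

```python
from typing import List, Optional

Block = List[List[int]]


def read_block(entry: Block, indicator: int) -> Block:
    """Read a 2-D lexicon entry in the direction chosen by a 2-bit indicator.

    Hypothetical encoding: bit 0 reverses each row (right-to-left),
    bit 1 reverses the row order (bottom-to-top).
    """
    rows = entry[::-1] if indicator & 2 else entry
    return [row[::-1] if indicator & 1 else list(row) for row in rows]


def find_indicator(entry: Block, target: Block) -> Optional[int]:
    """Return the first indicator value under which `entry` reproduces `target`."""
    for indicator in range(4):
        if read_block(entry, indicator) == target:
            return indicator
    return None


entry = [[1, 2], [3, 4]]
print(find_indicator(entry, [[4, 3], [2, 1]]))  # 3: both directions reversed
```

Under this assumption, a block that only matches a stored entry after mirroring can be referenced through the existing entry plus two indicator bits, rather than through a new lexicon entry.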
- The invention also relates to a method for decompressing the compressed data generated by the method according to the invention, and to devices that carry out such a method.
- The decompression method can reconstruct the lexicon from the transmitted data and substitute the lexicon entries back for the data references, so that the original source data is recovered without loss.
- FIGS. 1-3 show a first method according to the invention, which generates compressed data using a modified LZ77 method.
- FIGS. 4-5 show a second method according to the invention, which generates compressed data using a modified LZ78 method.
- FIG. 6 shows a third method according to the invention, which generates compressed data using a modified LZW84 method.
- The source data passes through a data window which, according to FIG. 1, consists of a search buffer and a look-ahead buffer.
- The search buffer corresponds to the lexicon, which therefore consists of exactly one word.
- The look-ahead buffer holds the source data still to be compressed, starting at the current encoding position.
- Compression is achieved by outputting or storing the position and length of the character subsequence in the search buffer that exactly matches the character string currently being processed in the look-ahead buffer.
- The first method is a bidirectional variant of the LZ77 method. It modifies the matching criterion on which compression depends. The longest match of subsequences within the given data window is searched for in either the forward or the backward direction. Accordingly, a direction indicator is stored or transmitted in the compressed data stream for each compressed subsequence.
- The search in the search buffer is thus bidirectional.
- The first mismatched character ($F_s$) is also included in the compressed sequence.
- The following are stored or transmitted as the compressed sequence: the position ($P$), the length of the maximum match ($L$) in either direction, the indicator bit ($F$) for the direction and the first mismatched character ($F_s$). A quadruple is therefore used as a reference instead of a triple.
- Each compressed subsequence thus occupies $L_c = \log_2(n - L_A) + \log_2(L_A) + 1 + N_s$ bits. Here $n$ is the length of the data window, $L_A$ is the length of the look-ahead buffer, 1 is the length of the indicator in bits, and $N_s$ is the number of bits required to describe the next character.
- Compared with standard LZ77, the length of the compressed data stored (transmitted) per subsequence therefore increases by only 1 bit, the indicator bit.
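To make the formula for $L_c$ concrete, the sketch below evaluates it in Python and rounds each field up to whole bits. The window length n = 16 comes from the example that follows. The look-ahead length L_A = 8 and N_s = 8 bits per character are assumed values chosen for illustration, and the function name `token_bits` is hypothetical.

```python
import math


def token_bits(n: int, look_ahead: int, next_char_bits: int, indicator_bits: int = 1) -> int:
    """Bits per compressed subsequence: position + length + indicator + next character.

    Position and length fields are rounded up to whole bits (an assumption).
    """
    position_bits = math.ceil(math.log2(n - look_ahead))  # index into the search buffer
    length_bits = math.ceil(math.log2(look_ahead))  # match length
    return position_bits + length_bits + indicator_bits + next_char_bits


bidirectional = token_bits(16, 8, 8)  # 3 + 3 + 1 + 8
classic = token_bits(16, 8, 8, indicator_bits=0)  # plain LZ77 triple
print(bidirectional, classic)  # 15 14
```

The difference is exactly the one indicator bit. This overhead pays off whenever backward matching is longer than the forward match often enough to save more than one bit per token on average.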
- An exemplary embodiment is shown in FIG. 1.
- In this example, n = 16.
- The first character in the look-ahead buffer is 'a'. Since the search buffer was initialized with zeros, there is no match in either direction, so 'a' is stored (sent) uncompressed and shifted into the search buffer (step 1). In step 2, there is a forward match of three characters ('aaa') and a backward match of one character ('a'), so the forward direction is chosen. Instead of the matching characters, (8, 0, 3, b) is stored (sent). This tuple gives the position of the match, the indicator bit, the number of matching characters and the first mismatched character.
- In a later step, the maximum match in the search buffer is three characters in the forward direction but seven in the backward direction. This subsequence is therefore compressed using the seven-character match, with the indicator set to 1 to denote the backward direction.
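The bidirectional LZ77 scheme described above can be sketched in Python as an encoder with a matching decoder. The tuple order (position, indicator, length, next character) follows the example (8, 0, 3, b). Several details are assumptions, not taken from the figures:

- the search buffer starts empty instead of zero-filled;
- the position is stored as the distance back from the current position to the first compared character, so the sketch does not reproduce the value 8 from the figure;
- the first candidate found is kept on ties, with the forward direction tried first;
- every token keeps one character for $F_s$.

```python
from typing import List, Tuple

# (distance back to the first compared character, indicator, match length, next character)
Token = Tuple[int, int, int, str]


def encode(data: str, search_len: int = 8, look_ahead_len: int = 8) -> List[Token]:
    """Bidirectional LZ77: longest match read forward or backward in the search buffer."""
    tokens: List[Token] = []
    cur = 0
    while cur < len(data):
        lo = max(0, cur - search_len)  # first position of the search buffer
        # Keep at least one character to send as the first mismatched character F_s.
        max_len = min(look_ahead_len - 1, len(data) - cur - 1)
        best_len, best_start, best_ind = 0, cur, 0
        for start in range(lo, cur):
            # Indicator 0 reads the search buffer forward, indicator 1 backward.
            for indicator, step in ((0, 1), (1, -1)):
                length = 0
                while length < max_len:
                    src = start + step * length
                    if src < lo or data[src] != data[cur + length]:
                        break
                    length += 1
                if length > best_len:  # strict ">" keeps the earlier candidate; forward is tried first
                    best_len, best_start, best_ind = length, start, indicator
        tokens.append((cur - best_start, best_ind, best_len, data[cur + best_len]))
        cur += best_len + 1
    return tokens


def decode(tokens: List[Token]) -> str:
    """Replace each quadruple by the referenced characters plus the next character."""
    out: List[str] = []
    for distance, indicator, length, next_char in tokens:
        start = len(out) - distance
        step = -1 if indicator else 1
        for k in range(length):
            # Forward copies may overlap the characters just written, as in LZ77.
            out.append(out[start + step * k])
        out.append(next_char)
    return "".join(out)


print(encode("aaaab"))  # [(0, 0, 0, 'a'), (1, 0, 3, 'b')]
print(encode("abcdefggfedcba!")[-1])  # (1, 1, 7, '!'): backward match of seven characters
```

The first call mirrors steps 1 and 2 of the example: a literal 'a', then a forward match of three characters that wins over the one-character backward match. The second call shows a mirrored subsequence that plain LZ77 could only partly reference but that the backward direction covers in one token.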
- A second embodiment of the method according to the invention extends the compression algorithm of J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding", IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530-536, 1978 (LZ78).
- LZ78 builds a lexicon for compression and decompression in which previously encountered subsequences are stored as a separate list. Each new entry is an existing entry extended by one character, so that it forms a phrase that has not appeared before.
- Each element of the compressed sequence contains the position (line number) in the list and the new character that follows the matched entry.
- As a modification of the LZ78 algorithm, the method determines the maximum match in the forward and backward directions. In a more general implementation that is not shown, it determines the maximum match over other symmetric representations of a subsequence.
- FIG. 4 shows the pseudocode of the method.
- L denotes the maximum match length.
- The stored or transmitted compressed sequence contains three elements: the lexicon (dictionary) index, the indicator (here an indicator bit for the direction) and the next character not yet covered by the match. Compared with the original LZ78, only the indicator bit has been added.
- FIG. 5 shows the compression process of the thus modified LZ78 in chronological order.
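A minimal Python sketch of the modified LZ78 follows. It assumes the behaviour described above:

- the lexicon starts with an empty phrase at index 0;
- every entry may be matched as stored (indicator 0) or reversed (indicator 1);
- the new lexicon entry is the matched text in source order plus the next character.

The FIG. 4 pseudocode itself is not reproduced. Requiring every token to carry a next character is a simplification, and the function names are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (lexicon index, indicator, next character)


def lz78_encode(data: str) -> List[Token]:
    lexicon = [""]  # index 0: empty phrase
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_idx, best_ind = 0, 0, 0
        for idx, entry in enumerate(lexicon):
            # Indicator 0 matches the entry as stored, indicator 1 matches it reversed.
            for ind, candidate in ((0, entry), (1, entry[::-1])):
                fits = i + len(candidate) < len(data)  # keep one character for the token
                if fits and len(candidate) > best_len and data.startswith(candidate, i):
                    best_len, best_idx, best_ind = len(candidate), idx, ind
        tokens.append((best_idx, best_ind, data[i + best_len]))
        lexicon.append(data[i:i + best_len + 1])  # matched text + next character
        i += best_len + 1
    return tokens


def lz78_decode(tokens: List[Token]) -> str:
    lexicon = [""]
    phrases: List[str] = []
    for idx, ind, next_char in tokens:
        entry = lexicon[idx]
        phrase = (entry[::-1] if ind else entry) + next_char
        lexicon.append(phrase)  # rebuilds the same lexicon as the encoder
        phrases.append(phrase)
    return "".join(phrases)


print(lz78_encode("ababba!"))
# [(0, 0, 'a'), (0, 0, 'b'), (1, 0, 'b'), (3, 1, '!')]
```

In the last token, entry 3 ("ab") is read backward as "ba". Plain LZ78 would have needed a separate, shorter match here.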
- A third embodiment of the method according to the invention extends the LZW84 compression algorithm. LZW84 differs from LZ78 in that the compressing device (like the decompressing device) starts with an initialized lexicon. Depending on the type of data to be compressed, this lexicon is pre-populated, for example with the ASCII character set or with DNA bases. Transmission of the character that is not yet in the lexicon is deferred until the first letter of the next lexicon entry to be transmitted is known. If that character is missing (empty string), the first letter of the previous sequence can be appended to the previous sequence to replace it.
- LZW84 is extended by an indicator in an analogous way.
- For a lexicon initialized with ASCII characters, however, an indicator is redundant if the entire ASCII table is stored. A benefit can nevertheless be achieved by exploiting the inherent bit symmetry. It suffices to place 128 (instead of 256) characters in the lexicon and to address the other 128 by means of a binary indicator bit, which halves the size of the initial lexicon.
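The halving of the initial lexicon can be sketched as follows. The sketch assumes that the "inherent bit symmetry" is bitwise inversion of an 8-bit code, as the mention of bit-symmetry inversion for FIG. 6 suggests. Under this assumption, every code from 128 to 255 is the complement of a code from 0 to 127. The function names are hypothetical.

```python
from typing import Tuple


def to_reference(byte: int) -> Tuple[int, int]:
    """Map an 8-bit code to (index into a 128-entry lexicon, indicator bit)."""
    if byte < 128:
        return byte, 0
    return byte ^ 0xFF, 1  # inverting all bits maps 128..255 onto 127..0


def from_reference(index: int, indicator: int) -> int:
    """Recover the 8-bit code; indicator 1 selects the bit-inverted entry."""
    return index ^ 0xFF if indicator else index


print(to_reference(0x41), to_reference(0xBE))  # (65, 0) (65, 1)
```

Every one of the 256 codes is reachable from a lexicon that stores only the 128 codes below 0x80.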
- FIG. 6 shows the pseudocode of the third embodiment, the modified LZW84 method. For simplicity, the bit-symmetry inversion mentioned above is not shown.
- The entire lexicon is searched for the maximum match.
- The last entry of the lexicon cannot be used for backward reading.
- The compressed data contains the lexicon index and the indicator.
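The modified LZW84 can be sketched in Python under the following assumptions:

- the lexicon is initialized with a given alphabet;
- the entire lexicon is searched for the longest forward or backward match;
- the entry created in the previous step is excluded from backward reading, presumably because the decoder does not yet know that entry's last character.

The output is a list of (lexicon index, indicator) pairs. This is not the FIG. 6 pseudocode, the bit-symmetry halving is omitted, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Code = Tuple[int, int]  # (lexicon index, indicator)


def lzw_encode(data: str, alphabet: str) -> List[Code]:
    lexicon = list(alphabet)  # initialized lexicon, e.g. a character set
    codes: List[Code] = []
    pending: Optional[int] = None  # entry added in the previous step
    i = 0
    while i < len(data):
        best_len, best = 0, (0, 0)
        for idx, entry in enumerate(lexicon):
            for ind, candidate in ((0, entry), (1, entry[::-1])):
                if ind == 1 and idx == pending:
                    continue  # the last entry cannot be read backward
                if len(candidate) > best_len and data.startswith(candidate, i):
                    best_len, best = len(candidate), (idx, ind)
        if best_len == 0:
            raise ValueError(f"character {data[i]!r} is not in the initial lexicon")
        codes.append(best)
        pending = None
        if i + best_len < len(data):
            # New entry: current phrase + first character of the next phrase.
            lexicon.append(data[i:i + best_len + 1])
            pending = len(lexicon) - 1
        i += best_len
    return codes


def lzw_decode(codes: List[Code], alphabet: str) -> str:
    lexicon = list(alphabet)
    out: List[str] = []
    prev: Optional[str] = None
    for idx, ind in codes:
        if idx < len(lexicon):
            entry = lexicon[idx]
            phrase = entry[::-1] if ind else entry
        else:
            # Entry not yet complete: its missing last character is the first letter of prev.
            phrase = prev + prev[0]
        if prev is not None:
            lexicon.append(prev + phrase[0])
        out.append(phrase)
        prev = phrase
    return "".join(out)


print(lzw_encode("abba", "ab"))  # [(0, 0), (1, 0), (2, 1)]
```

The `else` branch in the decoder is the missing-character case from the third embodiment. It only works for forward reads, which is why the encoder skips the newest entry when reading backward.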
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112014000245.3T DE112014000245A5 (de) | 2013-01-22 | 2014-01-10 | Verfahren zur Kompression von Quelldaten unter Nutzung von Symmetrien und Einrichtung zur Durchführung des Verfahrens |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE202013100294.1 | 2013-01-22 | | |
DE201320100294 DE202013100294U1 (de) | 2013-01-22 | 2013-01-22 | Einrichtung zur Kompression von Quellendaten unter Nutzung von Symmetrien |
EP13188564 | 2013-10-14 | | |
EP13188564.2 | 2013-10-14 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014114506A1 (de) | 2014-07-31 |
Family
ID=50033470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2014/050381 WO2014114506A1 (de) | 2013-01-22 | 2014-01-10 | Verfahren zur kompression von quelldaten unter nutzung von symmetrien und einrichtung zur durchführung des verfahrens |
Country Status (2)
Country | Link |
---|---|
DE (1) | DE112014000245A5 (de) |
WO (1) | WO2014114506A1 (de) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0677927A2 (de) * | 1994-04-15 | 1995-10-18 | International Business Machines Corporation | Zeichenkettenvergleich mit minimalem Rechenaufwand pro Zeichen |
2014
- 2014-01-10: DE112014000245.3T (DE112014000245A5), DE, active, pending
- 2014-01-10: PCT/EP2014/050381 (WO2014114506A1), WO, active, application filing
Non-Patent Citations (4)
Title |
---|
M. J. Atallah et al.: "A pattern matching approach to image compression", Proceedings of the International Conference on Image Processing (ICIP), Lausanne, 16-19 September 1996, IEEE, vol. 1, pp. 349-352, ISBN 978-0-7803-3259-1, DOI: 10.1109/ICIP.1996.560828 |
J. Ziv; A. Lempel: "A universal algorithm for sequential data compression", IEEE Transactions on Information Theory, vol. 23, no. 3, 1977, pp. 337-343, DOI: 10.1109/TIT.1977.1055714 |
J. Ziv; A. Lempel: "Compression of individual sequences via variable-rate coding", IEEE Transactions on Information Theory, vol. 24, no. 5, 1978, pp. 530-536, DOI: 10.1109/TIT.1978.1055934 |
T. A. Welch: "A technique for high-performance data compression", Computer, IEEE, June 1984, pp. 8-19, ISSN 0018-9162, DOI: 10.1109/MC.1984.1659158 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105553483A (zh) * | 2015-12-09 | 2016-05-04 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | 一种产生lz77的方法及装置 |
CN105553483B (zh) * | 2015-12-09 | 2018-12-21 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | 一种产生lz77的方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
DE112014000245A5 (de) | 2015-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE3606869C2 (de) | Vorrichtung zur Datenkompression | |
DE19742417B4 (de) | Vorrichtung und Verfahren zur Durchführung von M-fachem Maschinenendzustands-Entropiekodieren bzw. Entropiekodieren mit einer Maschine mit finitem Zustand | |
DE10196890B4 (de) | Verfahren zum Ausführen einer Huffman-Decodierung | |
DE10301362B4 (de) | Blockdatenkompressionssystem, bestehend aus einer Kompressionseinrichtung und einer Dekompressionseinrichtung, und Verfahren zur schnellen Blockdatenkompression mit Multi-Byte-Suche | |
DE69834695T2 (de) | Verfahren und Vorrichtung zur Datenkompression | |
DE69725215T2 (de) | Verfahren und Vorrichtung zur Komprimierung und Dekomprimierung von Schrifttypen | |
DE19622045C2 (de) | Datenkomprimierungs- und Datendekomprimierungsschema unter Verwendung eines Suchbaums, bei dem jeder Eintrag mit einer Zeichenkette unendlicher Länge gespeichert ist | |
DE69935811T3 (de) | Frequenzbereichsaudiodekodierung mit Entropie-code Moduswechsel | |
DE112012004873B4 (de) | Hohe Bandbreitendekomprimierung von mit variabler Länge verschlüsselten Datenströmen | |
DE112013006339B4 (de) | Kompression hoher Bandbreite um Datenströme zu Verschlüsseln | |
DE112012005557B4 (de) | Erzeugen eines Code-Alphabets von Symbolen zum Erzeugen von Codewörtern für Wörter, die mit einem Programm verwendet werden | |
DE112013000734B4 (de) | Multiplex-Klassifizierung zum Komprimieren von Tabellendaten | |
DE112008002903T5 (de) | Datensequenzkompression | |
DE2264090B2 (de) | Datenverdichtung | |
DE102006062062B4 (de) | Komprimierung von Lieddaten und Komprimierer/Dekomprimierer | |
DE60225785T2 (de) | Verfahren zur codierung und decodierung eines pfades in der baumstruktur eines strukturierten dokuments | |
DE112017005823T5 (de) | Codieren von symbolen variabler länge zum ermöglichen eines parallelen decodierens | |
EP1286471B1 (de) | Verfahren zur Kompression von Daten | |
WO2014114506A1 (de) | Verfahren zur kompression von quelldaten unter nutzung von symmetrien und einrichtung zur durchführung des verfahrens | |
EP2095196B1 (de) | System und verfahren zur verlustfreien verarbeitung von prozesswerten einer technischen anlage oder eines technischen prozesses | |
DE60311886T2 (de) | Verfahren und vorrichtung zum sortieren zyklischer daten in lexikographischer reihenfolge | |
EP2823568B1 (de) | Verfahren zur codierung eines datenstroms | |
DE10131801A1 (de) | Verfahren zur Datenkompression und Navigationssystem | |
WO2008040267A1 (de) | Verfahren und vorrichtung zur kompression und dekompression digitaler daten auf elektronischem wege unter verwendung einer kontextgrammatik | |
EP1924931B1 (de) | Verfahren zur speichereffizienten durchführung einer burrows-wheeler-rücktransformation |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14702459; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 1120140002453; Country of ref document: DE; Ref document number: 112014000245; Country of ref document: DE |
REG | Reference to national code | Ref country code: DE; Ref legal event code: R225; Ref document number: 112014000245; Country of ref document: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 14702459; Country of ref document: EP; Kind code of ref document: A1 |