FI115937B

FI115937B - Lossless data compression and decompression

Info

Publication number: FI115937B
Application number: FI20012215A
Authority: FI
Inventors: Jukka Kilponen
Original assignee: 3C Data Oy
Priority date: 2001-11-14
Filing date: 2001-11-14
Publication date: 2005-08-15
Also published as: FI20012215A; FI20012215A0

Description

Lossless data compression and decompression

Field

The invention relates to lossless data compression and decompression.

Background

Hundreds of different data compression methods are currently known.

Data compression is also known as packing. Its purpose is to reduce the size of the data so that the compressed data is smaller than the original, which makes the data easier to store or transfer.

Compression methods can be divided into lossless and lossy methods. In lossy methods, part of the original information content is lost during compression. In lossless methods, the compressed data can be restored to exactly its original form. Lossless compression is typically used for compressing text, tables, and program code, among other things.

Two operating principles are in common use in lossless compression methods. In the first, the individual characters of the data to be compressed are replaced with codes that are, on average, shorter than the originals. Examples are Huffman codes and arithmetic coding. The second principle is to use a dictionary and to replace parts of the data to be compressed with references to the dictionary. Most current lossless compression methods are applications and combinations of these two principles.

The best-known family of dictionary-based compression methods consists of LZ77 and LZ78, developed by Ziv and Lempel, together with the numerous variations derived from them.

In the LZ77 method, a string occurring in the data to be compressed is replaced with a reference to an earlier occurrence of the same string in that data. The data to be compressed thus serves as its own dictionary. In the LZ78 method, a dictionary is maintained on the basis of the data to be compressed, and when a string that already appears in the dictionary is encountered in the data, the string is replaced with a reference to the dictionary.

Among implementations of current compression methods, zip and its derivatives such as gzip and pkzip have perhaps achieved the greatest popularity. The deflate algorithm used in these programs first applies LZ77-style coding that incorporates RLE (Run Length Encoding) to the data, and the result is then compressed further with Huffman coding.


How well dictionary-based methods work depends greatly on how the dictionary is built, how it is maintained, and how it is used. Patents on compression methods in IPC class H03M7 deal with these matters.

Data compression is usually needed when there is a large amount of data to process or transfer. For this reason, current compression methods work best specifically with large volumes of data. There is, however, also data that consists of a large number of small items. Such data may need to be compressed item by item rather than as a whole. The reason may be that each item has to be handled separately even after compression: every item must be compressible and decompressible independently of the other items. Small data items of this kind include web server event records, XML and WML documents, billing and log records of telecommunication and data networks, and messages within and between information systems. Typically, a single item is small, but there are a great many items. If compression is desired, it cannot be done by compressing all the items into one compressed file; each item must be compressed independently, separately from the others.

Currently known compression methods are not well suited to compressing small data items. When there is little data to compress, they are inefficient. In compression based on a dynamic or semi-static dictionary, the dictionary is very small at the start, so compression is not very efficient at first. A large amount of data would be needed for the dictionary to grow large enough for efficient compression. When a static dictionary is used, the same dictionary must be available to both the compressor and the decompressor. Sometimes this requires transferring or storing the dictionary along with the compressed data. If there is little data to compress, the dictionary may be large relative to the compressed data, and the overall compression efficiency is therefore low. If character coding is done with a static probability model of the characters, that model must be transferred from the compressor to the decompressor. With a small amount of data to compress, this degrades compression efficiency substantially. If character coding is done with an adaptive model, it takes a while before the model has adapted to the properties of the data to be compressed. For this reason, an adaptive model does not have time to work very well when there is little data to compress. In addition, many methods, particularly those with the best compression performance, are complex and require a considerable amount of memory and computing power.
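The effect described above can be observed with a general-purpose deflate implementation. The sketch below is only an illustration and is not the method of the invention: it uses the standard Python `zlib` module, compresses one small XML item on its own, and then compresses it again with a preset dictionary (`zdict`) that both ends are assumed to hold. The sample strings are taken from the worked example later in this description.

```python
import zlib

# One small item and a similar piece of data that both ends are assumed to share.
item = b"<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>"
shared = b"<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"

# Compressed alone, the item has only itself to refer back to.
alone = zlib.compress(item, 9)

# With a preset dictionary, matches can point into the shared data instead.
compressor = zlib.compressobj(level=9, zdict=shared)
primed = compressor.compress(item) + compressor.flush()

# Decompression only works if the same dictionary is supplied again.
decompressor = zlib.decompressobj(zdict=shared)
restored = decompressor.decompress(primed) + decompressor.flush()

print(f"original {len(item)} B, alone {len(alone)} B, with dictionary {len(primed)} B")
```

The preset dictionary is never transmitted here, which is only reasonable when, as the text notes, both ends already have it.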


Brief description

The object of the invention is to provide an improved method and an improved apparatus. One aspect of the invention is a method according to claim 1. One aspect of the invention is a computer program according to claim 9. One aspect of the invention is a computer memory according to claim 10. One aspect of the invention is a telecommunication signal according to claim 11. One aspect of the invention is an apparatus according to claim 12. One aspect of the invention is an apparatus according to claim 20. One aspect of the invention is a method according to claim 28. One aspect of the invention is a computer program according to claim 33. One aspect of the invention is a computer memory according to claim 34. One aspect of the invention is a telecommunication signal according to claim 35. One aspect of the invention is an apparatus according to claim 36. One aspect of the invention is an apparatus according to claim 41. Other preferred embodiments of the invention are the subject of the dependent claims.

The invention is based on the use of a static dictionary. It is intended for situations in which the data to be compressed is small and a dictionary exists whose content and structure are close to those of the data to be compressed. The invention is simple and can therefore easily be implemented even in equipment that has little memory or computing power available. Because of this simplicity, both compression and decompression are fast.

In the invention, the data to be compressed and the static dictionary are compared with each other, and the result of the compression is information about the differences between the data to be compressed and the dictionary. Because the compressor and the decompressor both have the same dictionary at their disposal, it suffices that only the information about how the data differs from the dictionary passes from the compressor to the decompressor. The dictionary is data whose structure and content are as close as possible to the data to be compressed. The dictionary is not modified during compression; it is used statically.

Dictionary-based compression methods generally search the data to be compressed for words that can be replaced with a reference to an occurrence of that word in the dictionary. In the invention, the dictionary is not divided into words; instead, it is regarded as representing, as such, the whole of the data to be compressed. In the best case, the dictionary does correspond completely to the data to be compressed, that is, the two are identical, and the result of the compression is then the information that the data to be compressed is found as such in the dictionary.

The compressed data produced by the invention does not contain references to the dictionary of the kind found in data compressed by other dictionary methods. Instead of a dictionary reference, it states for how many characters the data to be compressed and the dictionary are identical.

Often, however, the data to be compressed and the dictionary are not completely identical. The result of the compression then indicates where the data to be compressed and the dictionary are identical and where they are not. In addition, the part of the data to be compressed that is not found in the dictionary is stored. The compressed data is thus information about how the data to be compressed differs from the dictionary. In this sense, it is a matter of the difference of two sets.

The procedure according to the invention is more efficient and faster than known compression methods when the data to be compressed corresponds to the dictionary in structure and content. The better compression efficiency arises because the numbers that state the lengths of the identical and differing parts of the data and the dictionary typically fall within a much smaller range of values than conventional dictionary references, and can therefore be encoded further very efficiently. The speed arises because locating the similarities and differences between two pieces of data is a considerably simpler operation than the word recognition and dictionary lookup performed in conventional compression methods.

Data set                      Measure                   Zip     Invention
A (63790 items of 527 bytes)  Compressed size (bytes)   100     108
                              Speed (items/second)      2353    10174
B (1462 items of 1965 bytes)  Compressed size (bytes)   129     111
                              Speed (items/second)      1091    5315

Table 1

An implementation of the invention has been compared with the zip program in terms of compression efficiency and execution speed. The zip program used in the comparison is based on the zlib library and was optimized by all available means. The test material consisted of XML data obtained from two different mobile services. Data set A contains 63790 items with an average size of 527 bytes, and data set B contains 1462 items with an average size of 1965 bytes. Every transaction in the data sets was compressed with both methods. The results are shown in Table 1: on both data sets the invention produced roughly the same compression result as the zip algorithm and ran several times faster (roughly four to five times, by the speeds in Table 1).
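The comparison in Table 1 can be reduced to a few ratios. The short calculation below is a sketch that assumes the compressed sizes in the table are averages per item, so that they can be divided by the average item size; the table itself does not state this. The dictionary keys are names chosen here for illustration.

```python
# Figures copied from Table 1. Sizes are in bytes, speeds in items per second.
TABLE_1 = {
    "A": {"item_size": 527, "zip_size": 100, "inv_size": 108,
          "zip_speed": 2353, "inv_speed": 10174},
    "B": {"item_size": 1965, "zip_size": 129, "inv_size": 111,
          "zip_speed": 1091, "inv_speed": 5315},
}

def summarize(row):
    """Return (zip size ratio, invention size ratio, speed-up) for one data set."""
    zip_ratio = row["zip_size"] / row["item_size"]
    inv_ratio = row["inv_size"] / row["item_size"]
    speedup = row["inv_speed"] / row["zip_speed"]
    return zip_ratio, inv_ratio, speedup

for name, row in TABLE_1.items():
    zip_ratio, inv_ratio, speedup = summarize(row)
    print(f"{name}: zip {zip_ratio:.1%}, invention {inv_ratio:.1%}, "
          f"speed-up {speedup:.2f}x")
```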

What distinguishes the invention from other compression methods that use a dictionary or data comparison is specifically the way in which the differences and similarities between the data to be compressed and the dictionary are identified, and the form in which they are stored in the compressed data.

In other methods that use a dictionary, references are made from the data to be compressed to the dictionary. In the invention, no references are made; the lengths of the identical and differing parts are stated instead. The length of each part is counted from the end of the preceding part, so the length values typically stay within a narrow range. In a sense, the invention subtracts the dictionary from the data to be compressed. What remains is only the part of the original data that did not match the structure and content of the dictionary.

The invention can be used both in telecommunications and within a data processing system, for example as a storage method for a database or a file. The procedure can compress any data, but it is particularly suited to compressing XML data and similar structured data.

List of figures

Preferred embodiments of the invention are described below by way of example, with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of a system formed by an apparatus for lossless compression of data and an apparatus for lossless decompression of compressed data;
Figure 2 is a flowchart illustrating a method for lossless compression of data;
Figure 3 is a flowchart illustrating a method for lossless decompression of data;
Figure 4 illustrates the use of the system of Figure 1 for database processing;
Figure 5 illustrates the use of the system of Figure 1 in a telecommunication system.

Description of embodiments

Figure 1 shows a system that uses lossless data compression. The original data to be compressed 101 is processed in a compressor 103. Compression is performed using a dictionary 102. The result is compressed data 104. How much smaller the compressed data is than the original data depends on how closely the content and structure of the dictionary resemble those of the data to be compressed.

The compressed data 104 is restored to its original form 107 in a decompressor 106. Decompression 106 makes use of a dictionary 105. For decompression to succeed, the dictionary 105 must be the same as, or identical to, the dictionary 102 used in compression 103. Because the compression method is lossless, the restored data 107 is identical to the original data 101.

The dictionaries 102 and 105 used by the compressor 103 and the decompressor 106 may be one and the same dictionary or two separate copies of the same dictionary. The dictionary 105 can be delivered from the compressor 103 to the decompressor 106 together with the compressed data 104 or before it.

The dictionary 102 need not be the same for every piece of data 101 to be compressed. It suffices that the dictionary 102 used in the compression 103 of an individual piece of data 101 is the same as the dictionary 105 used in the decompression 106 of the corresponding compressed data 104. A different dictionary 102 may be used in the compression 103 of the next piece of data 101. In a message transfer system in which consecutive messages resemble one another, the previously transferred message, for example, can be used as the dictionary 102 and 105.
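The idea of using the previous message as the dictionary can be sketched as follows. This is a hedged illustration only: the class name `Endpoint`, the seed dictionary, and the use of the `zlib` preset dictionary as a stand-in codec are assumptions made here, not details from the description. The point is merely that both ends update their dictionary in step.

```python
import zlib

class Endpoint:
    """One end of a link where the previous message serves as the dictionary."""

    def __init__(self, initial_dictionary: bytes):
        # Both ends must start from the same, non-empty dictionary.
        self.dictionary = initial_dictionary

    def _remember(self, message: bytes) -> None:
        # Keep the old dictionary if the message is empty.
        if message:
            self.dictionary = message

    def send(self, message: bytes) -> bytes:
        comp = zlib.compressobj(zdict=self.dictionary)
        packet = comp.compress(message) + comp.flush()
        self._remember(message)
        return packet

    def receive(self, packet: bytes) -> bytes:
        decomp = zlib.decompressobj(zdict=self.dictionary)
        message = decomp.decompress(packet) + decomp.flush()
        self._remember(message)
        return message

# Hypothetical seed agreed on in advance by both ends.
seed = b"<tilaus><nimike></nimike><hinta></hinta></tilaus>"
sender, receiver = Endpoint(seed), Endpoint(seed)
messages = [
    b"<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>",
    b"<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>",
]
received = [receiver.receive(sender.send(m)) for m in messages]
```

If a packet is lost or reordered, the two dictionaries diverge and later packets cannot be decoded, so a scheme like this presumes reliable, ordered delivery.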

The method for compressing data can be presented more formally by defining

$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d,r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$, and
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$.

The following then hold:

$e_i = f_i$, (1)
$l(e_i) = l(f_i)$, and (2)
$d_i \neq r_i$. (3)

The compressed data can be presented in at least the following forms:

$c(d,r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$, (4)

or

$c(d,r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n)$. (5)
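The decomposition of the data and the dictionary into parts, together with relations (1) to (3), can be checked on a toy case. The strings below are made up for illustration and do not come from the description; each tuple holds one triple of parts, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Each tuple is (e_i, d_i, r_i); f_i is not stored because relation (1) says f_i = e_i.
Part = Tuple[str, str, str]

def rebuild(parts: List[Part]) -> Tuple[str, str]:
    """Concatenate the parts back into the data d and the dictionary r."""
    d = "".join(e + d_i for e, d_i, _ in parts)
    r = "".join(e + r_i for e, _, r_i in parts)
    return d, r

def satisfies_relation_3(parts: List[Part]) -> bool:
    """Relation (3): every differing part of d differs from its counterpart in r."""
    return all(d_i != r_i for _, d_i, r_i in parts)

# Made-up toy strings with n = 2.
parts = [("abc", "XY", "Q"), ("def", "1", "ZZ")]
d, r = rebuild(parts)
print(d, r)  # abcXYdef1 abcQdefZZ
```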

The compressed data thus comprises at least one first item of information on the length of at least one identical part found in both the data and the static dictionary, at least one second item of information on the length of at least one part found only in the static dictionary, and at least one third item of information on the length of at least one part found only in the data, together with that part itself (which is therefore not found in the static dictionary).
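Given these three kinds of information, decompression amounts to walking through the dictionary once. The following is a minimal sketch, assuming each part is held as a record (l(e_i), l(d_i), l(r_i), d_i); the function name and record layout are chosen here for illustration. The sample records are worked out by hand from the order example given later in the description: the first five values appear in the text, while l(r_2) = 4 and the final record with empty differing parts are assumptions about how the example ends.

```python
from typing import Iterable, Tuple

# One record per part: (l(e_i), l(d_i), l(r_i), d_i).
Record = Tuple[int, int, int, str]

def decompress(records: Iterable[Record], dictionary: str) -> str:
    """Rebuild the data from length records and the shared static dictionary."""
    out = []
    pos = 0  # current read position in the dictionary
    for len_e, len_d, len_r, literal in records:
        if len(literal) != len_d:
            raise ValueError("literal length does not match l(d_i)")
        if pos + len_e + len_r > len(dictionary):
            raise ValueError("record runs past the end of the dictionary")
        out.append(dictionary[pos:pos + len_e])  # identical part, copied
        pos += len_e
        out.append(literal)                      # part found only in the data
        pos += len_r                             # part found only in the dictionary, skipped
    return "".join(out)

dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
# Hand-derived records; the last two entries are assumptions (see above).
records = [(16, 6, 6, "nitoja"), (16, 1, 4, "2"), (18, 0, 0, "")]
print(decompress(records, dictionary))
```

No searching is involved on the decompression side: every length is measured from the end of the preceding part, so a single cursor into the dictionary suffices.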

In form 4, the parts of the data that differ from the dictionary are all stored consecutively at the end of the compressed data; in form 5, they are stored in between the numbers that state the lengths of the identical and differing parts.

Form 4 may lend itself better to further compression because values of different types are not mixed together: the length data ($n$, $l(e_i)$, $l(d_i)$ and $l(r_i)$) is stored at the beginning and the differing data ($d_i$) at the end. Homogeneous data with a small range of variation compresses better than heterogeneous data. Form 5 allows compressed data to be sent onward as compression proceeds; the whole of the data does not have to be processed before transmission of compressed data can begin.
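The difference between the two forms is only the order in which the same values are written out. The sketch below shows both orderings as token lists, assuming records of the shape (l(e_i), l(d_i), l(r_i), d_i); the function names and the toy records are made up here, and the count $n$ is left out, as it is in formulas (4) and (5).

```python
from typing import List, Tuple, Union

Record = Tuple[int, int, int, str]  # (l(e_i), l(d_i), l(r_i), d_i)
Token = Union[int, str]

def layout_form_4(records: List[Record]) -> List[Token]:
    """All lengths first, grouped by kind, then all differing parts."""
    lens_e = [rec[0] for rec in records]
    lens_d = [rec[1] for rec in records]
    lens_r = [rec[2] for rec in records]
    literals = [rec[3] for rec in records]
    return [*lens_e, *lens_d, *lens_r, *literals]

def layout_form_5(records: List[Record]) -> List[Token]:
    """Lengths and differing part of each record kept together, in order."""
    tokens: List[Token] = []
    for rec in records:
        tokens.extend(rec)
    return tokens

records = [(3, 2, 1, "XY"), (3, 1, 2, "1")]
print(layout_form_4(records))  # [3, 3, 2, 1, 1, 2, 'XY', '1']
print(layout_form_5(records))  # [3, 2, 1, 'XY', 3, 1, 2, '1']
```

Form 5 can be emitted record by record as soon as each part is known, whereas form 4 needs every record before the first literal can be written.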

The operation of the compression algorithm can be illustrated with the following example.

Let the data to be compressed be:

<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>

and the corresponding static dictionary:

<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>

(The Finnish element names tilaus, nimike and hinta mean order, item and price; nitoja is a stapler and sakset are scissors.)

The length of the first common part is sixteen characters: $l(e_1) = 16$. In what follows, the part under consideration is always marked, with the data shown first and the dictionary second.

[<tilaus><nimike>]nitoja</nimike><hinta>20</hinta></tilaus>
[<tilaus><nimike>]sakset</nimike><hinta>33.50</hinta></tilaus>

The length of the first differing part of the data to be compressed is six characters: $l(d_1) = 6$, and the differing string is $d_1$ = "nitoja".

<tilaus><nimike>[nitoja]</nimike><hinta>20</hinta></tilaus>
<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>

The length of the first differing part of the dictionary is six characters: $l(r_1) = 6$.

<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>
<tilaus><nimike>[sakset]</nimike><hinta>33.50</hinta></tilaus>

The length of the second common part is sixteen characters: $l(e_2) = 16$.

<tilaus><nimike>nitoja[</nimike><hinta>]20</hinta></tilaus>
<tilaus><nimike>sakset[</nimike><hinta>]33.50</hinta></tilaus>

The length of the second differing part of the data to be compressed is one character: $l(d_2) = 1$, and the differing string is $d_2$ = "2".

<tilaus><nimike>nitoja</nimike><hinta>[2]0</hinta></tilaus>
<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>

The length of the second differing part of the dictionary is four characters: $l(r_2) = 4$.

<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>
<tilaus><nimike>sakset</nimike><hinta>[33.5]0</hinta></tilaus>

The length of the third common part is eighteen characters: $l(e_3) = 18$.

<tilaus><nimike>nitoja</nimike><hinta>2[0</hinta></tilaus>]
<tilaus><nimike>sakset</nimike><hinta>33.5[0</hinta></tilaus>]

The length of the third differing part of the data to be compressed is zero characters: $l(d_3) = 0$, and the differing string $d_3$ is empty.

The length of the third differing part of the dictionary is zero characters: $l(r_3) = 0$.

If the compressed data is stored in the format

$c(d,r) = n + l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$,

the compressed data in this example is:

3, 16, 16, 18, 6, 1, 0, 6, 4, 0, 'n', 'i', 't', 'o', 'j', 'a', '2'

The original data to be compressed in the example is 57 characters long, and the compressed data is 17 values long: ten numbers followed by the seven differing characters of $d_1$ and $d_2$.
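As a quick consistency check on the example, the part lengths found above should add up to the lengths of the two strings. The dictionary length of 60 characters is counted directly from the dictionary string and is not stated in the text; $l(f_i) = l(e_i)$ follows from the identical parts being the same in both.

```latex
\begin{align*}
l(d) &= l(e_1) + l(d_1) + l(e_2) + l(d_2) + l(e_3) + l(d_3)
      = 16 + 6 + 16 + 1 + 18 + 0 = 57, \\
l(r) &= l(f_1) + l(r_1) + l(f_2) + l(r_2) + l(f_3) + l(r_3)
      = 16 + 6 + 16 + 4 + 18 + 0 = 60.
\end{align*}
```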

Figure 2 is a flowchart of a lossless method for compressing data. Execution of the method starts in block 200. In block 201, the number of identical characters at the beginning of the data and of the static dictionary is counted. This is done by starting from the first character of both the data and the static dictionary and comparing the characters with each other. If the characters are the same, the comparison moves on to the next character of both the data and the static dictionary. The comparison stops at the first position where the characters differ. It follows from this way of counting that the same number of characters is advanced in both the data and the dictionary. The number of identical characters is denoted $l(e_i)$.
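The counting step of block 201 can be sketched as a plain character-by-character comparison. The function name and the offset parameters `i` and `j` are illustrative; the offsets let the same routine be reused on later rounds, when the comparison starts from the current positions rather than from the beginning.

```python
def common_prefix_len(data: str, dictionary: str, i: int = 0, j: int = 0) -> int:
    """Count identical characters starting at data[i] and dictionary[j]."""
    n = 0
    while (i + n < len(data) and j + n < len(dictionary)
           and data[i + n] == dictionary[j + n]):
        n += 1
    return n


data = "<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>"
dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
print(common_prefix_len(data, dictionary))  # length of the first common part
```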

Next, block 202 searches for the next position at which the data $d$ and the static dictionary $r$ match again. This position is not necessarily at the same offset in the data and in the dictionary. In one embodiment, the position is the start of the first sufficiently long string that occurs as early as possible in both the data and the static dictionary. Sufficiently long here means that the string must be at least as long as the amount of memory needed to represent the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$. If the identical string is shorter than this, replacing it with a reference to the static dictionary is not worthwhile, because the jump into the static dictionary takes more space than the string being replaced.
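One way to read "as early as possible in both" is to minimise the combined distance $l(d_i) + l(r_i)$ to the next match. The sketch below does an exhaustive search under that reading; the tie-breaking rule, the function name, and the default `min_match=3` (three one-byte length fields) are assumptions, and the quadratic search is meant for clarity rather than speed.

```python
from typing import Tuple


def find_resync(data: str, dictionary: str, i: int, j: int,
                min_match: int = 3) -> Tuple[int, int]:
    """Return (l_d, l_r): how far to skip in data and dictionary to reach the
    nearest position where at least min_match characters match again.
    If no such position exists, the rest of both strings counts as differing."""
    rest_d = len(data) - i
    rest_r = len(dictionary) - j
    # Try candidate offsets in order of increasing combined distance a + b.
    for total in range(rest_d + rest_r + 1):
        for a in range(max(0, total - rest_r), min(total, rest_d) + 1):
            b = total - a
            chunk = data[i + a:i + a + min_match]
            if len(chunk) == min_match and \
                    chunk == dictionary[j + b:j + b + min_match]:
                return a, b
    return rest_d, rest_r


data = "<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>"
dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
print(find_resync(data, dictionary, 16, 16))  # after the first common part
```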

Once the next identical position has been located in both the data and the static dictionary, blocks 203 and 204 count how many characters away that position is in the data and in the static dictionary, respectively. Note that $l(d_i)$ may be zero and/or $l(r_i)$ may be zero.

In one embodiment, the computed lengths and the differing characters of the data to be compressed can be stored as such, or they can be further compressed or encoded with some other compression method in block 205. Adaptive arithmetic coding, for example, is a suitable method for this purpose.

Block 206 then tests whether any data to be compressed remains unprocessed. If so, the method returns to block 201. Otherwise compression is complete and the method proceeds to block 207, where execution of the compression method ends.
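Putting blocks 201 to 206 together, the whole loop might look like the sketch below. It is one possible reading, not the patented implementation: the segment tuple, the function name, and in particular the search in the middle (earliest position in the data whose next `min_match` characters occur in the dictionary at or after the current dictionary position, found with `str.find`) are assumptions chosen to keep the example short.

```python
from typing import List, Tuple

# (l_e, l_d, l_r, d_part); the tuple layout is an assumption of this sketch.
Segment = Tuple[int, int, int, str]


def compress(data: str, dictionary: str, min_match: int = 3) -> List[Segment]:
    segments: List[Segment] = []
    i = j = 0  # current positions in data and dictionary
    while True:
        # Block 201: count identical characters from the current positions.
        l_e = 0
        while (i + l_e < len(data) and j + l_e < len(dictionary)
               and data[i + l_e] == dictionary[j + l_e]):
            l_e += 1
        i, j = i + l_e, j + l_e

        # Blocks 202-204: simplified search (assumption of this sketch).
        # Fallback when nothing matches: the rest of both strings differs.
        l_d, l_r = len(data) - i, len(dictionary) - j
        for a in range(len(data) - i - min_match + 1):
            pos = dictionary.find(data[i + a:i + a + min_match], j)
            if pos != -1:
                l_d, l_r = a, pos - j
                break

        segments.append((l_e, l_d, l_r, data[i:i + l_d]))
        i, j = i + l_d, j + l_r

        # Block 206: stop once all data has been consumed.
        if i >= len(data):
            return segments


data = "<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>"
dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
for segment in compress(data, dictionary):
    print(segment)
```

Because each identical run is re-verified by the character comparison at the top of the loop, a weaker search only costs compression ratio; it cannot make the output incorrect.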

To achieve the most efficient compression possible, block 202 should find the nearest possible position at which the data to be compressed and the dictionary match again. The speed of this search algorithm can be improved considerably by compromising on its accuracy, that is, by finding almost the first such position but much faster. This lets the implementer and the user of the method balance compression efficiency against execution speed and favor whichever property matters more in a given situation.
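The text does not say how the faster, less exact search is done. One plausible approach, shown here purely as an assumption, is to index every `k`-character substring of the static dictionary once and then scan only a bounded window of the data for the first substring that appears in the index. The names `build_index`, `fast_resync` and `max_scan` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(dictionary: str, k: int = 3) -> Dict[str, List[int]]:
    """Map each k-character substring of the dictionary to its sorted positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(dictionary) - k + 1):
        index[dictionary[pos:pos + k]].append(pos)  # appended in ascending order
    return index


def fast_resync(data: str, index: Dict[str, List[int]], i: int, j: int,
                k: int = 3, max_scan: int = 64) -> Optional[Tuple[int, int]]:
    """Return (l_d, l_r) for the first k-gram within max_scan characters of
    data[i] that occurs in the dictionary at or after j, or None if the
    window holds no such k-gram."""
    last = min(len(data) - k, i + max_scan)
    for a_pos in range(i, last + 1):
        positions = index.get(data[a_pos:a_pos + k])
        if positions:
            p = bisect_left(positions, j)  # first occurrence at or after j
            if p < len(positions):
                return a_pos - i, positions[p] - j
    return None


data = "<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>"
dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
index = build_index(dictionary)
print(fast_resync(data, index, 16, 16))
```

A smaller `max_scan` makes each search cheaper but gives up sooner, which is one concrete form of the speed versus compression trade-off described above.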


In one embodiment, the data consists of eight-bit characters, in which case it is advantageous to also restrict the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ so that each can always be represented with eight bits. This can be done by first computing the lengths without regard to the eight-bit restriction and then splitting the resulting lengths into several parts so that no length exceeds the eight-bit limit. For example, if the lengths are $l(e_5) = 300$, $l(d_5) = 10$ and $l(r_5) = 12$, they can be split into two parts as follows: $l(e_5) = 255$, $l(d_5) = 0$, $l(r_5) = 0$ and $l(e_6) = 45$, $l(d_6) = 10$, $l(r_6) = 12$. The differing lengths are carried in the last part so that the dictionary position is not advanced past identical characters that have yet to be copied.
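A sketch of such a split is given below. It carries $l(d)$ and $l(r)$ only after the identical run has been fully emitted, which keeps the dictionary position aligned for a decoder that skips $l(r)$ characters right after each triple; that placement, the function name, and the handling of oversized $l(d)$ and $l(r)$ are assumptions of the sketch.

```python
from typing import List, Tuple


def split_lengths(l_e: int, l_d: int, l_r: int,
                  limit: int = 255) -> List[Tuple[int, int, int]]:
    """Split one (l_e, l_d, l_r) triple into triples whose fields all fit
    in one byte (at most `limit`)."""
    parts: List[Tuple[int, int, int]] = []
    # Emit the identical run first, in chunks of at most `limit`.
    while l_e > limit:
        parts.append((limit, 0, 0))
        l_e -= limit
    # Then emit the differing lengths, also chunked if they are too large.
    while True:
        d_chunk, r_chunk = min(l_d, limit), min(l_r, limit)
        parts.append((l_e, d_chunk, r_chunk))
        l_e, l_d, l_r = 0, l_d - d_chunk, l_r - r_chunk
        if l_d == 0 and l_r == 0:
            return parts


print(split_lengths(300, 10, 12))
```

When $l(d)$ itself is split, the caller has to cut the differing characters into pieces of the same sizes so that each triple is followed by exactly its own part.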

Next, the flowchart of Figure 3 for restoring compressed data to its original form is considered. The lossless method for decompressing compressed data starts in block 300.

In block 301, the first part is read from the compressed data; it comprises the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ and the data $d_i$. Any further encoding or compression that was applied is undone.
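Reading one part at a time is straightforward when the parts are stored interleaved. The sketch below assumes a byte layout that the text does not specify: three one-byte length fields followed by $l(d_i)$ bytes of differing data, with no second-stage coding. The function name is illustrative.

```python
from typing import Iterator, Tuple


def read_records(blob: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (l_e, l_d, l_r, d_part) records from an interleaved byte stream."""
    pos = 0
    while pos < len(blob):
        if pos + 3 > len(blob):
            raise ValueError("truncated length header")
        l_e, l_d, l_r = blob[pos], blob[pos + 1], blob[pos + 2]
        pos += 3
        d_part = blob[pos:pos + l_d]
        if len(d_part) != l_d:
            raise ValueError("truncated differing part")
        pos += l_d
        yield l_e, l_d, l_r, d_part


blob = bytes([16, 6, 6]) + b"nitoja" + bytes([16, 1, 4]) + b"2" + bytes([18, 0, 0])
for record in read_records(blob):
    print(record)
```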

In block 302, $l(e_i)$ characters are restored from the static dictionary. During compression, block 201 determined that at this position the data to be compressed and the static dictionary share $l(e_i)$ identical characters. Therefore, although the characters are taken from the static dictionary in block 302, the result is the same characters that appeared in the original data. At the same time, the position in the static dictionary advances by $l(e_i)$ characters.

Next, block 303 restores the part of the original data that was not identical to the static dictionary. This data is stored in the compressed data and can be read and restored from there. Any further encoding or compression has already been undone in block 301.

In block 304, the position in the static dictionary advances by $l(r_i)$ characters, to reach the position in the static dictionary where the next identical part begins.

Block 305 tests whether all of the compressed data has been processed. If not, decompression continues by returning to block 301. Otherwise decompression is complete, and the method proceeds from block 305 to block 306, where execution of the decompression method ends.
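The decompression loop of blocks 301 to 305 can be sketched in a few lines. The input here is a list of already-parsed `(l_e, l_d, l_r, d_part)` tuples; that representation and the function name are assumptions, while the three steps inside the loop follow the blocks described above.

```python
from typing import Iterable, List, Tuple


def decompress(segments: Iterable[Tuple[int, int, int, str]],
               dictionary: str) -> str:
    out: List[str] = []
    j = 0  # current position in the static dictionary
    for l_e, l_d, l_r, d_part in segments:
        out.append(dictionary[j:j + l_e])  # block 302: copy the identical part
        j += l_e
        out.append(d_part)                 # block 303: restore the differing part
        j += l_r                           # block 304: skip the dictionary-only part
    return "".join(out)


dictionary = "<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
segments = [(16, 6, 6, "nitoja"), (16, 1, 4, "2"), (18, 0, 0, "")]
print(decompress(segments, dictionary))
```

Note that $l(d_i)$ is not needed once the differing part has been parsed out of the stream; it only serves to delimit $d_i$ in the stored form.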

Figure 4 shows an example of a system in which the data in a database is compressed. The aim of compression here is to reduce the required size of the database: fewer disks or a smaller database server suffice to store the same amount of information in compressed form. The whole system resides in a single computer system 401. The application program 402 can read and write the information in the database 403. Writing to the database takes place by routing the original data 101 through the compressor 103, after which the compressed data 104 is written to the database 403. Reading takes place by reading the compressed data 104 from the database 403 and routing it to the decompressor 106, which finally returns the restored data 107 to the application 402.

The application 402 handles only the original data 101 and the restored data 107. The database 403, in turn, handles only the compressed data 104.
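The separation of roles can be illustrated with a thin wrapper around a store. To keep the example short, this sketch does not use the method described in this document; it swaps in DEFLATE with a preset dictionary from Python's standard `zlib` module as a stand-in codec. The class name, the in-memory `dict` standing in for database 403, and the key names are all hypothetical.

```python
import zlib
from typing import Dict


class CompressedStore:
    """Application-facing store: callers see plain bytes, storage sees compressed bytes."""

    def __init__(self, dictionary: bytes) -> None:
        self._dictionary = dictionary
        self._rows: Dict[str, bytes] = {}  # stand-in for database 403

    def write(self, key: str, data: bytes) -> None:
        # Role of compressor 103 (stand-in codec: DEFLATE with a preset dictionary).
        comp = zlib.compressobj(zdict=self._dictionary)
        self._rows[key] = comp.compress(data) + comp.flush()

    def read(self, key: str) -> bytes:
        # Role of decompressor 106: the same dictionary must be supplied.
        decomp = zlib.decompressobj(zdict=self._dictionary)
        return decomp.decompress(self._rows[key]) + decomp.flush()


dictionary = b"<tilaus><nimike>sakset</nimike><hinta>33.50</hinta></tilaus>"
store = CompressedStore(dictionary)
store.write("order-1", b"<tilaus><nimike>nitoja</nimike><hinta>20</hinta></tilaus>")
print(store.read("order-1"))
```

The application code never touches the compressed form, so the codec behind `write` and `read` could be replaced without changing the callers.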

Instead of a single application 402, the system may also contain several applications. The database 403 may reside on the same hardware as the application 402 or on other hardware. The compressor 103 and the decompressor 106 can be implemented on the hardware 401 as separate processes, as a combined compression and decompression process, or as part of the application 402.

Figure 5 shows an example in which compression is used in communication between two systems. The application 502 in system 501 sends a message 101 containing data to the application 504 in system 503. Before transmission to system 503, the message 101 passes to the compressor 103, where it is compressed into compressed data 104, which is then forwarded to system 503. The decompressor 106 receives the compressed message 104 and restores it to its original form 107 before delivering it to the application 504.

With the arrangement of Figure 5, the applications 502 and 504 can handle original messages 101 or restored messages 107. Only compressed data 104 travels between systems 501 and 503. The aim of compression in such an arrangement is to increase the effective payload capacity of the communication channel between the systems, or to achieve cost savings because there is less data to transfer. Naturally, in the example of Figure 5, system 501 may also contain a decompressor and system 503 a compressor, so that compressed data can be sent between systems 501 and 503 in both directions. The compressed data 104 can additionally be channel coded with various known channel coding methods so that transmission errors arising in the channel, for example a radio channel, can be detected and possibly also corrected. The devices needed for channel coding, and for transmission and reception technology in general, are not shown in Figure 5 because they are well known to those skilled in the art.
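Error detection matters more than usual for compressed messages, since a single corrupted length value shifts everything that follows. As a minimal illustration of the detection half only, the sketch below appends a CRC-32 checksum to a compressed message. This is a simple stand-in, not one of the channel coding methods the text alludes to, and it cannot correct errors; the function names and the four-byte big-endian trailer are assumptions.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(message: bytes) -> bytes:
    """Verify and strip the CRC-32 trailer; raise ValueError on a mismatch."""
    if len(message) < 4:
        raise ValueError("message too short to contain a checksum")
    payload = message[:-4]
    (expected,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: message was corrupted in transit")
    return payload


compressed = bytes([16, 6, 6]) + b"nitoja"
print(unframe(frame(compressed)) == compressed)
```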

The device for lossless compression of data shown in Figures 1, 4 and 5, that is, the compressor 103, is configured to generate compressed data by comparing the data to a static dictionary. In one embodiment, the device 103 is configured to: count over how many characters the data to be compressed and the dictionary are identical; count how many of the following characters in the data to be compressed differ from the characters of the dictionary; count how many of the following characters in the dictionary differ from the characters of the data to be compressed; and repeat the counting until all characters of the data have been processed. It can also be said that the device 103 comprises means for performing said functions in order to compress the data.

The device for lossless decompression of compressed data shown in Figures 1, 4 and 5, that is, the decompressor 106, is configured to generate the data by comparing the compressed data to a static dictionary. In one embodiment, the device 106 is configured to: restore from the static dictionary as many characters as the first information indicates; restore from the compressed data as many characters as the third information indicates; advance in the static dictionary by as many characters as the second information indicates; and repeat the preceding operations until all characters of the compressed data have been processed. It can also be said that the device 106 comprises means for performing said functions in order to decompress the compressed data.

It is essential to the operation of both the compressor 103 and the decompressor 106 that the compressed data comprises at least one first piece of information on the length of at least one identical part present in both the data and the static dictionary, at least one second piece of information on the length of at least one part present only in the static dictionary, and at least one third piece of information on the length of at least one part present only in the data, together with that part.

The functions required by both the compressor 103 and the decompressor 106 can be implemented with a microprocessor together with the necessary system and application software. The described configuration and said means can be implemented as a computer program running on a processor, in which case, for example, each required function is implemented as its own program module. The computer program thus contains routines for carrying out the steps of the method. For sale, the computer program can be stored on a computer memory medium, for example a CD-ROM (Compact Disc Read-Only Memory). Another way to sell the computer program is to embed it in a telecommunication signal that can be downloaded from a server, for example over the Internet, to the device 103, 106. The computer program can be designed to run on an ordinary general-purpose personal computer, a portable computer, a computer network server, another prior-art computer, or a device that combines the features of a computer and a mobile phone.

The described configuration and means can also be implemented as a hardware solution, for example as one or more application-specific integrated circuits (ASICs) or as operating logic built from discrete components. In choosing the implementation, a person skilled in the art takes into account, for example, the requirements set for the size and power consumption of the device, the required processing power, the manufacturing costs and the production volumes.

Various hybrid implementations combining software and hardware are also possible. The device 103, 106 can be modified using the preferred embodiments according to the appended claims, which have been described above in connection with the method and are therefore not repeated here.

In one embodiment, the described compression and decompression of data is implemented so that the static dictionary consists of only one data structure. The data structure may be one of the following: WWW server event data, an XML document, a WML document, billing data of a telecommunication network, log data of a telecommunication network, an internal message of an information system, a message between information systems, or some other record.

Although the invention has been described above with reference to the example according to the accompanying drawings, it is clear that the invention is not limited to it but can be modified in many ways within the scope of the inventive idea set out in the appended claims.


Claims


1. A lossless method for compressing data, the method comprising generating compressed data by comparing the data to a static dictionary, characterized in that the compressed data comprises at least one first piece of information on the length of at least one identical part present in the data and the static dictionary, at least one second piece of information on the length of at least one part present only in the static dictionary, and at least one third piece of information on the length of at least one part present only in the data, together with that part.

2. A method according to claim 1, characterized in that the comparing comprises: counting (201) over how many characters the data to be compressed and the static dictionary are identical; counting (203) how many of the following characters in the data to be compressed differ from the characters of the static dictionary; counting (204) how many of the following characters in the static dictionary differ from the characters of the data to be compressed; and repeating the preceding steps (201, 203, 204) until all characters of the data have been processed.

3. A method according to any one of the preceding claims, characterized in that the static dictionary consists of only one data structure.

4. A method according to claim 3, characterized in that the data structure is one of the following: WWW server event data, an XML document, a WML document, billing data of a telecommunication network, log data of a telecommunication network, an internal message of an information system, a message between information systems, or some other record.

5. A method according to any one of the preceding claims, characterized in that, with
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data identical to the static dictionary,
$d_i$ = a part of the data not identical to the static dictionary,
$f_i$ = a part of the static dictionary identical to the data,
$r_i$ = a part of the static dictionary not identical to the data,
$l(x)$ = the length function, giving the length of data $x$,
$c(d,r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$c(d,r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$
or
$c(d,r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n)$.

6. A method according to claim 5, characterized in that an identical part must be at least as long as the amount of memory needed to represent the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$.

7. A method according to claim 5, characterized in that, if the data to be compressed consists of eight-bit characters, the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ are restricted so that they can be represented with eight bits.

8. A method according to any one of the preceding claims, characterized in that the compressed data is partly or wholly further compressed with at least one other compression method.

9. A computer program encoding a computer process for lossless compression of data, the computer process comprising generating compressed data by comparing the data to a static dictionary, characterized in that the compressed data comprises at least one first piece of information on the length of at least one identical part present in the data and the static dictionary, at least one second piece of information on the length of at least one part present only in the static dictionary, and at least one third piece of information on the length of at least one part present only in the data, together with that part.

10. Computer memory, characterized in that it contains a computer program according to claim 9.

11. A telecommunication signal, characterized in that it contains a computer program according to claim 9.

12. An apparatus for lossless compression of data, configured to generate compressed data by comparing the data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

13. An apparatus according to claim 12, characterized in that the apparatus is configured to: calculate how many characters of the data to be compressed and of the static dictionary are identical; calculate how many subsequent characters of the data to be compressed differ from the characters of the static dictionary; calculate how many subsequent characters of the static dictionary differ from the characters of the data to be compressed; and repeat the calculation until all characters of the data have been processed.

14. An apparatus according to claim 12 or 13, characterized in that the static dictionary consists of only one data structure.

15. An apparatus according to claim 14, characterized in that the data structure is one of the following: web server event information, XML document, WML document, telecommunication network billing information, telecommunication network log information, information system internal message, inter-information system message, other record.

16. An apparatus according to any one of claims 12 to 15, characterized in that, with the definitions
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d, r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$$c(d, r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$$
or
$$c(d, r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n).$$

17. An apparatus according to claim 16, characterized in that the identical part must be at least as long as the amount of memory needed to represent the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ in the apparatus.

18. An apparatus according to claim 16, characterized in that, if the data to be compressed consists of eight-bit characters, the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ are limited so that each can be represented in the apparatus by eight bits.

19. An apparatus according to any one of claims 12 to 18, characterized in that the apparatus is configured to further compress the compressed data, partially or completely, by at least one other compression method.

20. A device for lossless compression of data, comprising comparison means for generating compressed data by comparing the data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

21. A device according to claim 20, characterized in that the comparison means: calculate how many characters of the data to be compressed and of the static dictionary are identical; calculate how many subsequent characters of the data to be compressed differ from the characters of the static dictionary; calculate how many subsequent characters of the static dictionary differ from the characters of the data to be compressed; and repeat the calculation until all characters of the data have been processed.

22. A device according to claim 20 or 21, characterized in that the static dictionary consists of only one data structure.

23. A device according to claim 22, characterized in that the data structure is one of the following: web server event information, XML document, WML document, telecommunication network billing information, telecommunication network log information, information system internal message, inter-information system message, other record.

24. A device according to any one of claims 20 to 23, characterized in that, with the definitions
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d, r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$$c(d, r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$$
or
$$c(d, r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n).$$

25. A device according to claim 24, characterized in that the identical part must be at least as long as the amount of memory needed to represent the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ in the device.

26. A device according to claim 24, characterized in that, if the data to be compressed consists of eight-bit characters, the lengths $l(e_i)$, $l(d_i)$ and $l(r_i)$ are limited so that each can be represented in the device by eight bits.

27. A device according to any one of claims 20 to 26, characterized in that the device comprises means for further compressing the compressed data, partially or completely, by at least one other compression method.

28. A method for lossless decompression of compressed data, the method comprising generating data by comparing the compressed data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

29. A method according to claim 28, characterized by: restoring (302) from the static dictionary as many characters as the first item of information indicates; restoring (303) from the compressed data as many characters as the third item of information indicates; advancing (304) in the static dictionary by as many characters as the second item of information indicates; and repeating the preceding steps (302, 303, 304) until all characters of the compressed data have been processed.
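Steps 302 to 304 can be illustrated with a minimal Python sketch. It assumes the compressed data has already been parsed into hypothetical records `(l_e, d_part, l_r)`, which carry the first item of information, the literal part (whose length is the third item), and the second item. It also assumes that restoring characters from the dictionary advances the dictionary position past them.

```python
def decompress(records, dictionary: bytes) -> bytes:
    """Rebuild the original data from records and the static dictionary."""
    out = bytearray()
    pos = 0  # current position in the static dictionary
    for l_e, d_part, l_r in records:
        # Step 302: restore l_e characters from the static dictionary.
        out += dictionary[pos:pos + l_e]
        pos += l_e
        # Step 303: restore the characters stored in the compressed data.
        out += d_part
        # Step 304: skip the part that exists only in the dictionary.
        pos += l_r
    return bytes(out)


# Example: the dictionary is a typical record; only "bob" differs from it.
restored = decompress(
    [(5, b"bob", 5), (11, b"", 0)], b"user=alice;role=admin"
)
```

Decompression needs no searching or comparison of content. It is a single pass over the records with one position pointer into the dictionary.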

30. A method according to claim 28 or 29, characterized in that the static dictionary consists of only one data structure.

31. A method according to claim 30, characterized in that the data structure is one of the following: web server event information, XML document, WML document, telecommunication network billing information, telecommunication network log information, information system internal message, inter-information system message, other record.

32. A method according to any one of claims 28 to 31, characterized in that, with the definitions
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d, r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$$c(d, r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$$
or
$$c(d, r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n).$$

33. A computer program encoding a computer process for lossless decompression of data, the computer process comprising: generating data by comparing compressed data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

34. Computer memory, characterized in that it contains a computer program according to claim 33.

35. A telecommunication signal, characterized in that it contains a computer program according to claim 33.

36. An apparatus for lossless decompression of compressed data, configured to generate data by comparing the compressed data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

37. An apparatus according to claim 36, characterized in that the apparatus is configured to: restore from the static dictionary as many characters as the first item of information indicates; restore from the compressed data as many characters as the third item of information indicates; advance in the static dictionary by as many characters as the second item of information indicates; and repeat the preceding steps until all characters of the compressed data have been processed.

38. An apparatus according to claim 36 or 37, characterized in that the static dictionary consists of only one data structure.

39. An apparatus according to claim 38, characterized in that the data structure is one of the following: web server event information, XML document, WML document, telecommunication network billing information, telecommunication network log information, information system internal message, inter-information system message, other record.

40. An apparatus according to any one of claims 36 to 39, characterized in that, with the definitions
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d, r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$$c(d, r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$$
or
$$c(d, r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n).$$

41. A device for lossless decompression of compressed data, comprising decompression means for generating data by comparing the compressed data with a static dictionary, characterized in that the compressed data comprises at least one first item of information on the length of at least one identical part present in both the data and the static dictionary, at least one second item of information on the length of at least one part present only in the static dictionary, and at least one third item of information on the length of at least one part present only in the data, together with that part.

42. A device according to claim 41, characterized in that the decompression means: restore from the static dictionary as many characters as the first item of information indicates; restore from the compressed data as many characters as the third item of information indicates; advance in the static dictionary by as many characters as the second item of information indicates; and repeat the preceding steps until all characters of the compressed data have been processed.

43. A device according to claim 41 or 42, characterized in that the static dictionary consists of only one data structure.

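44. A device according to claim 43, characterized in that the data structure is one of the following: web server event information, XML document, WML document, telecommunication network billing information, telecommunication network log information, information system internal message, inter-information system message, other record.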

45. A device according to any one of claims 41 to 44, characterized in that, with the definitions
$d$ = the data,
$r$ = the static dictionary,
$n$ = the number of identical and differing parts,
$e_i$ = a part of the data that is identical to the static dictionary,
$d_i$ = a part of the data that is not identical to the static dictionary,
$f_i$ = a part of the static dictionary that is identical to the data,
$r_i$ = a part of the static dictionary that is not identical to the data,
$l(x)$ = the length function, the length of data $x$,
$c(d, r)$ = the compression function for data $d$ using static dictionary $r$,
$d = e_1 + d_1 + e_2 + d_2 + \dots + e_n + d_n$,
$r = f_1 + r_1 + f_2 + r_2 + \dots + f_n + r_n$,
$e_i = f_i$,
$l(e_i) = l(f_i)$,
$d_i \neq r_i$,
the compressed data is
$$c(d, r) = l(e_1) + \dots + l(e_n) + l(d_1) + \dots + l(d_n) + l(r_1) + \dots + l(r_n) + d_1 + \dots + d_n$$
or
$$c(d, r) = (l(e_1) + l(d_1) + l(r_1) + d_1) + \dots + (l(e_n) + l(d_n) + l(r_n) + d_n).$$