WO2006056247A1

WO2006056247A1 - Digital data compression method

Info

Publication number: WO2006056247A1
Application number: PCT/EP2005/009125
Authority: WO
Inventors: Bertrand Danet
Original assignee: Siemens Vdo Automotive
Priority date: 2004-11-26
Filing date: 2005-08-24
Publication date: 2006-06-01
Also published as: FR2878668A1; FR2878668B1

Abstract

The invention relates to a method for compressing a binary data stream, comprising the steps of determining, in a data stream to be compressed, a reference sequence (D1) that is repeated several times in the data stream, and generating a compressed data stream from the data stream to be compressed by replacing each repetition of the reference sequence with a vector (Vr1*2, Vr1*1) directly defining a return to said reference sequence. The invention makes it possible, in particular, to carry out decompression on machines with low processing capacity.

Description

Digital data compression method

In various industrial sectors (for example the manufacture of engine control units or mobile phones), digital data is conventionally transmitted over a serial link between a server and a unit that programs a non-volatile memory, either on a production line or during update phases (such as the reprogramming of motor vehicle computers in a garage). Transmitting this data occupies substantial bandwidth on the transmission network; moreover, the programming time of the memory is often shorter than the transmission time on the network. The problem becomes more acute as the size of non-volatile memories, and of the data and programs stored in them, increases considerably.

Digital data compression / decompression algorithms, such as Run Length Encoding (RLE), ZIP, Huffman or LZW (Lempel-Ziv-Welch) (registered trademarks), are widely used in consumer computing.

These algorithms usually require a lot of RAM to handle data in a lookup table, a dictionary or a binary tree. During a programming process, the dialogue, test, programming and decompression routines often have to be stored in a limited amount of RAM. The decompression software must therefore not be too large (binary tree) and must not require too much computing power (arithmetic compression). Although such algorithms achieve attractive compression ratios, they are poorly suited to machines with reduced processing capabilities and do not exploit the specific features of a given processor family.

There is therefore a need for a method of compressing a binary data stream, comprising the steps of:

-in a bit stream to be compressed, determining a reference sequence, repeated several times in the remainder of the bit stream;

-generating a compressed data stream from the data stream to be compressed, replacing each repetition of the reference sequence with a vector directly defining a return to this reference sequence.

According to one variant, the method comprises the steps of:

-determining binary sequences not to be compressed;

-generating the compressed data stream by inserting, before each binary sequence not to be compressed, a vector identifying that the following sequence is not compressed;

-generating the compressed data stream by inserting a vector identifying a command, followed by a command sequence, at the beginning of the bit stream to be compressed.

According to another variant, each vector has a field identifying its length, and at least two other fields whose combination of values identifies the type and the values of this vector.

According to another variant, the generation of the compressed data stream further comprises the insertion of control vectors followed by control sequences at appropriate places of the bit stream to be compressed, the control sequences defining the structure of the subsequent vectors, defining the respective size of the subsequent vector fields, defining threshold values of fields of the subsequent vectors, containing information on the compressed file or containing bits of verification of the integrity of the compressed file.

According to yet another variant, the vector identifying an uncompressed next sequence comprises a field defining the size of said next sequence.

The size of the field defining the size of the following sequence can be defined in a command sequence inserted into the compressed bit stream. The return vector may comprise a field defining the size of the reference sequence and a field defining the position of the reference sequence.

According to one variant, the size of the field defining the size of the reference sequence and the size of the field defining the position of the reference sequence are automatically defined in a control sequence inserted in the compressed bitstream.

According to another variant, the position of the reference sequence is defined by the distance, in the file to be compressed, between the reference sequence and the repetition replaced by the vector.

Also disclosed is a method of decompressing a compressed bit stream, the compressed bit stream comprising a reference sequence and return vectors that point to the reference sequence and replace repetitions of the reference sequence, the method comprising a step of generating a decompressed bit stream by replacing the return vectors with the reference sequence to which they point.

According to one variant, the decompressed bit stream is generated on the fly while the compressed bit stream is being read.

According to another variant, the on-the-fly generation of the decompressed bit stream is performed by progressively storing the decompressed bit stream in a non-volatile memory; when a return vector pointing to a reference sequence is read in the compressed bit stream, the reference sequence is retrieved from the part of the bit stream already stored in the non-volatile memory, and the bit stream is completed by copying the retrieved reference sequence.

According to another variant:

the compressed bit stream comprises a control vector followed by a control sequence;

the generation of the uncompressed bitstream comprises the execution of the control sequence and the deletion of the control vector and the control sequence in the decompressed bitstream.

According to a variant:

the compressed bit stream comprises a vector identifying an uncompressed binary sequence followed by the uncompressed binary sequence;

the generation of the uncompressed bitstream comprises the deletion of this vector and the copying of the uncompressed binary sequence into the decompressed bitstream.

The invention also relates to a storage medium storing software compressed according to such a compression method.

The invention also relates to a storage medium storing compression software capable of implementing such a compression method.

The invention also relates to a storage medium storing a decompression software capable of implementing such a decompression method.

The invention will be better understood on reading the description which follows, accompanied by the appended drawings which represent:

FIG. 1, a simplified algorithm of a data compression method according to the invention;

FIG. 2, a simplified algorithm of a data decompression method according to the invention;

FIG. 3, a representation of an example of a generic vector;

FIG. 4, the representation of an example of a direct data vector;

FIG. 5, the representation of an exemplary copy vector;

FIG. 6, the representation of an example of a control vector;

FIG. 7, the representation of a portion of a bit stream to be compressed;

FIG. 8, the representation of a portion of the compressed bitstream according to a variant of the invention.

The invention thus proposes to compress a bit stream in the following manner: reference sequences that are each repeated later in the bit stream are identified. A compressed bit stream is then generated by replacing each repetition of a reference sequence with a vector returning to that reference sequence. Replacing a repetition with a substantially smaller return vector is what compresses the bit stream.

According to a variant facilitating decompression, other vector types are used. Before generating the compressed bit stream, it is possible to determine sequences for which compression by a return vector is of little interest. This is particularly the case for short sequences, or for sequences with little or no repetition in the part of the bit stream that will already have been decompressed. When generating the compressed bit stream, a direct data vector is inserted before such sequences. The uncompressed sequences are thus identified and their data transmitted as is.

Control vectors may also be inserted into the compressed bit stream. These control vectors precede commands whose examples will be given later. The commands are intended to be processed during decompression. In particular, a control vector is inserted at the beginning of the bit stream in order to define certain compression parameters or the name of the file for example.

During decompression, the decompressed bit stream is generated as follows: the control sequences are executed, the reference sequences are copied each time a corresponding return vector is read, and the vectors themselves are eliminated. Since the compressed bit stream returns directly to reference sequences of the bit stream being decompressed, decompression does not require reading an appended table.

Figure 1 illustrates an example of a compression algorithm that can be implemented. In step 101, a compressor retrieves a bit stream to be compressed, for example in the form of a file. In step 102, the compressor reads the contents of the bit stream and lists the reference sequences that have repetitions. The compressor can thus detect the first occurrence of binary sequences and detect their subsequent repetitions. The compressor stores the location and size of each reference sequence and of its repetitions. In step 103, the compressor performs, in a manner known per se, statistical processing on the stored information. The compressor then determines the sequences for which compression is advantageous. In step 104, the compressor generates a compressed bit stream from the bit stream to be compressed. To do so, the compressor replaces the repetitions of the selected sequences with a vector returning directly to their reference sequence. The vector indicates in particular the size and the location of this reference sequence. The compressor therefore does not have to include in the compressed bit stream an appended table defining the references to the reference sequences. The compressor also places direct data vectors and control vectors as needed. The control vectors may carry information about the compressed file (e.g., the CRC, the file name), parameters defining the compression (e.g., vector sizes associated with different codes, pivot positions in the vectors, threshold values for vector fields, multipliers of vector field values, areas not to be programmed, etc.) or control information (indication of the start of a file or subfile, indication of the end of a file or subfile, name of the file or subfile, CRC check bits, encryption key ...).
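By way of illustration only, steps 102 to 104 can be sketched in Python as follows. The sketch assumes byte-aligned sequences and replaces the statistical processing of step 103 with a naive greedy search for the longest earlier match, which is a swapped-in technique and not the processing described here. The function name `compress`, the token tuples and the `min_len` parameter are hypothetical.

```python
def compress(data: bytes, min_len: int = 3):
    """Greedy sketch (assumption, not the described statistical processing).

    Returns a list of tokens:
      ("direct", payload)        uncompressed bytes, as after a direct data vector
      ("copy", distance, length) return to data already emitted, as a copy vector
    """
    tokens = []
    literal = bytearray()  # bytes waiting to be emitted as direct data
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search every earlier start position; a match may run past position i,
        # which corresponds to a copy overlapping the data being written.
        for start in range(i):
            length = 0
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            # On equal length, keep the closest match (smallest distance).
            if length and length >= best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_len:
            if literal:
                tokens.append(("direct", bytes(literal)))
                literal.clear()
            tokens.append(("copy", best_dist, best_len))
            i += best_len
        else:
            literal.append(data[i])
            i += 1
    if literal:
        tokens.append(("direct", bytes(literal)))
    return tokens
```

For the input `b"abcabcabcxyzabc"` this sketch emits a direct token for `abc`, a copy token covering the next six bytes, a direct token for `xyz` and a final copy token, so no side table of references is needed.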

It will be noted that this compression method is lossless, i.e. there is no loss of data between the bit stream to be compressed and the decompressed bit stream.

In addition, the compression method achieves attractive compression ratios. Software implementing such a method has in particular made it possible to obtain the following compression performance:

Figure 2 illustrates an example of an associated decompression algorithm.

The algorithm processes the compressed bit stream in parallel with its reception (potentially without storing data from the bit stream or waiting for it). In step 201, the decompressor waits for the reception of a complete vector. In step 202, it decodes the vector type (control, direct data, reference copy).

For a control vector, the decompressor goes to step 203, where it retrieves, if necessary, parameters from the incoming data stream. The command is executed in step 204. After the vector has been processed, the decompressor returns to step 201.

For a direct data vector, the decompressor goes to step 213, where the size of the direct area is decoded from the data in the vector. In step 214, the direct data is retrieved from the data stream and immediately copied into the memory to be filled. After the vector has been processed, the decompressor returns to step 201.

For a reference copy vector, the decompressor goes to step 223, where the size and position of the reference are decoded from the data in the vector. In step 224, the reference area is reproduced by copying data already written in the memory to be filled (generally a very fast process). After the vector has been processed, the decompressor returns to step 201.
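The loop of steps 201 to 224 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: vectors are presented as already-decoded tuples rather than bit fields, sizes and distances are counted in bytes, and a `bytearray` stands in for the memory to be filled. The function name `decompress`, the tuple layout and the `params` dictionary are hypothetical.

```python
def decompress(vectors):
    """Sketch of the decompression loop; returns (decompressed bytes, parameters)."""
    memory = bytearray()  # stands in for the memory to be filled
    params = {}           # settings received through control sequences (assumption)
    for vector in vectors:           # step 201: one complete vector at a time
        kind = vector[0]             # step 202: decode the vector type
        if kind == "control":        # steps 203-204: fetch parameters, execute
            _, name, value = vector
            params[name] = value     # nothing is written to the output
        elif kind == "direct":       # steps 213-214: copy the direct data
            memory += vector[1]
        elif kind == "copy":         # steps 223-224: copy from data already written
            _, distance, length = vector
            if not 0 < distance <= len(memory):
                raise ValueError("copy vector points outside the data already written")
            # Byte-by-byte copy so that the source may overlap the destination.
            for _ in range(length):
                memory.append(memory[-distance])
        else:
            raise ValueError(f"unknown vector type: {kind!r}")
    return bytes(memory), params
```

The only working state is the output itself and a few parameters, which is consistent with a decompressor that needs no table or dictionary in RAM.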

The associated decompression method can be implemented with simple means, essentially shift registers and masks, thanks to the original structure of the vectors. The associated decompression method also keeps the use of the decompression device's random access memory to a minimum (storing a table in random access memory is no longer essential). This decompression method is particularly suitable for non-volatile memories. Indeed, the copy vectors are used to copy directly data already stored in a reference sequence in the non-volatile memory. Accessing the non-volatile memory during copying does not unduly slow down the decompression process.

Although the compression time required may be considered relatively long, it should be noted that the associated decompression method requires little RAM, is very fast and can be used with processors of limited capacity.

FIG. 3 illustrates an example of a vector that can be used for the different types of vectors included in the compressed bitstream. The vector has a first field 31 in which its length is identified. Different ways of identifying the length will be illustrated later. The second field 32 is used to define a bit sequence size, a data location to be copied or the type of vector. Field 33 is used to define the size of a sequence to copy or the type of vector. The pivot 34 is used to define the separation between the field 32 and the field 33. Knowing the position of the pivot, the decompressor will itself delimit the fields 32 and 33.

FIG. 4 illustrates an exemplary structure of a direct data vector 4. The field 41 comprises a combination of bits 01 indicating that the vector 4 occupies 16 bits. The field 42 defines the size of the uncompressed sequence that follows the vector 4. The field 43 comprises a zero value. Combining the fields 42 and 43 allows the decompressor to identify the vector 4 as a direct data vector. Indeed, the decompressor will identify a non-zero value in the field 42 and a zero value in the field 43, this combination designating in the example a vector of direct data.

FIG. 5 illustrates an exemplary structure of a copy vector 5. The field 51 comprises a combination of bits 01 indicating that the vector 5 occupies 16 bits. The field 52 defines the location of the data to be copied in the uncompressed bitstream. Field 53 defines the size of the data to be copied to the indicated location.

FIG. 6 illustrates an exemplary structure of a control vector 6. The field 61 comprises the value 00 which defines that the vector 6 occupies 8 bits. The fields 62 and 63 take a value of zero. The decompressor will identify the null values of the fields 62 and 63 and determine that the vector 6 is a control vector followed by a type and parameters of the command.
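The way a decompressor could delimit the fields of FIGS. 3 to 6 with shifts and masks can be sketched in Python as follows. Several points are assumptions made for the sake of the example: the vector is read most significant bit first, the length field occupies the two leading bits, the pivot is expressed as the number of low-order bits belonging to the third field, and any vector whose third field is non-zero is treated as a copy vector. The names `parse_vector`, `SIZE_BY_CODE` and `EXAMPLE_PIVOTS`, and the pivot values themselves, are hypothetical.

```python
SIZE_BY_CODE = {0b00: 8, 0b01: 16, 0b10: 32, 0b11: 64}  # two-bit length code
EXAMPLE_PIVOTS = {8: 3, 16: 6, 32: 14, 64: 30}          # hypothetical pivot positions

def parse_vector(stream: bytes, pivots: dict):
    """Return (kind, second_field, third_field, vector_size_in_bytes)."""
    total_bits = SIZE_BY_CODE[stream[0] >> 6]            # first field: vector length
    nbytes = total_bits // 8
    if len(stream) < nbytes:
        raise ValueError("incomplete vector")
    word = int.from_bytes(stream[:nbytes], "big")
    payload = word & ((1 << (total_bits - 2)) - 1)       # drop the length field
    pivot = pivots[total_bits]
    second = payload >> pivot                            # bits above the pivot
    third = payload & ((1 << pivot) - 1)                 # bits below the pivot
    if second == 0 and third == 0:
        kind = "control"   # both fields zero, as in FIG. 6
    elif third == 0:
        kind = "direct"    # non-zero size, zero third field, as in FIG. 4
    else:
        kind = "copy"      # location and size, as in FIG. 5 (assumed rule)
    return kind, second, third, nbytes
```

With these assumed pivots, the two bytes `0x41 0x40` decode as a 16-bit direct data vector announcing 5 units of data, and the single byte `0x00` decodes as an 8-bit control vector.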

In the example, the control vector does not indicate a size for the control sequence that follows. A predetermined size can then be used for the command sequences according to their type.

FIG. 7 illustrates a portion of a bit stream to be compressed comprising, in succession, three D1 sequences, a D2 sequence and a D1 sequence. The compressor identifies the redundancies and creates a compressed bit stream, a portion of which is illustrated in FIG. 8. This bit stream comprises a first control vector VC1, followed by its command and parameters C1. For the reference occurrence of the sequence D1, the compressor then places a direct data vector VD1 followed by the sequence D1 in the bit stream. The compressor then places a copy vector Vr1*2 indicating an offset of eight bits and the copying of 16 bits (the vector Vr1*2 thus allows the copying of the first two repetitions of D1). The compressor then places a direct data vector VD2, the sequence D2, then a copy vector Vr1*1 indicating an offset of 16 bits and the copying of eight bits. The compression algorithm may use a number of known statistical methods to determine whether repetitions should be compressed or not.
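The example of FIGS. 7 and 8 can be checked with a short Python sketch. It assumes that D1 and D2 are each eight bits long, that offsets are counted backwards from the current write position, and that the byte values chosen for D1 and D2 are arbitrary; the tuple representation of the vectors and the name `expand` are hypothetical.

```python
D1, D2 = b"\xA5", b"\x3C"  # arbitrary one-byte sequences (assumption)

compressed = [
    ("control", "C1"),     # VC1 followed by its command and parameters C1
    ("direct", D1),        # VD1 followed by D1
    ("copy", 8, 16),       # Vr1*2: offset of 8 bits, copy 16 bits
    ("direct", D2),        # VD2 followed by D2
    ("copy", 16, 8),       # Vr1*1: offset of 16 bits, copy 8 bits
]

def expand(stream):
    """Rebuild the original portion of the bit stream from the vectors."""
    out = bytearray()
    for vector in stream:
        if vector[0] == "direct":
            out += vector[1]
        elif vector[0] == "copy":
            offset, length = vector[1] // 8, vector[2] // 8  # bits to bytes
            for _ in range(length):
                out.append(out[-offset])  # the source may overlap the destination
        # control vectors are consumed and leave nothing in the output
    return bytes(out)

restored = expand(compressed)
```

Under these assumptions `restored` equals `D1 + D1 + D1 + D2 + D1`. The vector Vr1*2 copies 16 bits from a source that starts only eight bits back, so the second copied byte is one that the same vector has just written.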

In particular, it is possible to compare the size of a repeated sequence with the size of the copy vector under consideration in order to determine whether this sequence should be replaced by this copy vector. It can thus be noted that the size required for the field 52 varies with the distance between the reference sequence and its repetition. This size will be fixed, for example, by a pivot positioning command described later.

The coding of the size of the vectors can be done in the following manner. In the following simplified examples, it will be assumed that the vectors have a size of 8, 16, 32 or 64 bits. In the example of FIGS. 4 to 6, the size of the vector is coded on two bits in the fields 41, 51 and 61. The combination 00 identifies a vector of 8 bits, 01 a vector of 16 bits, 10 a vector of 32 bits and 11 a vector of 64 bits.

A second possibility is to detect the position of the first bit at 1 of the field 31. Thus, a bit at 1 in the first position identifies a vector of 8 bits, a bit at 1 in second position a vector of 16 bits, a bit at 1 in third position a vector of 32 bits and a bit at 1 in fourth position a vector of 64 bits. The field defining the size of the vector may thus have a variable length depending on the size of the vector.

A third possibility is to detect the position of the first bit at 0 in the field 31. A bit at 0 in the first position identifies an 8-bit vector and so on.

To detect the position of the first bit at 0 or 1, most processors provide assembly instructions that perform this type of detection. The position of the pivot can be predefined according to the size of the vectors. The position of the pivot can also be initialized and adapted dynamically along the bit stream, depending on the compression conditions encountered, by means of control vectors.
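The three ways of coding the vector length described above can be sketched in Python as follows. The sketch assumes that the length field starts at the most significant bit of the first byte of the vector; the function names are hypothetical. Python's `int.bit_length` plays the role that a count-leading-zeros instruction would play on a processor.

```python
def size_from_code(first_byte: int) -> int:
    """First possibility: two-bit code 00, 01, 10, 11 for 8, 16, 32, 64 bits."""
    return 8 << (first_byte >> 6)

def size_from_first_one(first_byte: int) -> int:
    """Second possibility: position of the first bit at 1."""
    leading_zeros = 8 - first_byte.bit_length()
    if leading_zeros > 3:
        raise ValueError("no bit at 1 in the first four positions")
    return 8 << leading_zeros   # positions 1..4 give 8, 16, 32, 64 bits

def size_from_first_zero(first_byte: int) -> int:
    """Third possibility: position of the first bit at 0."""
    return size_from_first_one(~first_byte & 0xFF)
```

With the second and third possibilities the length field is one bit long for an 8-bit vector and four bits long for a 64-bit vector, which leaves more payload bits in the shortest vectors.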

Several improvements make it possible to encode as much information as possible with vectors having as small a size as possible.

Thus, it can be provided that the compressor and the decompressor associate with the field 42 or 52 a value different from the value it contains. The associated value will be, for example, the sum of the field value and a threshold value. Thus, if the value of the field 42 or 52 is 11 and the threshold value is 1, the associated value will be 12. Indeed, the distance between the reference sequence and its repetition is never zero: only distances of at least 1 are therefore coded with such a copy vector 5.

Similarly, the size of a direct sequence is never zero: only sequence sizes of at least 1 are therefore coded with such a direct data vector 4.

The value of the threshold thus defines respectively the minimum distance of the reference sequence and the minimum size of the direct sequence.

Similarly, it will be possible to associate distinct threshold values as a function of the size of the vectors. Here is a practical example: we have vectors 4 and 5 of 8 bits and 16 bits respectively, whose fields 42 and 52 have 4 and 8 bits. With a threshold value of 1, the values associated with the vector 4 will be from 1 to 16.

The 16-bit vector 5 provides a smaller gain than the 8-bit vector 4 for values from 1 to 16. It is therefore possible to set the threshold of the vector 5 to 17; the values associated with the vector 5 will then range from 17 to 272.
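The arithmetic of this example can be verified with a short Python sketch. The table `VECTORS` restates the figures given above (an 8-bit vector with a 4-bit field and a threshold of 1, a 16-bit vector with an 8-bit field and a threshold of 17); the function names are hypothetical.

```python
# (vector size in bits, field size in bits, threshold), smallest vector first
VECTORS = [(8, 4, 1), (16, 8, 17)]

def field_range(field_bits: int, threshold: int):
    """Smallest and largest associated values a field can represent."""
    return threshold, threshold + (1 << field_bits) - 1

def decode_value(raw: int, threshold: int) -> int:
    """Associated value = field value + threshold."""
    return raw + threshold

def choose_vector(value: int):
    """Pick the smallest vector able to carry `value`; return (vector bits, field value)."""
    for vector_bits, field_bits, threshold in VECTORS:
        low, high = field_range(field_bits, threshold)
        if low <= value <= high:
            return vector_bits, value - threshold
    raise ValueError("value not representable with these vectors")
```

Here `field_range(4, 1)` gives `(1, 16)` and `field_range(8, 17)` gives `(17, 272)`, so the two ranges are contiguous and no code point of the 16-bit vector is spent on values that the 8-bit vector already covers.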

Claims

1. A method of compressing a binary data stream comprising the steps of: in a bit stream to be compressed, determining (102) a reference sequence, repeated several times in the remainder of the bit stream; generating a compressed data stream (104) from the data stream to be compressed, replacing each repetition of the reference sequence with a vector directly defining a reference to this reference sequence, and inserting a vector identifying a command followed by a command sequence, characterized in that each vector has a field identifying its length (41, 51, 61), and at least two other fields whose combination of values identifies the type and values of this vector.

2. A compression method according to claim 1, characterized in that the generation of the compressed data stream further comprises the insertion of control vectors followed by control sequences at appropriate locations of the bit stream to be compressed, the control sequences defining the structure of the subsequent vectors, defining the respective size of the fields of the subsequent vectors, defining threshold values of fields of the subsequent vectors, containing information on the compressed file or containing bits of verification of the integrity of the compressed file.

3. A compression method according to one of the preceding claims, characterized in that the vector identifying an uncompressed next sequence comprises a field (42) defining the size of said next sequence.

4. A compression method according to claims 1 and 3, characterized in that the size of the field defining the size of the next sequence is defined in a control sequence inserted into the compressed bit stream.

5. A compression method according to any one of the preceding claims, characterized in that the return vector comprises a field (53) defining the size of the reference sequence and a field (52) defining the position of the reference sequence.

6. A compression method according to claims 1 and 5, characterized in that the size of the field defining the size of the reference sequence and the size of the field defining the position of the reference sequence are automatically defined in a control sequence inserted in the compressed bit stream.

7. A method of compression according to claim 5 or 6, characterized in that the position of the reference sequence is defined by the distance, in the file to be compressed, between the reference sequence and the repetition replaced by the vector.

8. Storage medium storing compressed software according to the method of any one of claims 1 to 7.

9. Storage medium storing a compression software capable of implementing the method according to any one of claims 1 to 7.