WO2014089753A1

WO2014089753A1 - File compression method, file decompression method, device and server

Info

Publication number: WO2014089753A1
Application number: PCT/CN2012/086341
Authority: WO
Inventors: 沈慧 (Shen Hui)
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date: 2012-12-11
Filing date: 2012-12-11
Publication date: 2014-06-19
Also published as: CN103384884B; CN103384884A

Abstract

A file compression method, a file decompression method, a device and a server. The decompression method comprises: acquiring, from a compressed file, the length of each compressed data block, the number of data blocks and the cyclic redundancy check (CRC) value of each data block; partitioning the compressed file into its compressed data blocks according to the length of each compressed data block and the number of data blocks; decompressing the compressed data blocks in parallel to obtain the corresponding data blocks; calculating the CRC value of each data block obtained by decompression; determining that a data block is consistent with the original data block if its acquired CRC value is the same as the CRC value calculated after decompression; and merging the data blocks obtained by decompression to obtain the original file. By using the acquired length of each compressed data block and the CRC value of each data block, the present invention decompresses the compressed file in parallel, improving the speed and efficiency of decompression.

Description

TECHNICAL FIELD

The present invention relates to the field of information technology, and in particular, to a file compression method, a file decompression method, an apparatus, and a server.

BACKGROUND

At present, in the existing GZIP (GNU Zip) compression method, a file is first split into multiple data blocks, the data blocks are compressed in parallel, and the compressed data blocks are then merged, at bit granularity, into one compressed file. A compressed file produced by the GZIP compression method records only the start address of the compressed data; it records neither the number of compressed data blocks nor the length of each compressed data block.

Therefore, in the corresponding GZIP decompression method, the compressed file can only be read and parsed sequentially, bit by bit: the second compressed data block can be decompressed only after the first compressed data block has been decompressed. In other words, the data blocks can only be decompressed serially, one at a time.

Because the existing GZIP decompression method can only decompress a compressed file serially, its decompression speed and efficiency are low.

SUMMARY OF THE INVENTION

The embodiments of the invention provide a file compression method, a file decompression method, a device and a server that decompress data in parallel, improving the speed and efficiency of decompression.

In order to solve the above technical problem, the embodiment of the present invention discloses the following technical solutions:

The first aspect provides a file compression method, including:

Splitting the file into multiple data blocks, and counting the number of the data blocks;

Calculating the length of the extended data content according to the number of the data blocks, and allocating the memory occupied by the additional option according to that length;

Performing parallel compression on the plurality of data blocks to obtain corresponding compressed data blocks, and acquiring a cyclic redundancy check CRC value of each data block;

Storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;

Adding the additional option to the extended extra option corresponding to the header in the data compression format, and combining the multiple compressed data blocks to obtain a compressed file;

Sending the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

In a first possible implementation manner of the first aspect, performing parallel compression on the plurality of data blocks comprises: compressing the plurality of data blocks in parallel by using multiple compression engines.

With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner, the additional option further includes SI1 and SI2, where SI1 and SI2 represent the ID of the extended data in the additional option.

The second aspect provides a file decompression method, including:

Obtaining a length of each compressed data block in the compressed file, a number of data blocks, and a cyclic redundancy check CRC value of each data block;

Partitioning the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the individual compressed data blocks;

Performing parallel decompression on the compressed data blocks to obtain corresponding data blocks;

Calculating a CRC value of each of the data blocks obtained by decompression;

Determining whether the obtained CRC value of each data block is the same as the CRC value of each data block obtained by decompression;

When the CRC value of each data block is the same as the CRC value of each data block obtained by decompression, the respective data blocks obtained by decompression are combined to obtain an original file.

In a first possible implementation manner of the second aspect, the obtaining the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check CRC value of each data block specifically includes:

The length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block are obtained from the additional options in the compressed file header extension extra option.

With reference to the second aspect, or the first possible implementation manner of the second aspect, in a second possible implementation manner, performing parallel decompression on the compressed data blocks includes:

The plurality of compressed data blocks are respectively decompressed in parallel by a plurality of decompression engines.

A third aspect provides a file compression apparatus, including:

a splitting unit, configured to split the file into a plurality of data blocks, and count the number of the plurality of data blocks;

a first calculating unit, configured to calculate the length of the extended data content according to the number of the plurality of data blocks, and allocate the memory occupied by the additional option according to the length;

a compression unit, configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks;

a second calculating unit, configured to separately calculate the cyclic redundancy check CRC value of each data block when the compression unit performs parallel compression on the plurality of data blocks;

a storage unit, configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;

an adding unit, configured to add the additional option to the extended extra option corresponding to the header in the compressed format;

a merging unit, configured to merge the plurality of compressed data blocks to obtain a compressed file after the adding unit adds the additional option to the location corresponding to the header in the GZIP format;

and a sending unit, configured to send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

The fourth aspect provides a file decompression device, including:

an obtaining unit, configured to obtain, from the additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block;

a dividing unit, configured to partition the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the individual compressed data blocks;

a decompression unit, configured to perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks;

a calculating unit, configured to calculate the CRC value of each decompressed data block when the decompression unit performs parallel decompression on the compressed data blocks;

a judging unit, configured to judge whether the CRC value of each data block obtained by the obtaining unit is the same as the CRC value calculated for the corresponding data block obtained by decompression;

a determining unit, configured to determine, when the judging unit judges that the CRC values are the same, that the data block is consistent with the original data block;

and a merging unit, configured to merge the decompressed data blocks to obtain the original file when the determining unit determines that the data blocks are consistent with the original data blocks.

In a first possible implementation manner of the fourth aspect, the obtaining unit is specifically configured to obtain, from the additional option in the extended extra option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block.

The fifth aspect provides a server, including:

a processor, configured to split a file to be compressed into a plurality of data blocks and count the number of the plurality of data blocks; and to calculate the length of the extended data content according to the number of the plurality of data blocks, and allocate the memory occupied by the additional option according to the length;

a compression engine group, comprising: a plurality of compression engines, configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks;

The processor is further configured to calculate the cyclic redundancy check CRC value of each data block; store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option; add the additional option to the extended extra option corresponding to the header in the GZIP format; merge the plurality of compressed data blocks to obtain a compressed file; and send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

A sixth aspect provides a server, including:

a processor, configured to obtain, from the additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block; and to partition the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the individual compressed data blocks;

a decompressing engine group, configured to perform parallel decompression on the compressed data blocks to obtain corresponding data blocks;

The processor is further configured to calculate the cyclic redundancy check CRC value of each data block obtained by decompression; and, if it determines that the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, so that the data blocks are consistent with the original data blocks, to merge the data blocks obtained by decompression to obtain the original file.

According to the foregoing technical solution, in the embodiments of the present invention, when a file is compressed, the length of each compressed data block and the cyclic redundancy check (CRC) value of each data block are added to an additional option in the header information. When the receiving end decompresses the compressed file, it can decompress the file in parallel according to the length of each compressed data block and the CRC value of each data block, thereby improving the speed and efficiency of decompression.

DRAWINGS

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a file compression apparatus according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a file decompressing apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of another server according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention;

FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.

Referring to FIG. 1, FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention. The method includes:

Step 101: Split a file into multiple data blocks, and count the number of the data blocks;

A server (an X86 server, a reduced instruction set computer (RISC) server, an IA-64 server, etc.) can split a file (such as a UNIX system file, a locally stored or received file, or a file of any format running on the operating system, in particular a text file) in many ways, and different splitting methods may be used for different data. For example, TMPGEnc can be used to split files in MPEG format, ASF Tools can be used to split files in ASF or WMV format, and AVI Chop can be used to split files in MPEG4 format.

The file may be split by a fixed number of bytes, divided equally according to the size of the file, or split arbitrarily as required; this embodiment imposes no limitation.
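The two splitting strategies named above (a fixed number of bytes per block, or an equal division of the file) can be sketched as follows. This is only an illustrative sketch: the function names `split_fixed` and `split_evenly` are hypothetical and do not appear in the original text.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Split data into blocks of block_size bytes; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_evenly(data: bytes, parts: int) -> list[bytes]:
    """Split data into `parts` blocks whose sizes differ by at most one byte."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(data), parts)
    blocks, pos = [], 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)  # spread the remainder over the first blocks
        blocks.append(data[pos:pos + size])
        pos += size
    return blocks
```

In either case, the number of data blocks counted in step 101 is simply the length of the returned list.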

The file in this embodiment refers to a file suitable for GZIP compression or decompression.

Step 102: Calculate the length of the extended data content according to the number of the plurality of data blocks, and allocate the memory occupied by the additional option according to the length;

For example, if the file is split into 10 data blocks, the length of the extended data content (XLEN, eXtra LENgth) is the total length of the entries recorded for the 10 data blocks, together with the block count. This embodiment defines the length occupied by the entries of all data blocks as the length of the extended data content.

That is, XLEN is the number of bytes of the additional option content, namely the number of bytes from NUM to nCRC in Table 1 below. For example, if the file is split into 10 data blocks, then, using the number of bytes of each item given in Table 2 below, XLEN = 2 (NUM) + (4 + 4) × 10 = 82.

Since XLEN is calculated to be 82, 82 bytes of memory can then be allocated with the malloc function.

In this embodiment, the additional option is located in the extended option of the header of the compressed file; its structure is shown in Table 1 below.
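The XLEN arithmetic above can be checked with a short sketch. The field widths (2 bytes for NUM, 4 bytes for each length, 4 bytes for each CRC) are taken from the worked example in the text; the function names are hypothetical, and a `bytearray` stands in for the buffer that the text allocates with malloc.

```python
# Field widths in bytes, as used in the worked example XLEN = 2 + (4 + 4) * 10.
NUM_BYTES = 2   # number of data blocks
LEN_BYTES = 4   # length of one compressed data block
CRC_BYTES = 4   # CRC value of one original data block


def extended_length(block_count: int) -> int:
    """Return XLEN, the byte count from NUM through the last CRC entry."""
    if block_count < 0:
        raise ValueError("block_count must not be negative")
    return NUM_BYTES + (LEN_BYTES + CRC_BYTES) * block_count


def allocate_option_buffer(block_count: int) -> bytearray:
    """Reserve a zero-filled buffer of XLEN bytes (stand-in for malloc)."""
    return bytearray(extended_length(block_count))
```

For 10 data blocks, `extended_length(10)` evaluates to 82, matching the figure in the text.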

Step 103: Perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks, and calculate a cyclic redundancy check CRC value of each data block;

In this embodiment, parallel compression compresses the multiple data blocks by using multiple compression engines. With hardware compression, parallel compression means using multiple compression engines to compress multiple data blocks simultaneously; with software compression, it means using multi-threading to compress multiple data blocks simultaneously on a central processing unit (CPU) that has multiple physical cores.

In general, in order to verify whether the data is decompressed correctly, the CRC check value of each data block needs to be calculated so that the data can be verified by the CRC algorithm.

One principle of the CRC algorithm, given as a non-limiting example, is as follows: the data is divided by a generator polynomial, and the remainder is the check field.
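For the software case, multi-threaded compression of the blocks might look like the sketch below, which uses Python's standard `zlib` and `concurrent.futures` modules with worker threads standing in for the compression engines. The text does not say how block boundaries are encoded. The sketch assumes each block is written as raw DEFLATE data with a fresh compressor, that every block except the last ends with a sync flush (byte-aligned, no end-of-stream marker), and that only the last block is finished. Under that assumption each block can be inflated on its own, and the concatenation is still one valid DEFLATE stream. The names `compress_block` and `compress_parallel` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes, is_last: bool) -> tuple[bytes, int]:
    """Compress one block; return (compressed bytes, CRC32 of the original block)."""
    engine = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw DEFLATE, no wrapper
    body = engine.compress(block)
    # Assumption: a sync flush ends non-final blocks on a byte boundary.
    tail = engine.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return body + tail, zlib.crc32(block)


def compress_parallel(blocks: list[bytes], workers: int = 4) -> list[tuple[bytes, int]]:
    """Compress all blocks concurrently, preserving block order in the result."""
    flags = [index == len(blocks) - 1 for index in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_block, blocks, flags))
```

The length of each compressed block, needed later for the additional option, is `len()` of the first element of each returned pair.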

For example, the data segment code is 1011001, which corresponds to $m(x) = x^6 + x^4 + x^3 + 1$.

Suppose the generator polynomial is $g(x) = x^4 + x^3 + 1$; the code corresponding to $g(x)$ is then 11001. Then $x^4 m(x) = x^{10} + x^8 + x^7 + x^4$, and the corresponding code is 10110010000.

Dividing $x^4 m(x)$ by $g(x)$ using polynomial (modulo-2) division gives the remainder 1010, that is, the check field is 1010.
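The division in this example can be reproduced with a small bitwise sketch. The function name `crc_remainder` is hypothetical; the code performs plain modulo-2 long division on bit strings and is meant to illustrate the principle only, not the table-driven CRC32 used in practice.

```python
def crc_remainder(data_bits: str, generator_bits: str) -> str:
    """Return the remainder of data * x^degree divided by the generator, modulo 2."""
    degree = len(generator_bits) - 1
    # Append `degree` zero bits, which multiplies the data polynomial by x^degree.
    dividend = [int(bit) for bit in data_bits] + [0] * degree
    generator = [int(bit) for bit in generator_bits]
    for position in range(len(data_bits)):
        if dividend[position]:
            # Subtracting modulo 2 is the same as XOR-ing the generator in place.
            for offset, bit in enumerate(generator):
                dividend[position + offset] ^= bit
    return "".join(str(bit) for bit in dividend[-degree:])
```

With the values from the text, `crc_remainder("1011001", "11001")` returns `"1010"`.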

Of course, those skilled in the art may also use other CRC algorithms; this embodiment imposes no limitation.

Step 104: Store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;

For example, in the above process, the file is first split into N data blocks, the number of data blocks is counted, and the length of the extended data content is obtained from that number. Then, after each data block is compressed, the length of each compressed data block is obtained and the CRC value of each data block is calculated. Finally, the length of the extended data content, the number of data blocks, the length of the first compressed data block and the CRC value of the first data block, the length of the second compressed data block and the CRC value of the second data block, and so on up to the length of the Nth compressed data block and the CRC value of the Nth data block, are added in sequence to the corresponding fields of the additional option: the XLEN field, the NUM field, the 1LEN and 1CRC fields, the 2LEN and 2CRC fields, and so on up to the NLEN and NCRC fields;

Further, the additional options may further include identification information such as SI1 and SI2, wherein the SI1 and SI2 are IDs of the extended data content in the additional options.

Specifically, the structure of the additional option is shown in Table 1:

Table 1

The SI1 and the SI2 are identification information.

The XLEN is the length of the extended content, that is, the length from NUM to nCRC;

The NUM represents the number of data blocks owned by the compressed file;

The fields 1LEN, 1CRC to NLEN, NCRC carry the extended information for each data block, specifically the length of each data block after compression and the CRC32 value of each data block before compression. CRC32 is a data error check code: in data communication and compression, data is checked for errors by comparing the CRC32 value of the original data with the CRC32 value of the data obtained by decompressing the compressed packet.

The specific contents of the additional option structure are shown in Table 2:

Table 2

The content shown in Table 2 is only an example and is not limiting; it may be adapted as needed.

Step 105: Add the additional option to the extended extra option corresponding to the header in the compressed format, and merge the plurality of compressed data blocks to obtain a compressed file;

Further, in addition to the additional option, the extended extra option may include the source file name, comment text, a CRC16, and so on. This embodiment mainly extends the additional option: the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block are added to the additional option, so that the receiving end can decompress the data blocks in parallel according to the added information.
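A possible byte layout for the additional option is sketched below with Python's `struct` module. The widths of NUM (2 bytes), each length (4 bytes) and each CRC (4 bytes) follow the XLEN example in the text. Everything else is an assumption made for illustration: one byte each for SI1 and SI2, a 2-byte XLEN, little-endian byte order (as in the GZIP subfield convention of RFC 1952), and the hypothetical ID letters "P" and "D". The function names are hypothetical as well.

```python
import struct


def pack_additional_option(entries: list[tuple[int, int]],
                           si1: int = ord("P"), si2: int = ord("D")) -> bytes:
    """Serialize SI1, SI2, XLEN, NUM, then (LEN, CRC) for every data block."""
    body = struct.pack("<H", len(entries))            # NUM: number of data blocks
    for compressed_length, crc in entries:
        body += struct.pack("<II", compressed_length, crc)
    if len(body) > 0xFFFF:
        raise ValueError("too many blocks for a 2-byte XLEN")
    return struct.pack("<BBH", si1, si2, len(body)) + body  # XLEN covers NUM..nCRC


def unpack_additional_option(raw: bytes) -> tuple[int, int, int, list[tuple[int, int]]]:
    """Parse the layout written by pack_additional_option."""
    si1, si2, xlen = struct.unpack_from("<BBH", raw, 0)
    (num,) = struct.unpack_from("<H", raw, 4)
    entries = [struct.unpack_from("<II", raw, 6 + 8 * index) for index in range(num)]
    return si1, si2, xlen, entries
```

With 10 entries the packed XLEN value is 82, consistent with the earlier calculation, and the whole option occupies 86 bytes under the assumed 4-byte prefix.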

The file compression in this embodiment applies to the GZIP compression format. Each independent GZIP compressed file includes a header, a data portion, and a tail. In addition to the extended extra option, the header may include ID1, ID2, CM, FLG, MTIME, XFL and OS, where:

ID1 and ID2 are fixed values, that is, ID1 = 0x1F and ID2 = 0x8B, and identify the GZIP format;

CM indicates the compression method; currently there is only one value, CM = 8, which indicates the DEFLATE method;

MTIME indicates the compression time, in UNIX format;

XFL indicates the compression mode: XFL = 2 indicates the maximum-compression but slowest algorithm; XFL = 4 indicates the fastest but minimum-compression algorithm;

OS indicates the file system, for example: OS = 0 indicates a FAT file system; OS = 3 indicates a UNIX file system;

FLG is the extended function flag; each bit indicates one kind of additional data, whose content is carried in the extra part. The extra part includes: the additional option, the original file name, comment text, and a CRC16.

The above describes the header of a GZIP compressed file. A GZIP compressed file also includes a data portion and a tail; that is, each independent GZIP compressed file currently consists of a header, a data portion and a tail. The header is as described above and is not described again here.

The data portion includes one or more data blocks (in this embodiment, one or more compressed data blocks; the same applies below). Each data block consists of BFINAL, BTYPE and the data DATA. BFINAL occupies 1 bit and indicates whether the block is the last data block: a BFINAL bit of 1 indicates the last data block. BTYPE (2 bits) indicates the compression type of the data: static Huffman compression (01), dynamic Huffman compression (10) or uncompressed (00). DATA is the compressed data (for example, produced using LZ77, Huffman coding, binary-tree properties, and so on).

The tail includes the 32-bit CRC value of the original file and the low-order 32 bits of the original data length; it is mainly used to verify whether the decompressed file is consistent with the original file before compression.
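The header fields listed above can be assembled as in the following sketch, which follows the byte layout of the GZIP specification (RFC 1952): a 10-byte fixed part, then, when bit 2 of FLG (FEXTRA) is set, a 2-byte little-endian length followed by the extra data. Note that in RFC 1952 this 2-byte length covers the whole extra field. The function names and the default values are illustrative assumptions.

```python
import struct

FEXTRA = 0x04  # FLG bit 2: an extra field follows the fixed header


def gzip_header(extra: bytes, mtime: int = 0, xfl: int = 0, os_code: int = 3) -> bytes:
    """Build a GZIP header whose extra field carries `extra`."""
    if len(extra) > 0xFFFF:
        raise ValueError("extra field is limited to 65535 bytes")
    # ID1, ID2, CM (8 = DEFLATE), FLG, MTIME (4 bytes), XFL, OS
    fixed = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, mtime, xfl, os_code)
    return fixed + struct.pack("<H", len(extra)) + extra


def parse_gzip_header(raw: bytes) -> dict:
    """Read back the fixed fields and the extra field, if present."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", raw, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a GZIP header")
    extra = b""
    if flg & FEXTRA:
        (extra_len,) = struct.unpack_from("<H", raw, 10)
        extra = raw[12:12 + extra_len]
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code, "extra": extra}
```

A header built this way and followed by DEFLATE data and the tail should still be readable by ordinary GZIP tools, since readers that do not understand the extra field skip it.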
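As a small illustration of the BFINAL and BTYPE fields, the sketch below decodes them from the first byte of a DEFLATE block. It relies on the bit packing defined in RFC 1951, where header bits are stored starting from the least significant bit, and it applies only to a block that begins on a byte boundary (such as the first block of a stream). The name `block_header` is hypothetical.

```python
BTYPE_NAMES = {
    0b00: "uncompressed",
    0b01: "static Huffman",
    0b10: "dynamic Huffman",
}


def block_header(first_byte: int) -> tuple[bool, str]:
    """Return (is_last_block, compression type) for a byte-aligned DEFLATE block."""
    bfinal = bool(first_byte & 0b1)      # bit 0: 1 marks the last block
    btype = (first_byte >> 1) & 0b11     # bits 1-2: compression type
    return bfinal, BTYPE_NAMES.get(btype, "reserved")
```

For instance, a first byte of `0b101` describes a final block using dynamic Huffman compression.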

Step 106: Send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

It should be noted that a file compressed in the manner of this embodiment may be decompressed either with the existing serial decompression or with the parallel decompression provided by the embodiments of the present invention (see the embodiment of FIG. 2 below). If serial decompression is used, the content of the tail of the compressed file must be used to verify whether the decompressed file is consistent with the original file before compression. If the parallel decompression of the present application is used, the content of the tail is not needed for this verification; instead, each decompressed data block is verified against the original data block before compression according to the corresponding CRC value in the additional option of the extended extra option in the header.

In the embodiment of the present invention, when a file is compressed, the length of each compressed data block and the CRC value of each data block are added, as new fields, to an additional option of the header information, so that the receiving end can decompress the compressed file in parallel based on this information, thereby improving the speed and efficiency of decompression.
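Steps 101 to 105 can be tied together in a rough end-to-end sketch. It is not the patented implementation, only one way the pieces might fit, under these assumptions: fixed-size splitting, worker threads as compression engines, raw DEFLATE blocks where all but the last end with a sync flush, and an additional option laid out as SI1, SI2, 2-byte XLEN, 2-byte NUM, then 4-byte length and 4-byte CRC32 per block in little-endian order. The ID letters "P" and "D" and all function names are hypothetical.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def _deflate(block: bytes, is_last: bool) -> bytes:
    """Compress one block as raw DEFLATE; only the last block ends the stream."""
    engine = zlib.compressobj(6, zlib.DEFLATED, -15)
    mode = zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH
    return engine.compress(block) + engine.flush(mode)


def compress_file(data: bytes, block_size: int = 1 << 16) -> bytes:
    """Return GZIP bytes whose header extra field lists per-block lengths and CRCs."""
    # Step 101: split and count (an empty input still yields one empty block).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    flags = [index == len(blocks) - 1 for index in range(len(blocks))]
    # Step 103: compress the blocks in parallel.
    with ThreadPoolExecutor() as pool:
        compressed = list(pool.map(_deflate, blocks, flags))
    # Steps 102 and 104: build NUM followed by (LEN, CRC) per block.
    option = struct.pack("<H", len(blocks))
    for chunk, block in zip(compressed, blocks):
        option += struct.pack("<II", len(chunk), zlib.crc32(block))
    if len(option) + 4 > 0xFFFF:
        raise ValueError("too many blocks for the 2-byte length fields")
    extra = struct.pack("<BBH", ord("P"), ord("D"), len(option)) + option
    # Step 105: header with FEXTRA set, merged blocks, then the usual tail.
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 3)
    header += struct.pack("<H", len(extra))
    tail = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + extra + b"".join(compressed) + tail


if __name__ == "__main__":
    sample = b"parallel gzip " * 5000
    # An ordinary serial GZIP reader should still accept the file.
    assert gzip.decompress(compress_file(sample, 8192)) == sample
```

The design choice worth noting is the sync flush: it keeps each block independently decodable while leaving the merged data a single valid DEFLATE stream, which is one way to remain compatible with the serial decompression mentioned above.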

Referring to FIG. 2, FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention, where the method includes:

Step 201: Obtain a length of each compressed data block in the compressed file, a number of data blocks, and a cyclic redundancy check CRC value of each data block;

The process of obtaining is as follows: The server obtains the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block from the additional options in the compressed file header extension extra option.

Step 202: Partition the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the individual compressed data blocks;

Step 203: Perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks;

Specifically, the server may input each compressed data block into a corresponding decompression engine, so that the compressed data blocks are decompressed in parallel by multiple decompression engines. The process of parallel decompression is well known to those skilled in the art and is not described here.
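Step 202 amounts to slicing the data portion at the offsets implied by the recorded lengths. A minimal sketch follows; the name `partition` is hypothetical, and `payload` is assumed to hold only the concatenated compressed data blocks, without the header or the tail.

```python
def partition(payload: bytes, lengths: list[int]) -> list[bytes]:
    """Cut payload into consecutive chunks whose sizes are given by lengths."""
    if sum(lengths) != len(payload):
        raise ValueError("recorded lengths do not match the size of the data portion")
    chunks, position = [], 0
    for length in lengths:
        chunks.append(payload[position:position + length])
        position += length
    return chunks
```

Because every offset is known before any decompression starts, the chunks can be handed to separate decompression engines immediately.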
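In software, step 203 might be sketched with a thread pool in which each worker plays the role of one decompression engine. The sketch assumes each compressed data block is raw DEFLATE data that can be inflated on its own, which the original text does not spell out; the function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_block(chunk: bytes) -> bytes:
    """Inflate one independently compressed raw DEFLATE block."""
    return zlib.decompressobj(-15).decompress(chunk)  # -15 selects raw DEFLATE


def decompress_parallel(chunks: list[bytes], workers: int = 4) -> list[bytes]:
    """Inflate all chunks concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decompress_block, chunks))
```

`pool.map` returns results in input order, so the decompressed data blocks can later be merged without any reordering.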

Step 204: Calculate a CRC value of each of the data blocks obtained by decompression;

The calculation process is well known to those skilled in the art and will not be described herein.

Step 205: If the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, the data blocks are consistent with the original data blocks;

Step 206: Merge the data blocks obtained by decompression to obtain the original file.
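Steps 204 to 206 can be sketched as a single check-then-merge function. CRC32 from Python's `zlib` module is used here because the text later names CRC32 as the per-block check value; the function name `verify_and_merge` and the choice to raise an error on a mismatch are illustrative assumptions.

```python
import zlib


def verify_and_merge(blocks: list[bytes], expected_crcs: list[int]) -> bytes:
    """Check each decompressed block against its recorded CRC32, then concatenate."""
    if len(blocks) != len(expected_crcs):
        raise ValueError("block count does not match the number of CRC values")
    for index, (block, expected) in enumerate(zip(blocks, expected_crcs)):
        if zlib.crc32(block) != expected:
            raise ValueError(f"data block {index} failed its CRC check")
    return b"".join(blocks)
```

Since each block carries its own CRC value, a corrupted block can be identified individually rather than only detecting that the file as a whole is damaged.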

In the embodiment of the present invention, when decompressing, the server first obtains the length of each compressed data block and the CRC value of each data block from the compressed file, and decompresses the compressed file in parallel according to this information. The correctness of each decompressed data block can be checked with its own independent CRC value, which improves the speed and efficiency of decompression.

Based on the implementation process of the foregoing method, an embodiment of the present invention further provides a file compression device, shown in FIG. 3. The device includes: a splitting unit 31, a first calculating unit 32, a compression unit 33, a second calculating unit 34, a storage unit 35, an adding unit 36, and a merging unit 37, where:

the splitting unit 31 is configured to split the file into a plurality of data blocks, and count the number of the plurality of data blocks; the file may be split by a fixed number of bytes, split evenly, or split as needed;

the first calculating unit 32 is configured to calculate the length of the extended data content according to the number of the plurality of data blocks, and allocate the memory occupied by the additional option according to the length;

the compression unit 33 is configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks; specifically, the plurality of data blocks may be compressed in parallel by a plurality of compression engines;

the second calculating unit 34 is configured to separately calculate the cyclic redundancy check CRC value of each data block when the compression unit 33 performs parallel compression on the plurality of data blocks;

the storage unit 35 is configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;

the adding unit 36 is configured to add the additional option to the extended extra option corresponding to the header in the compressed format;

the merging unit 37 is configured to merge the plurality of compressed data blocks to obtain a compressed file after the adding unit adds the additional option to the location corresponding to the header in the GZIP format.

For the implementation of the functions and roles of the units in the device, refer to the corresponding implementation process in the foregoing method; details are not described here again.

Correspondingly, an embodiment of the present invention further provides a file decompression device, shown in FIG. 4. The device includes: an obtaining unit 41, a dividing unit 42, a decompression unit 43, a calculating unit 44, a judging unit 45, a determining unit 46, a merging unit 47, and a sending unit 48, where:

the obtaining unit 41 is configured to obtain, from the additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block; specifically, it obtains these values from the additional option in the extended extra option of the compressed file header;

the dividing unit 42 is configured to partition the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the individual compressed data blocks;

the decompression unit 43 is configured to perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks;

the calculating unit 44 is configured to calculate the CRC value of each decompressed data block when the decompression unit performs parallel decompression on the compressed data blocks;

the judging unit 45 is configured to judge whether the CRC value of each data block obtained by the obtaining unit is the same as the CRC value calculated for the corresponding data block obtained by decompression;

the determining unit 46 is configured to determine, when the judging unit judges that the CRC values are the same, that the data block is consistent with the original data block;

the merging unit 47 is configured to merge the data blocks obtained by decompression to obtain the original file when the determining unit determines that the data blocks are consistent with the original data blocks;

the sending unit 48 is configured to send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

Correspondingly, the embodiment of the present invention further provides a server, which is shown in FIG. 5. The server includes: a processor 51 and a compression engine group 52, wherein the processor 51 is configured to be compressed. Splitting the file into a plurality of data blocks, and counting the number of the plurality of data blocks; calculating the length of the extended data content according to the number of the plurality of data blocks, and applying the memory occupied by the additional options according to the length The compression engine group 52 includes a plurality of compression engines for performing parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks. The processor 51 is further configured to calculate a cyclic redundancy of each data block. And verifying the CRC value, and storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in an additional option; and adding the additional option And in the extended extra option corresponding to the header in the GZIP format, combining the plurality of compressed data blocks to obtain a compressed file, and sending the compressed file to the receiving end, In order to facilitate parallel decompression of the compressed file by the receiving end.

For the implementation of the functions and roles of the server, refer to the corresponding implementation process in the foregoing method; details are not described herein again.

Correspondingly, an embodiment of the present invention further provides another server, shown in FIG. 6. The server includes a processor 61 and a decompression engine group 62. The processor 61 is configured to obtain, from an additional option of the header of the compressed file, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block, and to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the compressed data blocks. The decompression engine group 62 is configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks. The processor 61 is further configured to calculate the CRC value of each data block obtained by decompression; to determine, if the obtained CRC value of each data block is the same as the CRC value calculated for that data block after decompression, that the data block is consistent with the original data block; and to merge the data blocks obtained by decompression to obtain the original file.

In the embodiment of the present invention, a compressed file containing multiple compressed data blocks can be decompressed in parallel, which exploits the advantages of multi-core or multi-channel technology. On the basis of the existing GZIP format, the length information of each block and the CRC32 value of the original data block corresponding to each block are stored, during compression, in an additional option of the header extension option. During decompression, the blocks can then be decompressed in parallel according to the length information of each block and the CRC32 value of its original data block, thereby improving the speed and efficiency of decompression.
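As an illustration of how such an additional option might be laid out, the following Python sketch packs the number of blocks, the length of each compressed block, and the CRC32 value of each original block into a single subfield. The framing (a two-byte XLEN followed by SI1, SI2, a two-byte LEN, and the data) follows the extra-field format of the GZIP specification (RFC 1952). The ID bytes "P" and "Z", the four-byte field widths, and the function names are assumptions made for illustration and are not taken from this embodiment.

```python
import struct

# Hypothetical subfield ID bytes; the embodiment does not state the values of SI1 and SI2.
SI1, SI2 = ord("P"), ord("Z")


def pack_extra(blocks):
    """Pack a list of (compressed_length, crc32) pairs into XLEN + one subfield."""
    # Assumed payload layout: block count, then one (length, CRC32) pair per block.
    payload = struct.pack("<I", len(blocks))
    for length, crc in blocks:
        payload += struct.pack("<II", length, crc)
    # XLEN is a 16-bit field, so the whole subfield must fit in 65535 bytes.
    if len(payload) > 0xFFFF - 4:
        raise ValueError("too many blocks for one extra field")
    # Subfield framing: SI1, SI2, LEN (little-endian), then the data.
    subfield = struct.pack("<BBH", SI1, SI2, len(payload)) + payload
    # XLEN precedes the extra field and gives its total length.
    return struct.pack("<H", len(subfield)) + subfield


def unpack_extra(buf):
    """Recover the list of (compressed_length, crc32) pairs from pack_extra output."""
    xlen, si1, si2, sub_len = struct.unpack_from("<HBBH", buf, 0)
    if (si1, si2) != (SI1, SI2) or xlen != sub_len + 4:
        raise ValueError("unexpected extra field")
    (count,) = struct.unpack_from("<I", buf, 6)
    return [struct.unpack_from("<II", buf, 10 + 8 * i) for i in range(count)]
```

Because XLEN is only 16 bits wide, a layout of this kind limits how many blocks one header can describe; with the assumed eight bytes per block, the limit is roughly eight thousand blocks.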

To facilitate understanding by those skilled in the art, a specific application example is described below. FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention. As shown in the figure, this compression mode mainly uses multiple hardware or software compression engines to compress the blocks in parallel. The entire compression process mainly includes:

1. The processor divides the original file into sub-data blocks, for example, splits the file into n sub-files (sub-file 1, sub-file 2, through sub-file n), and counts the number of sub-files, for example, n;

2. The processor calculates the length of the extended data (XLEN) according to the number of subfiles (that is, n), and requests the memory needed to store the extended data;

3. The processor transmits the sub-blocks to the corresponding compression engines (the compression engine group), and each compression engine compresses its subfile in parallel with the others and calculates the CRC32 value of the data block;

4. After compressing its subfile into a compressed subfile, each compression engine stores the length of the compressed subfile and the CRC32 value of the original subfile in an additional option in the extension option, wherein the length of the subfile is in units of bits; at the same time, the number of subfiles and the length of the extended data also need to be stored in the additional option in the extension option;

5. When all the data blocks have been compressed, the processor adds the additional option to the corresponding position of the compressed file header (that is, the extension option), and then merges the compressed subfiles to obtain the compressed file.
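The compression steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: a thread pool stands in for the compression engine group, each block is compressed as an independent raw DEFLATE stream, lengths are recorded in bytes, and the names split, compress_block, and compress_parallel are hypothetical. The sketch returns the merged compressed blocks together with the per-block metadata and does not write a GZIP header.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split(data, block_size):
    """Step 1: split the original data into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_block(block):
    """Steps 3 and 4: compress one block and compute the CRC32 of its original content."""
    # wbits=-15 selects a raw DEFLATE stream with no zlib or gzip wrapper (an assumption).
    engine = zlib.compressobj(6, zlib.DEFLATED, -15)
    compressed = engine.compress(block) + engine.flush()
    return compressed, len(compressed), zlib.crc32(block)


def compress_parallel(data, block_size=1 << 16, workers=4):
    """Compress all blocks in parallel; return (merged body, [(length, crc32), ...])."""
    blocks = split(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input blocks.
        results = list(pool.map(compress_block, blocks))
    # Step 5: merge the compressed blocks in order and collect the metadata
    # that would be stored in the additional option.
    body = b"".join(compressed for compressed, _, _ in results)
    meta = [(length, crc) for _, length, crc in results]
    return body, meta
```

The metadata list holds exactly the information that the steps above place in the additional option: the number of blocks (its length), the length of each compressed block, and the CRC32 value of each original block.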

In this compression embodiment, the extended data information is written in the original GZIP manner, so any program or decompression engine that can decompress other GZIP-format compressed packages can also decompress a file compressed in this way; it simply cannot take advantage of parallel decompression. To improve the efficiency of decompression, the parallel decompression provided by this embodiment can be used.

FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention. As shown in the figure, the decompression method mainly uses multiple hardware or software decompression engines (that is, a decompression engine group) to decompress the blocks in parallel. The entire decompression process mainly includes:

1. The processor obtains the number of blocks (that is, compressed subfiles or compressed data blocks) and the length of each block from the additional option in the extension option of the compressed file, and divides the compressed file into individual blocks according to the number of blocks and the length of each block, for example block 1, block 2, up to block n;

2. The processor dispatches the blocks to the corresponding decompression engines in parallel;

3. Each decompression engine decompresses each block in parallel and calculates the CRC value corresponding to each block.

4. After each decompression engine has decompressed its block into a data block, the processor reads the CRC value in the additional option corresponding to each block;

5. The processor compares the CRC value calculated for each block after decompression with the CRC32 value read for that block; if the two are the same, it is confirmed that the data block is consistent with the original data block.

6. After all the blocks are decompressed, the decompressed data blocks are merged to obtain the original file.
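The decompression steps above can be sketched in Python as follows. As a sketch under stated assumptions, it takes the merged compressed blocks and a list of (length, CRC32) pairs that are presumed to have been read from the additional option already; it assumes each block is an independent raw DEFLATE stream with lengths in bytes, uses a thread pool in place of the decompression engine group, and uses the hypothetical names slice_blocks, inflate_and_crc, and decompress_parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def slice_blocks(body, meta):
    """Step 1: cut the merged body into blocks using the stored lengths."""
    chunks, offset = [], 0
    for length, _crc in meta:
        chunks.append(body[offset:offset + length])
        offset += length
    if offset != len(body):
        raise ValueError("stored lengths do not cover the compressed data")
    return chunks


def inflate_and_crc(chunk):
    """Step 3: decompress one block and compute the CRC32 of the result."""
    data = zlib.decompress(chunk, -15)  # -15: raw DEFLATE stream (an assumption)
    return data, zlib.crc32(data)


def decompress_parallel(body, meta, workers=4):
    """Decompress all blocks in parallel, verify each CRC32, and merge the results."""
    chunks = slice_blocks(body, meta)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 2: dispatch the blocks to the workers; order is preserved.
        results = list(pool.map(inflate_and_crc, chunks))
    # Steps 4 and 5: compare each computed CRC32 with the stored value.
    for index, ((_data, crc), (_length, expected)) in enumerate(zip(results, meta)):
        if crc != expected:
            raise ValueError(f"block {index}: CRC mismatch")
    # Step 6: merge the decompressed blocks in their original order.
    return b"".join(data for data, _crc in results)
```

Raising an error on the first mismatching block is a design choice of this sketch; the steps above state only that matching CRC values confirm consistency with the original data block.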

It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Furthermore, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, item, or device that comprises a list of elements includes not only those elements but also other elements not explicitly listed, or elements that are inherent to such a process, method, item, or device. In the absence of further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, item, or device that comprises the element.

Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on such understanding, the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in portions of the embodiments.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A file compression method, comprising:

Calculating, according to the number of the plurality of data blocks, a length of the extended data content, and requesting memory occupied by the additional option according to the length;

Storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;

Adding the additional option to the extension option corresponding to the header in the data compression format, and combining the plurality of compressed data blocks to obtain a compressed file;

Sending the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.

2. The method according to claim 1, wherein performing the parallel compression on the plurality of data blocks comprises:

The plurality of data blocks are separately compressed in parallel by a plurality of compression engines.

3. The method according to claim 1 or 2, wherein the additional option further comprises SI1 and SI2, wherein SI1 and SI2 represent the ID of the extended data in the additional option.

4. A file decompression method, comprising:

Obtaining the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check CRC value of each data block;

Dividing the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;

Performing parallel decompression on the compressed data blocks to obtain corresponding data blocks; calculating a CRC value of each of the decompressed data blocks;

When the acquired CRC value of each data block is the same as the CRC value of each data block obtained by decompression, merging the data blocks obtained by decompression to obtain an original file.

5. The method according to claim 4, wherein obtaining the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check CRC value of each data block specifically comprises:

6. The method according to claim 4 or 5, wherein performing the parallel decompression on the compressed data blocks comprises:

7. A file compression device, comprising:

a splitting unit, configured to split the file into a plurality of data blocks, and count the number of the plurality of data blocks; a first calculating unit, configured to calculate the length of the extended data content according to the number of the plurality of data blocks, and to request the memory occupied by the additional option according to the length;

a compression unit, configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks; a second calculating unit, configured to separately calculate the cyclic redundancy check CRC value of each data block when the compression unit performs parallel compression on the plurality of data blocks;

an adding unit, configured to add the additional option to the extension option corresponding to the header in the compressed format;

a merging unit, configured to: after the adding unit adds the additional option to a location corresponding to a header in the GZIP format, combining the multiple compressed data blocks to obtain a compressed file;

8. A file decompression device, comprising:

an obtaining unit, configured to obtain, from an additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block; a dividing unit, configured to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;

9. The device according to claim 8, wherein the obtaining unit is specifically configured to obtain, from an additional option in the extension option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block.

10. A server, comprising:

a processor, configured to split a file to be compressed into a plurality of data blocks, and count the number of the plurality of data blocks; calculate the length of the extended data content according to the number of the plurality of data blocks, and request the memory occupied by the additional option according to the length;

11. A server, comprising:

a processor, configured to obtain, from an additional option of a compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check CRC value of each data block; and to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;

a decompression engine group, configured to perform parallel decompression on the compressed data blocks to obtain corresponding data blocks;

The processor is further configured to calculate the cyclic redundancy check CRC value of each data block obtained by decompression; to determine, if the obtained CRC value of each data block is the same as the CRC value of each data block obtained by decompression, that the data block is consistent with the original data block; and to merge the data blocks obtained by decompression to obtain the original file.