WO2014089753A1 - File compression method, file decompression method, device and server - Google Patents


Info

Publication number
WO2014089753A1
Authority
WO
WIPO (PCT)
Prior art keywords
compressed
data blocks
data block
file
data
Prior art date
Application number
PCT/CN2012/086341
Other languages
French (fr)
Chinese (zh)
Inventor
沈慧
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2012/086341 (WO2014089753A1)
Priority to CN201280003410.0A (CN103384884B)
Publication of WO2014089753A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • G06F16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files

Definitions

  • The present invention relates to the field of information technology, and in particular to a file compression method, a file decompression method, an apparatus, and a server.
  • Background: in the existing GZIP (GNU Zip) compression method, a file is first split into multiple data blocks, the blocks are compressed in parallel, and the compressed data blocks are then merged, in units of bits, into one compressed file.
  • As a result, the GZIP compressed file records only the starting address of the compressed data; it records neither the number of compressed data blocks nor the length of each data block.
  • The second compressed data block can therefore be decompressed only after the first one; that is, the data blocks can only be decompressed serially, one at a time.
  • the embodiment of the invention provides a file compression method, a file decompression method, a device and a server, and the parallel decompression of data improves the speed and efficiency of decompression.
  • the first aspect provides a file compression method, including:
  • storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;
  • the performing the parallel compression on the multiple data blocks comprises: performing parallel compression on the plurality of data blocks by using multiple compression engines.
  • the additional option further includes SI1 and SI2, where SI1 and SI2 represent the ID of the extended data in the additional option.
  • the second aspect provides a file decompression method, including:
  • obtaining the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block specifically includes:
  • obtaining the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional option in the extended extra option of the compressed file header.
  • performing parallel decompression on the compressed data blocks includes:
  • the plurality of compressed data blocks are respectively decompressed in parallel by a plurality of decompression engines.
  • a third aspect provides a file compression apparatus, including:
  • a splitting unit configured to split the file into a plurality of data blocks and count the number of the plurality of data blocks; a first calculating unit configured to calculate the length of the extended data content according to the number of the plurality of data blocks, and to apply for the memory occupied by the additional option according to the length; a compression unit configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks; and a second calculating unit configured to separately calculate the cyclic redundancy check (CRC) value of each data block when the compression unit performs parallel compression on the plurality of data blocks;
  • a storage unit configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in an additional option;
  • an adding unit configured to add the additional option to the extended extra option corresponding to the header in the compressed format;
  • a merging unit configured to merge the plurality of compressed data blocks to obtain a compressed file after the adding unit adds the additional option to a location corresponding to a header in the GZIP format;
  • a sending unit configured to send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.
  • the fourth aspect provides a file decompression device, including:
  • an obtaining unit configured to obtain, according to an additional option of the compressed file header, a length of each compressed data block, a number of data blocks, and a cyclic redundancy check CRC value of each data block;
  • a dividing unit configured to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;
  • a decompression unit configured to perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks;
  • a calculating unit configured to calculate the CRC value of each decompressed data block when the decompression unit performs parallel decompression on the compressed data blocks;
  • a determining unit configured to determine whether the CRC value of each data block acquired by the obtaining unit is the same as the CRC value calculated for the corresponding decompressed data block;
  • a determining unit configured to determine, when the CRC values are found to be the same, that the data block is consistent with the original data block;
  • a merging unit configured to merge the decompressed data blocks to obtain the original file when the determining unit determines that the data block is consistent with the original data block.
  • the obtaining unit is specifically configured to obtain the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional option in the extended extra option of the compressed file header.
  • the fifth aspect provides a server, including:
  • a processor configured to split a file to be compressed into a plurality of data blocks and count the number of the plurality of data blocks, and to calculate the length of the extended data content according to the number of the plurality of data blocks and apply for the memory occupied by the additional option according to the length;
  • a compression engine group comprising a plurality of compression engines, configured to perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks;
  • the processor is further configured to calculate the cyclic redundancy check (CRC) value of each data block; store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in an additional option; add the additional option to the extended extra option corresponding to the header in the GZIP format; merge the plurality of compressed data blocks to obtain a compressed file; and send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.
  • a sixth aspect provides a server, including:
  • a processor configured to obtain, from an additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block, and to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;
  • a decompression engine group configured to perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks;
  • the processor is further configured to calculate the CRC value of each data block obtained by decompression; if the obtained CRC value of each data block is the same as the CRC value calculated for the corresponding decompressed data block, the data block is consistent with the original data block, and the data blocks obtained by decompression are merged to obtain the original file.
  • The length of each compressed data block and the cyclic redundancy check (CRC) value of each data block are added to the header information.
  • The compressed file can then be decompressed in parallel according to the length of each compressed data block and the CRC value of each data block, which improves the speed and efficiency of decompression.
  • FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a file compression apparatus according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a file decompressing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another server according to an embodiment of the present disclosure.
  • FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention.
  • The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
  • FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention. The method includes: Step 101: Split a file into multiple data blocks, and count the number of the data blocks.
  • In this step, a server (for example, an x86 server, a reduced instruction set computer (RISC) server, or an IA-64 server) splits the file. The file may be, for example, a UNIX system file, a locally stored file, a received file, or a file of any format running on the operating system, and in particular a text file.
  • Various splitting methods can be used, and different methods can be used for different files. For example, TMPGEnc can be used to split files in MPEG format, ASF Tools can be used to split files in ASF or WMV format, and AVI chop can be used to split files in MPEG4 format.
  • The file may be split by a fixed number of bytes, divided equally according to the size of the file, or split arbitrarily as required; this embodiment does not limit the splitting method.
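The two splitting strategies named above (a fixed number of bytes, or an equal division by file size) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names `split_fixed` and `split_even` are hypothetical.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Split data into blocks of block_size bytes; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_even(data: bytes, count: int) -> list[bytes]:
    """Split data into exactly `count` blocks whose sizes differ by at most one byte."""
    if count <= 0:
        raise ValueError("count must be positive")
    base, extra = divmod(len(data), count)
    blocks, pos = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)  # the first `extra` blocks get one more byte
        blocks.append(data[pos:pos + size])
        pos += size
    return blocks


blocks = split_fixed(b"abcdefghij", 4)
num_blocks = len(blocks)  # Step 101 also counts the blocks
```

Either function returns the blocks in file order, so joining them reproduces the original data.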
  • the file in this embodiment refers to a file suitable for GZIP compression or decompression.
  • Step 102: Calculate the length of the extended data content according to the number of the plurality of data blocks, and apply for the memory occupied by the additional option according to the length;
  • The length of the extended data content (XLEN, eXtra LENgth) is the sum of the lengths occupied by all data blocks, for example the sum of the lengths of 10 data blocks. This embodiment defines the length occupied by all data blocks as the length of the extended data content.
  • For example, 82 bytes of memory can be requested through the malloc function.
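The arithmetic behind the figure of 82 can be sketched under an assumed field layout. The text here does not reproduce Table 1, so the sizes below (a 2-byte NUM field, then a 4-byte length and a 4-byte CRC32 per block) are assumptions, chosen only because they yield 82 bytes for 10 data blocks; the names are hypothetical.

```python
NUM_SIZE = 2  # assumed size of the NUM field, in bytes
LEN_SIZE = 4  # assumed size of each per-block length field, in bytes
CRC_SIZE = 4  # assumed size of each per-block CRC32 field, in bytes


def extended_length(num_blocks: int) -> int:
    """Length of the extended data content (NUM through nCRC) under the assumed layout."""
    return NUM_SIZE + num_blocks * (LEN_SIZE + CRC_SIZE)


xlen = extended_length(10)
buffer = bytearray(xlen)  # Python counterpart of requesting the memory with malloc
```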
  • The additional option is located in the extended option of the header of the compressed file, and the structure of the additional option is shown in Table 1 below.
  • Step 103: Perform parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks, and calculate the cyclic redundancy check (CRC) value of each data block;
  • Parallel compression compresses the multiple data blocks by using multiple compression engines.
  • In the case of hardware compression, parallel compression uses multiple compression engines to compress multiple data blocks simultaneously. In the case of software compression, parallel compression uses multithreading to compress multiple data blocks simultaneously when the central processing unit (CPU) has multiple physical cores.
  • While the data blocks are compressed, the CRC check value of each data block is calculated.
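The software variant of Step 103 can be sketched with a thread pool standing in for the compression engines. This is an illustration under assumptions: each block is compressed as an independent raw DEFLATE stream with Python's `zlib`, and the helper names `compress_block` and `compress_parallel` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes) -> tuple[bytes, int]:
    """Compress one block as a raw DEFLATE stream and return (compressed, crc32 of original)."""
    engine = zlib.compressobj(level=6, wbits=-15)  # negative wbits: no zlib/gzip wrapper
    compressed = engine.compress(block) + engine.flush()
    return compressed, zlib.crc32(block)


def compress_parallel(blocks: list[bytes], workers: int = 4) -> list[tuple[bytes, int]]:
    """Compress all blocks concurrently; results come back in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_block, blocks))


results = compress_parallel([b"a" * 1000, b"b" * 500])
lengths = [len(compressed) for compressed, _ in results]  # per-block compressed lengths
```

The CRC is taken over the uncompressed block, matching the text's statement that the stored CRC32 belongs to the data block before compression.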
  • One principle of the CRC algorithm, given as a non-limiting example, is as follows: the data is divided by a generator polynomial, and the remainder is used as the check field.
  • For example, when polynomial division is used and the remainder is 1010, the check field is 1010.
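The polynomial division mentioned above is division modulo 2, which reduces to repeated XOR. The sketch below illustrates it on bit strings; the message and generator used are illustrative values chosen here, not the operands of the example in the text (which are not given), and `crc_remainder` is a hypothetical name.

```python
def crc_remainder(message_bits: str, generator_bits: str) -> str:
    """Return the remainder of modulo-2 division of the message (padded with zeros) by the generator."""
    if not generator_bits.startswith("1"):
        raise ValueError("generator must start with a 1 bit")
    degree = len(generator_bits) - 1
    work = [int(b) for b in message_bits] + [0] * degree  # append `degree` zero bits
    generator = [int(b) for b in generator_bits]
    for i in range(len(message_bits)):
        if work[i]:  # leading bit set: subtract (XOR) the generator at this position
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-degree:])


check_field = crc_remainder("11010011101100", "1011")  # illustrative inputs
```

The remainder has as many bits as the degree of the generator, so a 4-bit check field such as 1010 implies a generator of degree 4.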
  • Step 104: Store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in an additional option;
  • For example, the file is first divided into two data blocks, the number of data blocks is counted, and the length of the extended data content is obtained from that number. After each data block is compressed, the length of each compressed data block is obtained and the CRC value of each data block is calculated. Then the length of the extended data content, the number of data blocks, the length of the first compressed data block, the CRC value of the first data block, the length of the second compressed data block, and the CRC value of the second data block are added, in that order, to the corresponding fields (XLEN and the fields that follow it) in the additional option.
  • The additional option may further include identification information such as SI1 and SI2, where SI1 and SI2 are the ID of the extended data content in the additional option.
  • XLEN is the length of the extended content, that is, the length from NUM to nCRC;
  • NUM is the number of data blocks in the compressed file;
  • 1LEN, 1CRC through nLEN, nCRC carry the extended information for each block: the length of each compressed data block (block) and the CRC32 value of each data block before compression;
  • CRC32 is a data error check code. Data errors are detected by comparing the CRC32 value of the original data with the CRC32 value of the data obtained by decompressing the compressed package.
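The field sequence described above (SI1, SI2, XLEN, NUM, then a length and CRC32 per block) can be serialized as in the following sketch. The field widths and byte order are assumptions, since the tables are not reproduced here: one byte each for SI1 and SI2, a 2-byte XLEN and NUM, and 4-byte lengths and CRC32 values, all little-endian as elsewhere in the gzip format. The ID bytes `P` and `Z` and the function names are hypothetical.

```python
import struct

SI1, SI2 = ord("P"), ord("Z")  # hypothetical ID bytes for the additional option


def pack_option(entries: list[tuple[int, int]]) -> bytes:
    """Serialize (compressed_length, crc32) pairs as SI1 SI2 XLEN NUM 1LEN 1CRC ... nLEN nCRC."""
    body = struct.pack("<H", len(entries))  # NUM
    for length, crc in entries:
        body += struct.pack("<II", length, crc)  # iLEN, iCRC
    return struct.pack("<BBH", SI1, SI2, len(body)) + body  # XLEN covers NUM..nCRC


def unpack_option(raw: bytes) -> list[tuple[int, int]]:
    """Parse the additional option back into (compressed_length, crc32) pairs."""
    si1, si2, xlen = struct.unpack_from("<BBH", raw, 0)
    if (si1, si2) != (SI1, SI2):
        raise ValueError("unexpected option ID")
    (num,) = struct.unpack_from("<H", raw, 4)
    if xlen != 2 + 8 * num:
        raise ValueError("XLEN does not match NUM")
    return [struct.unpack_from("<II", raw, 6 + 8 * i) for i in range(num)]


option = pack_option([(120, 0xDEADBEEF), (98, 0x00000001)])
```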
  • the structure of the additional options includes the specific contents as shown in Table 2:
  • Adaptive modifications may also be made as needed. Step 105: Add the additional option to the extended extra option corresponding to the header in the compressed format, and merge the plurality of compressed data blocks to obtain a compressed file;
  • the extended extra option may include a source file name, a comment text, or a CRC 16 and the like in addition to the additional options.
  • This embodiment mainly extends the additional option. That is, the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block are added to the additional option, so that the receiving end can decompress the data blocks in parallel according to the added information.
  • Each of the independent GZIP compressed files includes a header, a data portion, and a trailer.
  • The header may include the extended extra option, and may also include ID1, ID2, CM, FLG, MTIME, XFL, and OS, where:
  • MTIME indicates the compression time, in UNIX format;
  • XFL indicates the compression mode;
  • FLG indicates the extended function identifier; each bit represents one kind of additional data, and the corresponding content is carried in the extra field, which includes the additional option, the original file name, the comment text, and CRC16.
  • Besides the header, the GZIP compressed file also includes the data part and the tail part; that is, each independent GZIP compressed file currently consists of the header, the data part, and the tail part.
  • the information of the head is as described above, and details are not described herein again.
  • The data portion includes one or more data blocks (in this embodiment, one or more compressed data blocks; the same applies below), and the format of each data block includes BFINAL, BTYPE, and DATA.
  • BFINAL occupies 1 bit and indicates whether the block is the last data block: if BFINAL is 1, the block is the last data block.
  • BTYPE occupies 2 bits and indicates the compression type of the data: uncompressed (00), static Huffman compression (01), or dynamic Huffman compression (10). DATA is the compressed data (for example, LZ77 plus Huffman encoding, binary tree characteristics, and so on).
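The BFINAL and BTYPE fields can be read from the first byte of a raw DEFLATE stream, since they occupy its three lowest bits. The sketch below shows this with Python's `zlib`; the helper names are hypothetical, and compression level 0 is used only because it produces an uncompressed (stored) block.

```python
import zlib

BTYPE_NAMES = {0: "uncompressed", 1: "static Huffman", 2: "dynamic Huffman"}


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress data as a raw DEFLATE stream (no zlib or gzip wrapper)."""
    engine = zlib.compressobj(level=level, wbits=-15)
    return engine.compress(data) + engine.flush()


def first_block_header(stream: bytes) -> tuple[int, int]:
    """Return (BFINAL, BTYPE) of the first block: bit 0, then bits 1-2 of the first byte."""
    first = stream[0]
    return first & 0b1, (first >> 1) & 0b11


bfinal, btype = first_block_header(raw_deflate(b"hello", 0))
```

For a short input compressed at level 0 the stream holds a single stored block, so BFINAL is 1 and BTYPE is 00.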
  • the tail includes the 32-bit CRC value of the original file and the lower 32-bit value of the original data length, and the tail is mainly used to verify whether the decompressed file is consistent with the original file before compression.
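The tail described above can be inspected directly: its last eight bytes are the CRC32 of the original data followed by the original length modulo 2^32, both little-endian. The sketch below reads them from a single-member file produced by Python's `gzip` module; `read_trailer` is a hypothetical helper name.

```python
import gzip
import struct
import zlib


def read_trailer(gz_bytes: bytes) -> tuple[int, int]:
    """Return (crc32, original length mod 2**32) from the last 8 bytes of a gzip member."""
    return struct.unpack("<II", gz_bytes[-8:])


original = b"example payload " * 100
crc, size = read_trailer(gzip.compress(original))
matches = crc == zlib.crc32(original) and size == len(original) % 2**32
```

This is the whole-file check that serial decompression relies on; it says nothing about individual blocks.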
  • Step 106: Send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.
  • The receiving end may use the existing serial decompression or the parallel decompression provided by the embodiment of the present invention (see the embodiment of FIG. 2 below). With serial decompression, the content of the tail of the compressed file is needed to verify whether the decompressed file is consistent with the original file before compression. With the parallel decompression of the present application, the tail is not needed for that purpose; instead, the CRC values in the additional option of the extended extra option in the header are used to verify whether each decompressed data block is consistent with the original data block before compression.
  • In this embodiment, the length of each compressed data block and the CRC value of each data block are added, as new fields, to the additional option of the header information, so that the receiving end can decompress the compressed file in parallel based on this information, which improves the speed and efficiency of decompression.
  • FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention, where the method includes:
  • Step 201: Obtain the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block;
  • The obtaining proceeds as follows: the server obtains the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional option in the extended extra option of the compressed file header.
  • Step 202: Divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block;
  • Step 203: Perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks.
  • The server may input each compressed data block into a corresponding decompression engine, so that the compressed data blocks are decompressed in parallel by multiple decompression engines.
  • The process of parallel decompression is well known to those skilled in the art and is not described here.
  • Step 204: Calculate the CRC value of each data block obtained by decompression;
  • Step 205: If the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, the data block is consistent with the original data block;
  • Step 206: Merge the data blocks obtained by decompression to obtain the original file.
  • In this embodiment, when decompressing, the server first obtains the length of each compressed data block and the CRC value of each data block from the compressed file and decompresses the compressed file in parallel according to this information. Each decompressed data block can be checked for correctness with its own CRC value, which improves the speed and efficiency of decompression.
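Steps 202 through 206 can be sketched as follows, with a thread pool standing in for the decompression engines. Several points are assumptions made for the illustration: each block is an independent raw DEFLATE stream, block lengths are counted in bytes rather than bits, and the (length, CRC32) pairs have already been read from the header. All function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def slice_blocks(payload: bytes, lengths: list[int]) -> list[bytes]:
    """Step 202: cut the compressed payload into blocks using the stored lengths."""
    blocks, pos = [], 0
    for length in lengths:
        blocks.append(payload[pos:pos + length])
        pos += length
    if pos != len(payload):
        raise ValueError("stored lengths do not cover the payload")
    return blocks


def inflate_and_check(item: tuple[bytes, int]) -> bytes:
    """Steps 203-205: decompress one block and compare its CRC32 with the stored value."""
    block, expected_crc = item
    data = zlib.decompress(block, wbits=-15)
    if zlib.crc32(data) != expected_crc:
        raise ValueError("CRC mismatch: block differs from the original")
    return data


def decompress_parallel(payload: bytes, entries: list[tuple[int, int]], workers: int = 4) -> bytes:
    """Decompress all blocks concurrently and merge them in order (Step 206)."""
    blocks = slice_blocks(payload, [length for length, _ in entries])
    crcs = [crc for _, crc in entries]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(inflate_and_check, zip(blocks, crcs)))
```

Because every block carries its own length and CRC, no block has to wait for the one before it, which is the property the serial GZIP layout lacks.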
  • An embodiment of the present invention further provides a file compression device, as shown in FIG. 3.
  • The device includes a splitting unit 31, a first calculating unit 32, a compression unit 33, a second calculating unit 34, a storage unit 35, an adding unit 36, and a merging unit 37. The splitting unit 31 is configured to split the file into a plurality of data blocks and count the number of the plurality of data blocks; the file can be split by a fixed number of bytes, divided evenly, or split as needed.
  • The first calculating unit 32 is configured to calculate the length of the extended data content according to the number of the plurality of data blocks, and to apply for the memory occupied by the additional option according to the length. The compression unit 33 is configured to compress the plurality of data blocks in parallel to obtain a plurality of compressed data blocks; the plurality of data blocks may be compressed in parallel by a plurality of compression engines.
  • The second calculating unit 34 is configured to separately calculate the cyclic redundancy check (CRC) value of each data block when the compression unit 33 performs parallel compression on the plurality of data blocks.
  • The storage unit 35 is configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in an additional option. The adding unit 36 is configured to add the additional option to the extended extra option corresponding to the header in the compressed format. The merging unit 37 is configured to merge the plurality of compressed data blocks to obtain a compressed file after the adding unit adds the additional option to the location corresponding to the header in the GZIP format.
  • An embodiment of the present invention further provides a file decompression device, as shown in FIG. 4. The device includes an obtaining unit 41, a dividing unit 42, a decompression unit 43, a calculating unit 44, a determining unit 45, a determining unit 46, a merging unit 47, and a sending unit 48.
  • The obtaining unit 41 is configured to obtain, from an additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block; specifically, it obtains these values from the additional option in the extended extra option of the compressed file header.
  • The dividing unit 42 is configured to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain each compressed data block. The decompression unit 43 is configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks.
  • The calculating unit 44 is configured to calculate the CRC value of each decompressed data block when the decompression unit performs parallel decompression on the compressed data blocks.
  • The determining unit 45 is configured to determine whether the CRC value of each data block acquired by the obtaining unit is the same as the CRC value calculated for the corresponding decompressed data block. The determining unit 46 is configured to determine, when the determining unit 45 determines that the CRC values are the same, that the data block is consistent with the original data block. The merging unit 47 is configured to merge the data blocks obtained by decompression to obtain the original file when the determining unit 46 determines that the data block is consistent with the original data block. The sending unit 48 is configured to send the compressed file to the receiving end, so that the receiving end performs parallel decompression on the compressed file.
  • An embodiment of the present invention further provides a server, as shown in FIG. 5.
  • The server includes a processor 51 and a compression engine group 52. The processor 51 is configured to split a file to be compressed into a plurality of data blocks and count the number of the plurality of data blocks, and to calculate the length of the extended data content according to the number of the plurality of data blocks and apply for the memory occupied by the additional option according to the length.
  • The compression engine group 52 includes a plurality of compression engines for performing parallel compression on the plurality of data blocks to obtain a plurality of compressed data blocks.
  • The processor 51 is further configured to calculate the cyclic redundancy check (CRC) value of each data block.
  • An embodiment of the present invention further provides another server, as shown in FIG. 6.
  • The server includes a processor 61 and a decompression engine group 62. The processor 61 is configured to obtain, from an additional option of the compressed file header, the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block, and to divide the compressed file into blocks according to the length of each compressed data block and the number of data blocks, to obtain the compressed data blocks.
  • The decompression engine group 62 is configured to perform parallel decompression on the compressed data blocks to obtain the corresponding data blocks.
  • The processor 61 is further configured to calculate the CRC value of each data block obtained by decompression; if the obtained CRC value of each data block is the same as the CRC value calculated for the corresponding decompressed data block, the data block is consistent with the original data block, and the data blocks obtained by decompression are merged to obtain the original file.
  • In this way, the advantages of multi-core or multi-channel technology are exploited.
  • During compression, the length information of each block and the CRC32 value of the original data of each block are stored in an additional option of the header extension option, so that during decompression the blocks can be decompressed in parallel according to the length information of each block and the CRC32 value of its original data, which improves the speed and efficiency of decompression.
  • FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention. As shown in the figure, the compression mode mainly uses multiple hardware or software compression engines to compress the blocks in parallel. The entire compression process mainly includes:
  • The processor splits the original file into sub-data blocks, for example into n subfiles (subfile 1, subfile 2, through subfile n), and counts the number of subfiles, here n;
  • The processor calculates the length of the extended data (XLEN) according to the number of subfiles (that is, n), and applies for the memory used to store the extended data;
  • The processor transmits the subfiles to the corresponding compression engines (the compression engine group); each compression engine compresses its subfile in parallel with the others and calculates the CRC32 value of the data block;
  • After compressing its subfile into a compressed subfile, each compression engine stores the length of the compressed subfile and the CRC32 value of the original subfile in the additional option of the extended option, where the length of the subfile is in units of bits;
  • The processor adds the additional option to the corresponding position of the compressed file header (that is, the extended option), and then merges the compressed subfiles to obtain a compressed file.
  • FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention.
  • The decompression method mainly uses multiple hardware or software decompression engines (that is, a decompression engine group) to decompress the blocks in parallel. The entire decompression process mainly includes:
  • The processor obtains the number of blocks (that is, compressed subfiles or compressed data blocks) and the length of each block from the additional option of the extended option in the compressed file, and divides the compressed file according to the number of blocks and the length of each block to obtain the individual blocks, such as block 1 and block 2 up to block n.
  • Each decompression engine decompresses its block in parallel with the others and calculates the CRC value corresponding to the block.
  • After the decompression engines have decompressed the blocks into data blocks, the processor reads the CRC value in the additional option corresponding to each block;
  • The processor compares the CRC value calculated for each block after decompression with the CRC32 value read for that block; if the two are the same, the data block is confirmed to be consistent with the original data block.
  • After all the blocks are decompressed, the decompressed data blocks are merged to obtain the original file.
  • The present invention can be implemented by means of software plus a necessary general hardware platform, and can of course also be implemented in hardware, but in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A file compression method, a decompression method, a device and a server. The decompression method comprises: acquiring the length of each compressed data block, the number of data blocks and the cyclic redundancy check (CRC) value of each data block in a compressed file; according to the length of the compressed data block and the number of data blocks, blocking the compressed file to obtain each compressed data block; performing parallel decompression on each compressed data block to obtain each corresponding data block; calculating the CRC value of each data block obtained by decompression; if the acquired CRC value of each data block is the same as the CRC value of each data block obtained by decompression, the data block being consistent with an original data block; and merging each data block obtained by decompression to obtain an original file. According to the acquired length of each compressed data block and the CRC value of each data block, the present invention performs parallel decompression on the compressed file, improving the speed and efficiency of decompression.

Description

TECHNICAL FIELD
The present invention relates to the field of information technology, and in particular to a file compression method, a file decompression method, an apparatus, and a server.
BACKGROUND
In the existing GZIP (GNU Zip) compression method, a file is first split into multiple data blocks, the data blocks are compressed in parallel, and the compressed data blocks are then merged, bit by bit, into one compressed file. As a result, a GZIP compressed file as a whole records only the start address of the compressed data blocks; it records neither the number of compressed data blocks nor the length of each data block.
Consequently, in the corresponding GZIP decompression method, the compressed file can only be read and parsed sequentially, one bit at a time. That is, the second compressed data block can be decompressed only after the first compressed data block has been fully decompressed, so the data blocks can only be decompressed serially, one after another.
Because the existing GZIP decompression method can only decompress a compressed file serially, its decompression speed and efficiency are low.
SUMMARY
Embodiments of the present invention provide a file compression method, a file decompression method, an apparatus, and a server, in which parallel decompression of data improves decompression speed and efficiency.
To solve the above technical problem, the embodiments of the present invention disclose the following technical solutions:
A first aspect provides a file compression method, including:
splitting a file into multiple data blocks, and counting the number of data blocks;
calculating, according to the number of data blocks, the length of the extended data content, and allocating, according to that length, the memory occupied by an additional option;
compressing the multiple data blocks in parallel to obtain the corresponding compressed data blocks, and obtaining the cyclic redundancy check (CRC) value of each data block;
storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;
adding the additional option to the extended extra option of the header in the data compression format, and merging the multiple compressed data blocks to obtain a compressed file;
sending the compressed file to a receiving end, so that the receiving end can decompress the compressed file in parallel.
In a first possible implementation of the first aspect, compressing the multiple data blocks in parallel specifically includes: compressing the multiple data blocks in parallel by means of multiple compression engines, respectively.
With reference to the first aspect or its first possible implementation, in a second possible implementation the additional option further includes SI1 and SI2, where SI1 and SI2 represent the ID of the extended data in the additional option.
A second aspect provides a file decompression method, including:
obtaining the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block in a compressed file;
dividing the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks;
decompressing the compressed data blocks in parallel to obtain the corresponding data blocks;
calculating the CRC value of each data block obtained by decompression;
determining whether the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression;
when the obtained CRC values are the same as the CRC values of the data blocks obtained by decompression, merging the decompressed data blocks to obtain the original file.
In a first possible implementation of the second aspect, obtaining the length of each compressed data block, the number of data blocks, and the CRC value of each data block in the compressed file specifically includes:
obtaining the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional option in the extended extra option of the compressed file header.
With reference to the second aspect or its first possible implementation, in a second possible implementation, decompressing the compressed data blocks in parallel specifically includes:
decompressing the multiple compressed data blocks in parallel by means of multiple decompression engines, respectively.
A third aspect provides a file compression apparatus, including:
a splitting unit, configured to split a file into multiple data blocks and count the number of data blocks;
a first calculating unit, configured to calculate the length of the extended data content according to the number of data blocks, and to allocate the memory occupied by the additional option according to that length;
a compression unit, configured to compress the multiple data blocks in parallel to obtain multiple compressed data blocks;
a second calculating unit, configured to calculate the cyclic redundancy check (CRC) value of each data block while the compression unit compresses the multiple data blocks in parallel;
a storage unit, configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;
an adding unit, configured to add the additional option to the extended extra option of the header in the compression format;
a merging unit, configured to merge the multiple compressed data blocks to obtain a compressed file after the adding unit has added the additional option to the corresponding position of the header in the GZIP format;
a sending unit, configured to send the compressed file to a receiving end, so that the receiving end can decompress the compressed file in parallel.
A fourth aspect provides a file decompression apparatus, including:
an obtaining unit, configured to obtain the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block from the additional option of the compressed file header;
a dividing unit, configured to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks;
a decompression unit, configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks;
a calculating unit, configured to calculate the CRC value of each data block obtained by decompression while the decompression unit decompresses the compressed data blocks in parallel;
a judging unit, configured to determine whether the CRC value of each data block obtained by the obtaining unit is the same as the CRC value calculated for the corresponding decompressed data block;
a determining unit, configured to determine that the data block is consistent with the original data block when the judging unit finds the CRC values to be the same;
a merging unit, configured to merge the decompressed data blocks to obtain the original file when the determining unit determines that the data blocks are consistent with the original data blocks.
In a first possible implementation of the fourth aspect, the obtaining unit is specifically configured to obtain the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional option in the extended extra option of the compressed file header.
A fifth aspect provides a server, including:
a processor, configured to split a file to be compressed into multiple data blocks and count the number of data blocks; to calculate the length of the extended data content according to the number of data blocks; and to allocate the memory occupied by the additional option according to that length;
a compression engine group, including multiple compression engines, configured to compress the multiple data blocks in parallel to obtain multiple compressed data blocks;
the processor being further configured to calculate the cyclic redundancy check (CRC) value of each data block; to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option; to add the additional option to the extended extra option of the header in the GZIP format; to merge the multiple compressed data blocks to obtain a compressed file; and to send the compressed file to a receiving end, so that the receiving end can decompress the compressed file in parallel.
A sixth aspect provides a server, including:
a processor, configured to obtain the length of each compressed data block, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block from the additional option of the compressed file header, and to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks;
a decompression engine group, configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks;
the processor being further configured to calculate the CRC value of each data block obtained by decompression; to determine that a data block is consistent with the original data block if the obtained CRC value of that data block is the same as the CRC value of the corresponding decompressed data block; and to merge the decompressed data blocks to obtain the original file.
As can be seen from the above technical solutions, in the embodiments of the present invention, when a file is compressed, the length of each compressed data block and the cyclic redundancy check (CRC) value of each data block are added to the additional option of the header information. When the receiving end decompresses the compressed file, it can therefore decompress it in parallel according to the lengths of the compressed data blocks and the CRC values of the data blocks, which improves decompression speed and efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file compression apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file decompression apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another server according to an embodiment of the present invention;
FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention;
FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a file compression method according to an embodiment of the present invention. The method includes:
Step 101: Split the file into multiple data blocks, and count the number of data blocks;
A server (an x86 server, a reduced instruction set computer (RISC) server, an IA-64 server, or the like) can split a file (for example a UNIX system file, a locally stored file, a received file, or a file of any format used on the operating system, especially a text file) in many ways. Different splitting methods may be used for data of different formats: for example, TMPGEnc may be used to split MPEG files, ASF Tools to split ASF or WMV files, AVI chop to split MPEG4 files, and so on.
The file may be split by a fixed number of bytes, split evenly according to the file size, or split arbitrarily as needed; this embodiment imposes no limitation.
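As an illustration of the first two splitting strategies mentioned above (a fixed number of bytes per block, or an even division of the file), a minimal sketch might look like this. The function names `split_fixed` and `split_even` are hypothetical and are not taken from the patent.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Split into blocks of block_size bytes; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def split_even(data: bytes, parts: int) -> list[bytes]:
    """Split into `parts` blocks whose sizes differ by at most one byte."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(data), parts)
    blocks, start = [], 0
    for i in range(parts):
        # The first `extra` blocks absorb the remainder, one byte each.
        end = start + base + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks
```

In either case the number of data blocks counted in step 101 is simply the length of the returned list.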
The file in this embodiment is a file suitable for GZIP compression or decompression.
Step 102: Calculate the length of the extended data content according to the number of data blocks, and allocate the memory occupied by the additional option according to that length;
For example, if the file is split into 10 data blocks, the length of the extended data content (XLEN, eXtra LENgth) is the sum of the lengths occupied by the entries of the individual data blocks, that is, of all 10 data blocks. This embodiment defines the length occupied by the entries of all the data blocks as the length of the extended data content.
In other words, XLEN is the number of bytes of the optional items, that is, the number of bytes from NUM through nCRC in Table 1 below. For example, if the file is split into 10 data blocks, then with the byte counts of the individual items given in Table 2 below, XLEN = 2 (NUM) + (4 + 4) × 10 = 82.
Then, since XLEN has been calculated to be 82, 82 bytes of memory can be allocated with the malloc function.
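The arithmetic behind the 82-byte figure can be written out as a small sketch. The field sizes below are read off the worked example (2 bytes for NUM, 4 bytes per length, 4 bytes per CRC); the constant and function names are hypothetical, and a `bytearray` is used here only as a Python stand-in for the `malloc` call.

```python
NUM_BYTES = 2   # size of the NUM field, as in the 82-byte example
LEN_BYTES = 4   # size of each nLEN field
CRC_BYTES = 4   # size of each nCRC field (CRC32)


def extended_length(block_count: int) -> int:
    """Number of bytes from NUM through nCRC for block_count data blocks."""
    return NUM_BYTES + (LEN_BYTES + CRC_BYTES) * block_count


xlen = extended_length(10)   # 2 + (4 + 4) * 10
buffer = bytearray(xlen)     # Python counterpart of allocating xlen bytes
```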
In this embodiment, the additional option is located in the extended option of the header of the compressed file; its structure is shown in Table 1 below.
Step 103: Compress the multiple data blocks in parallel to obtain multiple compressed data blocks, and calculate the cyclic redundancy check (CRC) value of each data block;
In this embodiment, parallel compression compresses the data blocks by means of multiple compression engines. With hardware compression, multiple compression engines compress multiple data blocks simultaneously. With software compression, when the central processing unit (CPU) has multiple physical cores, multithreading is used to compress multiple data blocks simultaneously.
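For the software case, the multithreaded compression described above could be sketched as follows. This is only an illustration under stated assumptions: a Python thread pool plays the role of the compression engine group, each block is compressed independently as raw DEFLATE with the standard `zlib` module, and the names `compress_block` and `compress_parallel` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes) -> tuple[bytes, int]:
    """Compress one data block as raw DEFLATE and compute its CRC32."""
    engine = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    compressed = engine.compress(block) + engine.flush()
    # The CRC is taken over the uncompressed block, as in step 103.
    return compressed, zlib.crc32(block)


def compress_parallel(blocks: list[bytes], workers: int = 4) -> list[tuple[bytes, int]]:
    """Compress all blocks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_block, blocks))
```

The length of each compressed block, which step 104 stores next to the CRC value, is then just `len(compressed)` for each result.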
Usually, to verify that each data block is correct after decompression, the CRC check value of each data block needs to be calculated so that the block can be verified with a CRC algorithm.
The principle of one CRC algorithm for calculating the check value of each data block is as follows (other algorithms may be used): the data is divided by a generator polynomial, and the remainder is the check field.
For example, the data segment code is 1011001, which corresponds to $m(x) = x^6 + x^4 + x^3 + 1$.
Assume the generator polynomial is $g(x) = x^4 + x^3 + 1$, whose code is 11001. Then $x^4 m(x) = x^{10} + x^8 + x^7 + x^4$, whose code is 10110010000.
Polynomial division of $x^4 m(x)$ by $g(x)$ gives the remainder 1010, so the check field is 1010.
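The polynomial division in this example can be checked with a few lines of code. The sketch below performs modulo-2 long division on bit strings; it illustrates the principle only and is not the CRC32 routine a real implementation would use, and the function name `mod2_remainder` is hypothetical.

```python
def mod2_remainder(data_bits: str, generator_bits: str) -> str:
    """Remainder of data_bits * x^r divided by the generator over GF(2)."""
    r = len(generator_bits) - 1                      # degree of the generator
    work = [int(b) for b in data_bits] + [0] * r     # append r zero bits
    gen = [int(b) for b in generator_bits]
    for i in range(len(data_bits)):
        if work[i]:                                  # leading bit set: subtract (XOR)
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-r:])


print(mod2_remainder("1011001", "11001"))  # prints 1010
```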
Of course, those skilled in the art may also use other CRC algorithms; this embodiment imposes no limitation.
Step 104: Store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional option;
For example, in the above process the file is first split into N data blocks, the number of data blocks is counted, and the length of the extended data content is obtained from that number. Each data block is then compressed, which gives the length of each compressed data block, and the CRC value of each data block is calculated. Finally, the length of the extended data content, the number of data blocks, the length of the first compressed data block and the CRC value of the first data block, the length of the second compressed data block and the CRC value of the second data block, and so on up to the length of the N-th compressed data block and the CRC value of the N-th data block are written, in that order, to the corresponding XLEN field, NUM field, 1LEN and 1CRC fields, 2LEN and 2CRC fields, through to the NLEN and NCRC fields of the additional option;
Further, the additional option may also include identification information, such as SI1 and SI2, where SI1 and SI2 are the ID of the extended data content in the additional option.
Specifically, the structure of the additional option is shown in Table 1:
Table 1
Figure imgf000009_0001
where SI1 and SI2 are identification information;
XLEN is the length of the extended content, that is, the length from NUM to nCRC;
NUM is the number of data blocks in the compressed file;
1LEN, 1CRC through NLEN, NCRC carry the extended information: the length of each compressed data block (block) after compression and the CRC32 value of each data block before compression. CRC32 is a data error-checking code; in data communication, compression, and similar situations, data is checked for errors by comparing the CRC32 value of the original data with the CRC32 value of the data decompressed from the compressed package.
The specific contents of the structure of the additional option are shown in Table 2:
Table 2
Figure imgf000009_0002
The content shown in Table 2 is only an example and is not limiting; it may be adapted as needed.
Step 105: Add the additional option to the extended extra option of the header in the compression format, and merge the multiple compressed data blocks to obtain the compressed file;
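A possible byte layout for the additional option can be sketched with Python's `struct` module. The sizes of NUM, nLEN, and nCRC follow the 82-byte example in step 102; the one-byte SI1/SI2, the two-byte XLEN, the little-endian byte order, and the ID values `P` and `B` are assumptions made for this sketch (Table 2 is only available as an image here), and `pack_additional_option` is a hypothetical name.

```python
import struct

SI1, SI2 = ord("P"), ord("B")   # hypothetical ID bytes, not taken from the patent


def pack_additional_option(lengths: list[int], crcs: list[int]) -> bytes:
    """Serialize SI1, SI2, XLEN, NUM, then one (nLEN, nCRC) pair per block."""
    if len(lengths) != len(crcs):
        raise ValueError("one length and one CRC per block are required")
    content = struct.pack("<H", len(lengths))                 # NUM
    for length, crc in zip(lengths, crcs):
        content += struct.pack("<II", length, crc)            # nLEN, nCRC
    # XLEN counts the bytes from NUM through the last nCRC.
    return struct.pack("<BBH", SI1, SI2, len(content)) + content
```

With ten blocks the XLEN value written by this sketch is 82, which matches the worked example above.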
Further, in addition to the additional option, the extended extra option may include the source file name, comment text, CRC16, and so on. This embodiment mainly extends the additional option: the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block are added to the additional option, so that the receiving end can decompress the data blocks in parallel according to the added information.
The file compression of this embodiment applies to the GZIP compression format. Each independent GZIP compressed file includes a header, a data part, and a tail. Besides the extended extra option, the header may include ID1, ID2, CM, FLG, MTIME, XFL, and OS, where:
ID1 and ID2 are fixed values, ID1 = 0x1F and ID2 = 0x8B, which identify the GZIP format;
CM is the compression method; it currently has only one value, CM = 8, which denotes the DEFLATE method;
MTIME is the compression time, in UNIX format;
XFL is the compression mode: XFL = 2 denotes the maximum-compression but slowest algorithm, and XFL = 4 denotes the fastest but least-compressing algorithm;
OS is the file system, for example OS = 0 denotes the FAT file system and OS = 3 denotes the UNIX file system;
FLG is the extended function flag; each bit indicates one kind of additional data, and the corresponding content is carried in extra, which includes the additional option, the original file name, comment text, CRC16, and so on.
The above describes the content of the header of a GZIP compressed file. A GZIP compressed file also includes a data part and a tail; that is, each independent GZIP compressed file currently consists of a header, a data part, and a tail. The header is described above and is not repeated here.
The data part includes one or more data blocks (in this embodiment, one or more compressed data blocks; the same applies below). The format of each data block includes BFINAL, BTYPE, and DATA. BFINAL occupies 1 bit and indicates whether the block is the last data block: a BFINAL value of 1 marks the last data block. BTYPE (2 bits) indicates the compression type of the data, which may be static Huffman compression (01), dynamic Huffman compression (10), or no compression (00). DATA is the compressed data (for example, LZ77 plus Huffman coding, binary tree properties, and so on).
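To make the BFINAL and BTYPE bits concrete, the sketch below reads them from the first byte of a raw DEFLATE stream produced by Python's `zlib`. It assumes the usual DEFLATE bit order (BFINAL in the least significant bit, BTYPE in the next two bits); the helper name `first_block_header` is hypothetical.

```python
import zlib


def first_block_header(raw_deflate: bytes) -> tuple[int, int]:
    """Return (BFINAL, BTYPE) of the first block in a raw DEFLATE stream."""
    first = raw_deflate[0]
    bfinal = first & 0b1             # bit 0: 1 marks the last block
    btype = (first >> 1) & 0b11      # bits 1-2: 0 stored, 1 static, 2 dynamic
    return bfinal, btype


# Level 0 asks zlib for stored (uncompressed) blocks.
engine = zlib.compressobj(0, zlib.DEFLATED, -zlib.MAX_WBITS)
stream = engine.compress(b"hello") + engine.flush()
print(first_block_header(stream))
```

For this short input written at level 0, the single block should be reported as the last block (BFINAL = 1) with BTYPE = 00, that is, no compression.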
The tail includes the 32-bit CRC value of the original file and the low 32 bits of the original data length; it is mainly used to verify that the decompressed file is consistent with the original file before compression.
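The header and tail fields described above can be inspected on any GZIP file. The following sketch reads the fixed 10-byte header and the 8-byte tail of data compressed with Python's standard `gzip` module; the function name `parse_gzip_member` is hypothetical, and the sketch assumes the input holds exactly one GZIP member.

```python
import gzip
import struct
import zlib


def parse_gzip_member(blob: bytes) -> dict:
    """Read the fixed 10-byte header and the 8-byte tail of one GZIP member."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", blob, 0)
    crc32, isize = struct.unpack_from("<II", blob, len(blob) - 8)
    return {"ID1": id1, "ID2": id2, "CM": cm, "FLG": flg, "MTIME": mtime,
            "XFL": xfl, "OS": os_code, "CRC32": crc32, "ISIZE": isize}


payload = b"example payload " * 8
fields = parse_gzip_member(gzip.compress(payload))
# The tail values can be compared with zlib.crc32(payload) and len(payload).
```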
Step 106: Send the compressed file to the receiving end, so that the receiving end can decompress the compressed file in parallel.
It should be noted that a file compressed as in this embodiment can be decompressed either with existing serial decompression or with the parallel decompression provided by the embodiments of the present invention (see the embodiment of FIG. 2 below). With serial decompression, the content of the tail of the compressed file is used to verify that the decompressed file is consistent with the original file before compression. With the parallel decompression of this application, the tail is not needed for that check; instead, the CRC values in the additional option of the extended extra option in the header are used to verify that each decompressed data block is consistent with the corresponding original data block before compression.
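The compatibility with existing serial decompression can be illustrated with a sketch that builds a GZIP member carrying the extra information and then reads it back with an ordinary GZIP decoder. Several points are assumptions rather than statements of the patent: non-final blocks are ended with a zlib sync flush so that block lengths can be expressed in whole bytes, the extra field follows the RFC 1952 layout (a 2-byte total length followed by an SI1/SI2 subfield), the ID bytes `P` and `B` are made up, and `deflate_block` and `build_member` are hypothetical names.

```python
import gzip
import struct
import zlib


def deflate_block(block: bytes, last: bool) -> bytes:
    """Raw DEFLATE for one block; non-final blocks end on a byte boundary."""
    engine = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    mode = zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH
    return engine.compress(block) + engine.flush(mode)


def build_member(blocks: list[bytes]) -> bytes:
    """Assemble header (with extra field), merged blocks, and tail."""
    if not blocks:
        raise ValueError("at least one data block is required")
    parts = [deflate_block(b, i == len(blocks) - 1) for i, b in enumerate(blocks)]
    content = struct.pack("<H", len(blocks))                          # NUM
    for part, block in zip(parts, blocks):
        content += struct.pack("<II", len(part), zlib.crc32(block))   # nLEN, nCRC
    subfield = struct.pack("<BBH", ord("P"), ord("B"), len(content)) + content
    # ID1, ID2, CM=8, FLG=0x04 (extra field present), MTIME=0, XFL=0, OS=255
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 255)
    header += struct.pack("<H", len(subfield)) + subfield
    original = b"".join(blocks)
    tail = struct.pack("<II", zlib.crc32(original), len(original) & 0xFFFFFFFF)
    return header + b"".join(parts) + tail
```

A standard decoder skips the extra field, so `gzip.decompress(build_member(blocks))` should return the concatenated original blocks, checked against the tail CRC as described for the serial case.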
本发明实施例中,在对文件进行压缩时,将各个压缩数据块的长度和各个数据块 的 CRC值通过新增字段添加在头部信息的额外可选项中, 以便于在接收端解压时, 可以根据这些信息对该压缩文件进行并行解压缩, 从而提高了解压缩的速度与效率。  In the embodiment of the present invention, when the file is compressed, the length of each compressed data block and the CRC value of each data block are added in an additional option of the header information by using a new field, so that when the receiving end decompresses, The compressed file can be decompressed in parallel based on this information, thereby improving the speed and efficiency of understanding compression.
Referring also to FIG. 2, FIG. 2 is a flowchart of a file decompression method according to an embodiment of the present invention. The method includes:
Step 201: Obtain the length of each compressed data block in the compressed file, the number of data blocks, and the cyclic redundancy check (CRC) value of each data block.
The obtaining process is as follows: the server obtains the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional options in the extended extra option of the compressed file header.
Step 202: Divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks.
Step 203: Decompress the compressed data blocks in parallel to obtain the corresponding data blocks.
Specifically, the server may input each compressed data block into a corresponding decompression engine, so that multiple decompression engines decompress the compressed data blocks in parallel. The parallel decompression process is well known to those skilled in the art and is not described here.
Step 204: Calculate the CRC value of each data block obtained by decompression.
The calculation process is well known to those skilled in the art and is not described here.
Step 205: If the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, the data blocks are consistent with the original data blocks.
Step 206: Merge the data blocks obtained by decompression to obtain the original file.
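Steps 202 to 206 can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as the text implements it: block lengths are taken to be in bytes, each compressed block is assumed to be an independent, complete zlib stream, and a thread pool stands in for the decompression engines. All function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(body: bytes, lengths):
    """Step 202: cut the compressed body into blocks using the stored lengths."""
    if sum(lengths) != len(body):
        raise ValueError("stored lengths do not cover the compressed body")
    blocks, offset = [], 0
    for length in lengths:
        blocks.append(body[offset:offset + length])
        offset += length
    return blocks


def decompress_and_crc(block: bytes):
    """Steps 203 and 204: decompress one block and compute its CRC-32."""
    data = zlib.decompress(block)
    return data, zlib.crc32(data) & 0xFFFFFFFF


def parallel_decompress(body: bytes, lengths, crcs) -> bytes:
    if len(lengths) != len(crcs):
        raise ValueError("one stored CRC is needed per block")
    blocks = split_blocks(body, lengths)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(decompress_and_crc, blocks))
    # Step 205: each block is checked against its own stored CRC.
    for index, ((_, actual), expected) in enumerate(zip(results, crcs)):
        if actual != expected:
            raise ValueError(f"CRC mismatch in block {index}")
    # Step 206: merge the blocks in their original order.
    return b"".join(data for data, _ in results)
```

Checking each block against its own CRC means a corrupted block can be identified by index, rather than only learning that the whole file failed verification.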
In this embodiment of the present invention, during decompression the server first obtains the length of each compressed data block and the CRC value of each data block from the compressed file, and decompresses the compressed file in parallel according to this information. Each decompressed data block can be checked for correctness against its own independent CRC value, which improves the speed and efficiency of decompression.
Based on the implementation of the foregoing method, an embodiment of the present invention further provides a file compression apparatus, whose structure is shown in FIG. 3. The apparatus includes a splitting unit 31, a first calculating unit 32, a compression unit 33, a second calculating unit 34, a storage unit 35, an adding unit 36, and a merging unit 37. The splitting unit 31 is configured to split a file into multiple data blocks and count the number of data blocks; the file may be split by a fixed number of bytes, split evenly, or split arbitrarily as needed. The first calculating unit 32 is configured to calculate the length of the extended data content according to the number of data blocks, and to request the memory occupied by the additional options according to that length. The compression unit 33 is configured to compress the data blocks in parallel to obtain multiple compressed data blocks, specifically by using multiple compression engines. The second calculating unit 34 is configured to calculate the CRC value of each data block while the compression unit 33 compresses the data blocks in parallel. The storage unit 35 is configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional options. The adding unit 36 is configured to add the additional options to the extended extra option of the header in the compression format. The merging unit 37 is configured to merge the compressed data blocks to obtain a compressed file after the adding unit has added the additional options to the corresponding position of the header in the GZIP format.
For the implementation of the functions and roles of the units in the apparatus, see the corresponding implementation process in the foregoing method; details are not repeated here.
Correspondingly, an embodiment of the present invention further provides a file decompression apparatus, whose structure is shown in FIG. 4. The apparatus includes an obtaining unit 41, a dividing unit 42, a decompression unit 43, a calculating unit 44, a judging unit 45, a determining unit 46, a merging unit 47, and a sending unit 48. The obtaining unit 41 is configured to obtain, from the additional options in the compressed file header, the length of each compressed data block, the number of data blocks, and the CRC value of each data block; specifically, it obtains them from the additional options in the extended extra option of the compressed file header. The dividing unit 42 is configured to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks. The decompression unit 43 is configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks. The calculating unit 44 is configured to calculate the CRC value of each decompressed data block while the decompression unit decompresses the compressed data blocks in parallel. The judging unit 45 is configured to judge whether the CRC value of each data block obtained by the obtaining unit is the same as the CRC value calculated for the corresponding decompressed data block. The determining unit 46 is configured to determine, when the judging unit finds the CRC values to be the same, that the data block is consistent with the original data block. The merging unit 47 is configured to merge the decompressed data blocks to obtain the original file when the determining unit has determined that the data blocks are consistent with the original data blocks. The sending unit 48 is configured to send the compressed file to the receiving end, so that the receiving end decompresses the compressed file in parallel.
For the implementation of the functions and roles of the units in the apparatus, see the corresponding implementation process in the foregoing method; details are not repeated here.
Correspondingly, an embodiment of the present invention further provides a server, whose structure is shown in FIG. 5. The server includes a processor 51 and a compression engine group 52. The processor 51 is configured to split a file to be compressed into multiple data blocks and count the number of data blocks, to calculate the length of the extended data content according to the number of data blocks, and to request the memory occupied by the additional options according to that length. The compression engine group 52 includes multiple compression engines and is configured to compress the data blocks in parallel to obtain multiple compressed data blocks. The processor 51 is further configured to calculate the CRC value of each data block; to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional options; to add the additional options to the extended extra option of the header in the GZIP format; to merge the compressed data blocks to obtain a compressed file; and to send the compressed file to the receiving end, so that the receiving end decompresses the compressed file in parallel.
For the implementation of the functions and roles of the server, see the corresponding implementation process in the foregoing method; details are not repeated here.
Correspondingly, an embodiment of the present invention further provides another server, whose structure is shown in FIG. 6. The server includes a processor 61 and a decompression engine group 62. The processor 61 is configured to obtain, from the additional options in the compressed file header, the length of each compressed data block, the number of data blocks, and the CRC value of each data block, and to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks. The decompression engine group 62 is configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks. The processor 61 is further configured to calculate the CRC value of each decompressed data block; to determine, if the obtained CRC value of each data block is the same as the CRC value of the corresponding decompressed data block, that the data block is consistent with the original data block; and to merge the decompressed data blocks to obtain the original file.
In the embodiments of the present invention, a compressed file that contains multiple compressed data blocks (blocks) can be decompressed in parallel, so that the advantages of multi-core or multi-channel technology are exploited. While still following the existing GZIP format, during compression the length information of each block and the CRC32 value of the original data block of each block are stored in the additional options of the header extension option. During decompression, the file can then be decompressed in parallel according to the length information of each block and the CRC32 value of its original data block, thereby improving the speed and efficiency of decompression.
To help those skilled in the art understand, a specific application example is described below.
Referring also to FIG. 7, FIG. 7 is a flowchart of an application example of a file compression method according to an embodiment of the present invention. As shown in the figure, this compression method mainly uses multiple hardware or software compression engines to compress multiple blocks in parallel. The whole compression process mainly includes:
1. The processor divides the original file into sub-data blocks, for example by splitting the file into n subfiles (subfile 1, subfile 2, and so on up to subfile n), and counts the number of subfiles, here n.
2. The processor calculates the length of the extended data (XLEN) according to the number of subfiles (that is, n), and requests the memory needed to store the extended data.
3. The processor transfers the sub-data blocks to the corresponding compression engines (the compression engine group); each compression engine compresses its subfile in parallel with the others and calculates the CRC32 value of the data block.
4. After compressing its subfile into a compressed subfile, each compression engine stores the length of the compressed subfile and the CRC32 value of the original subfile in the additional options of the extension option, where the subfile length is expressed in bits. The number of subfiles and the length of the extended data also need to be stored in the additional options of the extension option.
5. When all data blocks have been compressed, the processor adds the additional options to the corresponding position in the header of the compressed file (that is, the extension option), and then merges the compressed subfiles to obtain the compressed file.
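The five steps above can be sketched as follows. This is an illustration under assumptions rather than the implementation the text describes: the split size `BLOCK_SIZE`, the extended-data layout (a 4-byte count plus 8 bytes per block), and all function names are hypothetical; lengths are recorded in bytes here for simplicity, whereas the text records them in bits; and a thread pool stands in for the compression engine group.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical fixed split size


def split_file(data: bytes, block_size: int = BLOCK_SIZE):
    """Step 1: split the original data into fixed-size sub-blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def extended_data_length(n: int) -> int:
    """Step 2: bytes needed for a 4-byte count plus (length, CRC32) per block."""
    return 4 + 8 * n


def compress_block(block: bytes):
    """Steps 3 and 4: compress one sub-block and record its length and CRC-32."""
    compressed = zlib.compress(block)
    return compressed, len(compressed), zlib.crc32(block) & 0xFFFFFFFF


def parallel_compress(data: bytes):
    blocks = split_file(data)
    xlen = extended_data_length(len(blocks))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(compress_block, blocks))
    info = {
        "xlen": xlen,
        "count": len(blocks),
        "lengths": [length for _, length, _ in results],
        "crcs": [crc for _, _, crc in results],
    }
    # Step 5: merge the compressed sub-blocks in their original order.
    body = b"".join(compressed for compressed, _, _ in results)
    return body, info
```

The extended-data length depends only on the block count, which is why it can be computed, and its memory reserved, before any block has been compressed.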
In this compression embodiment, because the extended data information is written in the original GZIP manner, a file compressed in this way can be decompressed by any program or decompression engine that can decompress other GZIP-format archives, although without the benefit of parallel decompression. To improve decompression efficiency, the parallel decompression provided by this embodiment can be used.
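The compatibility claim above can be illustrated with a small experiment: a GZIP file whose header carries an extra field is still decoded by a stock serial decoder, which simply skips that field. The sketch below makes its own assumptions. Blocks are compressed independently as raw DEFLATE and joined with a sync flush, which is a technique chosen for this example and not one the text specifies, and the subfield contents (ID bytes `PZ` and a block count) are hypothetical.

```python
import gzip
import struct
import zlib


def deflate_block(chunk: bytes, last: bool) -> bytes:
    """Compress one chunk as raw DEFLATE that can be concatenated with its neighbours."""
    engine = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw DEFLATE
    # A sync flush ends on a byte boundary without marking the stream as finished.
    return engine.compress(chunk) + engine.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)


def build_gzip_member(chunks, extra_subfield: bytes) -> bytes:
    """Assemble the header (FEXTRA flag set), the concatenated blocks, and the 8-byte tail."""
    # ID1, ID2, CM=8 (deflate), FLG=0x04 (FEXTRA), MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 255)
    header += struct.pack("<H", len(extra_subfield)) + extra_subfield
    body = b"".join(deflate_block(c, i == len(chunks) - 1) for i, c in enumerate(chunks))
    original = b"".join(chunks)
    tail = struct.pack("<II", zlib.crc32(original) & 0xFFFFFFFF, len(original) & 0xFFFFFFFF)
    return header + body + tail


chunks = [b"alpha " * 200, b"beta " * 200, b"gamma " * 200]
# Hypothetical subfield: ID bytes "PZ", a 2-byte payload length, and a 4-byte block count.
subfield = b"PZ" + struct.pack("<H", 4) + struct.pack("<I", len(chunks))
member = build_gzip_member(chunks, subfield)
# A stock GZIP decoder skips the extra field and decodes the file serially.
assert gzip.decompress(member) == b"".join(chunks)
```

A decoder that understands the extra field could instead hand each block to a separate engine, which is the parallel path the embodiment adds on top of the unchanged format.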
Referring also to FIG. 8, FIG. 8 is a flowchart of an application example of a file decompression method according to an embodiment of the present invention. As shown in the figure, this decompression method mainly uses multiple hardware or software decompression engines (the decompression engine group) to decompress multiple blocks in parallel. The whole decompression process mainly includes:
1. The processor obtains the number of blocks (that is, compressed subfiles or compressed data blocks) and the length of each block from the additional options in the extension option of the compressed file, and divides the compressed file according to the number of blocks and the length of each block to obtain the individual blocks, for example block 1, block 2, and so on up to block n.
2. The processor feeds the blocks in parallel into the corresponding decompression engines.
3. The decompression engines decompress the blocks in parallel and calculate the CRC value corresponding to each block.
4. After the decompression engines have decompressed the blocks into data blocks, the processor reads the CRC value of each block from the additional options.
5. The processor compares the CRC value calculated for each block after decompression with the CRC32 value read for that block; if the two are the same, the data block is confirmed to be consistent with the original data block.
6. When all blocks have been decompressed, the decompressed data blocks are merged to obtain the original file.
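Step 1 of the process above divides the compressed file using the stored block lengths. Since the compression example records those lengths in bits, a block need not start on a byte boundary. The sketch below shows the offset arithmetic only, assuming the blocks are stored back to back with no padding; both function names are hypothetical.

```python
from itertools import accumulate


def block_bit_ranges(bit_lengths):
    """Return (start_bit, end_bit) for each block, given per-block lengths in bits."""
    ends = list(accumulate(bit_lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def byte_window(start_bit: int, end_bit: int):
    """Bytes that cover a bit range, plus the offset of the first useful bit in the first byte."""
    first_byte = start_bit // 8
    last_byte = (end_bit + 7) // 8  # exclusive upper bound
    return first_byte, last_byte, start_bit % 8
```

For example, with bit lengths of 10, 6, and 20, the second block occupies bits 10 to 16, so a decompression engine would read byte 1 and skip its first two bits before decoding.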
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
From the description of the foregoing embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A file compression method, comprising:
splitting a file into a plurality of data blocks, and counting the number of the data blocks;
calculating, according to the number of the data blocks, the length of the extended data content that is required, and requesting, according to the length, the memory occupied by additional options;
compressing the plurality of data blocks in parallel to obtain a corresponding plurality of compressed data blocks, and obtaining a cyclic redundancy check (CRC) value of each data block;
storing the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional options;
adding the additional options to the extended extra option of the header in the data compression format, and merging the plurality of compressed data blocks to obtain a compressed file; and
sending the compressed file to a receiving end, so that the receiving end decompresses the compressed file in parallel.
2. The method according to claim 1, wherein compressing the plurality of data blocks in parallel specifically comprises:
compressing the plurality of data blocks in parallel by using a plurality of compression engines.
3. The method according to claim 1 or 2, wherein the additional options further comprise SI1 and SI2, where SI1 and SI2 represent the ID of the extended data in the additional options.
4. A file decompression method, comprising:
obtaining the length of each compressed data block in a compressed file, the number of data blocks, and a cyclic redundancy check (CRC) value of each data block;
dividing the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks;
decompressing the compressed data blocks in parallel to obtain the corresponding data blocks;
calculating the CRC value of each data block obtained by decompression;
judging whether the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression; and
when the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, merging the data blocks obtained by decompression to obtain the original file.
5. The method according to claim 4, wherein obtaining the length of each compressed data block in the compressed file, the number of data blocks, and the CRC value of each data block specifically comprises:
obtaining the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional options in the extended extra option of the compressed file header.
6. The method according to claim 4 or 5, wherein decompressing the compressed data blocks in parallel specifically comprises:
decompressing the plurality of compressed data blocks in parallel by using a plurality of decompression engines.
7. A file compression apparatus, comprising:
a splitting unit, configured to split a file into a plurality of data blocks and count the number of the data blocks;
a first calculating unit, configured to calculate the length of the extended data content according to the number of the data blocks, and to request, according to the length, the memory occupied by additional options;
a compression unit, configured to compress the plurality of data blocks in parallel to obtain a plurality of compressed data blocks;
a second calculating unit, configured to calculate a cyclic redundancy check (CRC) value of each data block while the compression unit compresses the plurality of data blocks in parallel;
a storage unit, configured to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional options;
an adding unit, configured to add the additional options to the extended extra option of the header in the compression format;
a merging unit, configured to merge the plurality of compressed data blocks to obtain a compressed file after the adding unit has added the additional options to the corresponding position of the header in the GZIP format; and
a sending unit, configured to send the compressed file to a receiving end, so that the receiving end decompresses the compressed file in parallel.
8. A file decompression apparatus, comprising:
an obtaining unit, configured to obtain, from the additional options in a compressed file header, the length of each compressed data block, the number of data blocks, and a cyclic redundancy check (CRC) value of each data block;
a dividing unit, configured to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks;
a decompression unit, configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks;
a calculating unit, configured to calculate the CRC value of each data block obtained by decompression while the decompression unit decompresses the compressed data blocks in parallel;
a judging unit, configured to judge whether the CRC value of each data block obtained by the obtaining unit is the same as the CRC value calculated for the corresponding data block obtained by decompression;
a determining unit, configured to determine, when the judging unit finds the CRC values to be the same, that the data block is consistent with the original data block; and
a merging unit, configured to merge the data blocks obtained by decompression to obtain the original file when the determining unit has determined that the data blocks are consistent with the original data blocks.
9. The apparatus according to claim 8, wherein the obtaining unit is specifically configured to obtain the length of each compressed data block, the number of data blocks, and the CRC value of each data block from the additional options in the extended extra option of the compressed file header.
10. A server, comprising:
a processor, configured to split a file to be compressed into a plurality of data blocks and count the number of the data blocks, to calculate the length of the extended data content according to the number of the data blocks, and to request, according to the length, the memory occupied by additional options; and
a compression engine group, comprising a plurality of compression engines and configured to compress the plurality of data blocks in parallel to obtain a plurality of compressed data blocks;
wherein the processor is further configured to calculate a cyclic redundancy check (CRC) value of each data block; to store the length of the extended data content, the number of data blocks, the length of each compressed data block, and the CRC value of each data block in the additional options; to add the additional options to the extended extra option of the header in the GZIP format; to merge the plurality of compressed data blocks to obtain a compressed file; and to send the compressed file to a receiving end, so that the receiving end decompresses the compressed file in parallel.
11. A server, comprising:
a processor, configured to obtain, from the additional options in a compressed file header, the length of each compressed data block, the number of data blocks, and a cyclic redundancy check (CRC) value of each data block, and to divide the compressed file into blocks according to the lengths of the compressed data blocks and the number of data blocks, to obtain the individual compressed data blocks; and
a decompression engine group, configured to decompress the compressed data blocks in parallel to obtain the corresponding data blocks;
wherein the processor is further configured to calculate the CRC value of each data block obtained by decompression; to determine, if the obtained CRC value of each data block is the same as the CRC value of the corresponding data block obtained by decompression, that the data block is consistent with the original data block; and to merge the data blocks obtained by decompression to obtain the original file.
PCT/CN2012/086341 2012-12-11 2012-12-11 File compression method, file decompression method, device and server WO2014089753A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2012/086341 WO2014089753A1 (en) 2012-12-11 2012-12-11 File compression method, file decompression method, device and server
CN201280003410.0A CN103384884B (en) 2012-12-11 2012-12-11 A kind of file compression method, file decompression method, device and server

Publications (1)

Publication Number Publication Date
WO2014089753A1 (en) 2014-06-19

Family

ID=49492140

Country Status (2)

Country Link
CN (1) CN103384884B (en)
WO (1) WO2014089753A1 (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740298A (en) * 2014-12-12 2016-07-06 北京奇虎科技有限公司 File processing method and apparatus, and server-side equipment
CN105573785A (en) * 2015-12-11 2016-05-11 青岛海信电器股份有限公司 Differential package manufacturing method and device
EP3419238B1 (en) 2016-03-14 2020-05-06 Huawei Technologies Co., Ltd. Method, apparatus, and system for transmitting data
CN106021003B (en) * 2016-05-05 2019-11-29 捷开通讯(深圳)有限公司 Restorative procedure, intelligent terminal and the server of intelligent terminal
CN106126367B (en) * 2016-06-28 2019-09-20 湖北锐世数字医学影像科技有限公司 A kind of self checking method and system of file
CN107919935B (en) * 2016-10-08 2022-04-15 中兴通讯股份有限公司 Method and device for improving voice communication quality
CN107977233B (en) 2016-10-19 2021-06-01 华为技术有限公司 Method and device for quickly loading kernel mirror image file
CN106503165A (en) * 2016-10-31 2017-03-15 杭州华为数字技术有限公司 Compression, decompressing method, device and equipment
CN106648955B (en) * 2016-11-15 2020-01-31 杭州华为数字技术有限公司 Compression method and related device
CN110603759B (en) * 2017-05-04 2022-04-05 上海诺基亚贝尔股份有限公司 Unified error correction and detection code generator
CN107967157B (en) * 2017-08-17 2021-06-01 青岛海信移动通信技术股份有限公司 Data processing method and device in OTA (over the air) tundish generation process
WO2019119336A1 (en) * 2017-12-21 2019-06-27 深圳大学 Multi-thread compression and decompression methods in generic data gz format, and device
CN108134609A (en) * 2017-12-21 2018-06-08 深圳大学 Multithreading compression and decompressing method and the device of a kind of conventional data gz forms
CN108446300B (en) * 2018-01-26 2021-04-09 北京奇虎科技有限公司 Data information scanning method and device
CN108520067A (en) * 2018-04-12 2018-09-11 郑州云海信息技术有限公司 Compression, the method, apparatus and storage medium for decompressing gzip formatted files
CN108509642A (en) * 2018-04-12 2018-09-07 郑州云海信息技术有限公司 Compression, the method, apparatus and storage medium for decompressing gzip formatted files
CN110784225A (en) * 2018-07-31 2020-02-11 华为技术有限公司 Data compression method, data decompression method, related device, electronic equipment and system
CN108958966A (en) * 2018-09-27 2018-12-07 合肥达博科技有限公司 A kind of data guard method and device of lossless data compression
CN109582653B (en) * 2018-11-14 2020-12-08 网易(杭州)网络有限公司 Method and device for compressing and decompressing files
CN111382856B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data processing device, method, chip and electronic equipment
CN111290697B (en) * 2018-12-07 2022-01-28 上海寒武纪信息科技有限公司 Data compression method, encoding circuit and arithmetic device
WO2020114283A1 (en) * 2018-12-07 2020-06-11 上海寒武纪信息科技有限公司 Data processing method and device
CN111294056B (en) * 2018-12-07 2022-03-29 上海寒武纪信息科技有限公司 Data decompression method and coding circuit
CN111382852B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data processing device, method, chip and electronic equipment
CN111382853B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data processing device, method, chip and electronic equipment
CN111294057A (en) * 2018-12-07 2020-06-16 上海寒武纪信息科技有限公司 Data compression method, encoding circuit and arithmetic device
CN109710581B (en) * 2018-12-25 2023-05-30 四川巧夺天工信息安全智能设备有限公司 Method for decompressing compressed data in qcow image file
CN117172296A (en) * 2018-12-28 2023-12-05 上海寒武纪信息科技有限公司 Data processing device, method, chip and electronic equipment
CN110247666B (en) * 2019-05-22 2023-08-18 深圳大学 System and method for hardware parallel compression
WO2020232682A1 (en) * 2019-05-22 2020-11-26 深圳大学 Parallel compression system and method employing hardware
CN112099725A (en) * 2019-06-17 2020-12-18 华为技术有限公司 Data processing method and device and computer readable storage medium
CN110727720B (en) * 2019-10-21 2023-06-20 腾讯科技(深圳)有限公司 List display and query method and device, storage medium and computer equipment
CN110808054B (en) * 2019-11-04 2022-05-06 思必驰科技股份有限公司 Multi-channel audio compression and decompression method and system
CN111258621B (en) * 2019-11-19 2022-08-02 浙江瑞银电子有限公司 Differentiated firmware upgrading method
CN111723059B (en) * 2020-05-25 2021-03-16 深圳市科楠科技开发有限公司 Data compression method and device, terminal equipment and storage medium
CN112084158A (en) * 2020-09-25 2020-12-15 北京百家科技集团有限公司 Data set file compression method and device
CN114697309A (en) * 2020-12-25 2022-07-01 北京三快在线科技有限公司 File transmission method, file processing method and device
CN112866227A (en) * 2021-01-13 2021-05-28 北京连山科技股份有限公司 File authorization protection method and system
CN112860646B (en) * 2021-02-24 2022-12-02 上海泰宇信息技术股份有限公司 Method for distributed aggregate compression and unitary extraction of mass file files
CN113225180A (en) * 2021-04-29 2021-08-06 北京连山科技股份有限公司 Method and system for protecting communication key
CN114726924B (en) * 2022-05-17 2022-10-21 北京安盟信息技术股份有限公司 Method for improving network bandwidth utilization rate based on TCP/IP protocol stack characteristics
WO2023226036A1 (en) * 2022-05-27 2023-11-30 深圳华大基因科技服务有限公司 Fastq data processing method and apparatus, electronic device, and storage medium
CN115883839B (en) * 2023-03-09 2023-06-06 湖北芯擎科技有限公司 Image verification method, device, equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102138170A (en) * 2008-08-25 2011-07-27 索尼公司 Data conversion device, data conversion method, and program
CN102724500A (en) * 2012-06-05 2012-10-10 沙基昌 Method and system for compressing/decompressing video data
CN102740075A (en) * 2012-06-05 2012-10-17 沙基昌 Video data compressing/decompressing method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5686915A (en) * 1995-12-27 1997-11-11 Xerox Corporation Interleaved Huffman encoding and decoding method
CN102244518B (en) * 2010-05-10 2016-01-20 百度在线网络技术(北京)有限公司 The hard-wired system and method for parallel decompression

Also Published As

Publication number Publication date
CN103384884B (en) 2016-11-16
CN103384884A (en) 2013-11-06

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 12889982; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 12889982; Country of ref document: EP; Kind code of ref document: A1