WO2019119336A1

WO2019119336A1 - Multi-thread compression and decompression methods in generic data gz format, and device

Info

Publication number: WO2019119336A1
Application number: PCT/CN2017/117619
Authority: WO
Inventors: 朱泽轩; 孙怡雯
Original assignee: 深圳大学
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2019-06-27

Abstract

Multi-thread compression and decompression methods for the generic gz data format, applicable to the field of data processing. The compression step (S1) is as follows: inputting original data and dividing it into blocks to obtain M data blocks (S11); compressing the M data blocks with N1 threads of a preset first thread pool, reserving a preset space in the gz-format file header during compression, to obtain M pieces of compressed data gzDi and their sizes (S12); and writing the M pieces of compressed data gzDi to disk in sequence while writing the corresponding M sizes into the preset space in sequence, to obtain the compressed data (S13). The decompression step (S2) is as follows: inputting the compressed data, reading the list of written sizes, and dividing the compressed data according to that size list to obtain M data blocks gzDi (S21); decompressing the M data blocks gzDi with N2 threads of a preset second thread pool to obtain M pieces of decompressed original data Di (S22); and concatenating the decompressed original data Di in series according to the size list to obtain the complete original data (S23). The method thereby achieves both multi-thread compression and multi-thread decompression.

Description

Multi-thread compression and decompression method and device for universal data gz format

Technical field

The invention belongs to the technical field of data processing, and in particular relates to a multi-thread compression and decompression method and device for a universal data gz format.

Background art

At present, the gz compression format is the main general-purpose compression scheme for text data. For this format, the most widely used tools are the zlib library, which provides single-threaded gz compression, and pigz (a parallel implementation of gzip), which provides multi-threaded gz compression. Gz compression software based on zlib and pigz has the following main disadvantages:

1. General-purpose gz compression software usually assumes that the input is a single character stream, that is, that there is only one data source, and it does not process multi-source data well in parallel. In the big data field, multi-source data is the most common case: in Internet user data collection, for example, information from many users may need to be compressed and saved to the same file at the same time. When the data volume is large enough, only parallel processing can meet the time requirements. The zlib library implements only basic single-threaded gz compression and decompression, while pigz is a parallel gz compressor. Saving data with pigz parallel compression, however, causes serious IO contention and low IO utilization, because pigz binds compression to writing and decompression to reading; zlib binds them in the same way. This binding simplifies use, but it is not flexible enough to choose the best read/write configuration for a computer's CPU and IO performance. On computers whose computing power far exceeds their IO capability, read and write operations must be separated from the compression and decompression computations to make full use of the computing performance.

2. Pigz implements multi-threaded block compression of a single data stream, but it offers only single-threaded decompression, so decompression speed is limited by the single-thread performance of the CPU. When decompressing and reading massive data, industry and academia have a large demand for parallel multi-threaded decompression. High-throughput DNA sequencing, for example, produces FASTA files of hundreds of GB, yet the subsequent bioinformatics analysis can use only one thread to decompress and read them, even though one HPC compute node usually provides dozens of threads. This lengthens the analysis time.

Summary of the invention

The invention provides a multi-thread compression and decompression method and device for the universal gz data format. It aims to compress original data with multiple threads while keeping read and write operations separate from the compression and decompression computations, and to decompress the compressed data with multiple threads.

The present invention provides a multi-thread compression and decompression method in a general data gz format, comprising: a compression step S1 and a decompression step S2, wherein the compression step S1 comprises:

Step S11, inputting original data and dividing the original data into blocks to obtain M data blocks;

Wherein, each data block is represented as Di, i ∈ [0, M-1];

Step S12, compressing the M data blocks respectively with N₁ threads of a preset first thread pool, and reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and the size of each, size(gzDi);

Step S13, writing the M pieces of compressed data gzDi to disk in sequence, and writing the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

The decompression step S2 includes:

Step S21, inputting the compressed data, reading the written list of sizes size(gzDi), and segmenting the compressed data according to that list to obtain M data blocks gzDi;

Step S22, decompressing the M data blocks gzDi respectively with N₂ threads of a preset second thread pool, to obtain M pieces of decompressed original data Di;

Step S23, concatenating the decompressed original data Di in series according to the list of sizes size(gzDi), to obtain the complete original data.

The present invention also provides a multi-thread compression and decompression device of the general data gz format, comprising: a compression module 1 and a decompression module 2, wherein the compression module 1 comprises:

The blocking module 11 is configured to input original data and divide the original data into blocks to obtain M data blocks;

Wherein, each data block is represented as Di, i ∈ [0, M-1];

The compression module 12 is configured to compress the M data blocks respectively with N₁ threads of the preset first thread pool, reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and their sizes size(gzDi);

The writing module 13 is configured to write the M pieces of compressed data gzDi to disk in sequence, and to write the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

The decompression module 2 includes:

The segmentation module 21 is configured to input the compressed data, read the written list of sizes size(gzDi), and segment the compressed data according to that list to obtain M data blocks gzDi;

The decompression module 22 is configured to decompress the M data blocks gzDi with N₂ threads of the preset second thread pool to obtain M pieces of decompressed original data Di;

The serial module 23 is configured to concatenate the decompressed original data Di in series according to the list of sizes size(gzDi) to obtain the complete original data.

Compared with the prior art, the method and device provided by the present invention have the following beneficial effects. In the compression step, the input original data is first divided into blocks; N₁ threads then compress the blocks separately to obtain M pieces of compressed data gzDi and their sizes size(gzDi); finally gzDi is written to disk, and the M sizes are written into the header portion of the gz format. In the decompression step, the written list of sizes size(gzDi) is read, and the input compressed data is segmented according to it into M data blocks; N₂ threads decompress the M blocks separately to obtain M pieces of original data Di; finally the Di are concatenated to obtain the complete original data. Because the invention uses a parallel multi-thread gz compression method that is separate from read/write IO, read and write operations are decoupled from compression and decompression and can be scheduled according to actual conditions, which effectively avoids IO contention among multi-threaded compression writes. The multi-thread gz decompression method lets software use more of the computer's CPUs for decompression, so it can take in more data and a single program can reach higher CPU utilization. Finally, because size(gzDi) is stored in the header portion of the gz file, the file can still be decompressed with the original single-thread decompression of zlib or pigz, or with bgz multi-thread decompression. This compatibility keeps the cost of adopting the method extremely low.

DRAWINGS

FIG. 1 is a schematic flowchart of a multi-thread compression and decompression method for the universal gz data format according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the bgz multi-thread compression and decompression process provided by an embodiment of the present invention;

FIG. 3 is a schematic block diagram of a multi-thread compression and decompression device of a general data gz format according to an embodiment of the present invention.

Detailed description of the embodiments

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The prior art has two technical problems. On the one hand, pigz binds compression to writing and decompression to reading, so when IO resources are limited relative to CPU computing power, parallel compressed storage with pigz makes very poor use of the hardware. On the other hand, neither zlib nor pigz implements multi-thread gz decompression.

To solve these technical problems, the present invention provides a multi-thread compression and decompression method and device for the universal gz data format, that is, a multi-thread compression and decompression solution for multi-source data in the widely used gz compression format. Separating the compression computation from the write operation effectively avoids IO contention among multi-threaded compression writes. The multi-thread gz decompression method lets software use more of the computer's CPUs for decompression, so it can take in more data and a single program can reach higher CPU utilization. The invention also attends to software compatibility: the compressed data structure is specially designed so that the existing zlib and pigz tools can still decompress it, single-threaded, without modification.

In fact, with the development of the Internet and electronic technology, the volume of data keeps growing and computers keep getting faster, and more suitable software is needed to connect the data with the hardware. The present invention provides a flexible multi-thread compression and reading solution for big data that is particularly suited to compressing and reading large amounts of multi-source data on high-performance computing platforms, enabling big data software to make fuller use of the computing power of high-performance computers (HPC). Big data storage must be economical, so compressed storage is an inevitable choice. Compared with storing data directly, compressed storage spends some computing resources to reduce the total number of characters before the data is stored, which greatly reduces the use of IO resources and hard disk space. Such a solution uses all parts of the computer more fully and in a more coordinated way: it avoids the situation in which direct storage occupies only IO while leaving the CPU unused, and so makes full use of the hardware performance of the HPC.

The method provided by the present invention is described in detail below. The invention is a multi-thread compression and decompression algorithm for the universal gz data format, bgz (block gzip), built on the zlib open-source library. First, using the zlib data structures and the deflate method, bgz separates the compression and decompression functions from the read and write IO operations, namely gzwrite = bgzCompress + fwrite and gzread = fread + bgzDecompress; it then implements multithreading on top of in-memory compression and decompression.
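The split gzwrite = bgzCompress + fwrite can be illustrated with a minimal Python sketch. The names bgz_compress, bgz_decompress and write_members are hypothetical stand-ins, not the bgz API, and Python's standard gzip module plays the role of zlib/deflate. Each block becomes a complete gzip member in memory, and only a separate function touches the file.

```python
import gzip
import os
import tempfile


def bgz_compress(block: bytes) -> bytes:
    """CPU step only (hypothetical bgzCompress): one in-memory block -> one gzip member."""
    return gzip.compress(block, compresslevel=6, mtime=0)


def bgz_decompress(member: bytes) -> bytes:
    """CPU step only (hypothetical bgzDecompress): decompress one member already in memory."""
    return gzip.decompress(member)


def write_members(path: str, members: list[bytes]) -> None:
    """I/O step only (the fwrite half): append finished members to the file in order."""
    with open(path, "wb") as f:
        for member in members:
            f.write(member)


if __name__ == "__main__":
    blocks = [b"hello " * 1000, b"world " * 1000]
    members = [bgz_compress(b) for b in blocks]      # could be spread over many threads
    path = os.path.join(tempfile.mkdtemp(), "demo.gz")
    write_members(path, members)                     # could stay on one I/O thread
    with open(path, "rb") as f:                      # the fread half of gzread
        raw = f.read()
    # A gzip file may consist of several concatenated members, so an ordinary
    # single-stream reader still recovers the original bytes.
    assert gzip.decompress(raw) == b"".join(blocks)
```

Because compression and writing are separate calls, the caller is free to choose how many threads do each, which is the flexibility the text attributes to separating IO from computation.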

FIG. 1 shows a multi-thread compression and decompression method for the universal gz data format according to an embodiment of the present invention. The method includes a compression step S1 and a decompression step S2, wherein the compression step S1 includes:

Step S11, inputting original data and dividing the original data into blocks to obtain M data blocks;

Wherein, each data block is represented as Di, i ∈ [0, M-1];

Specifically, in the embodiment of the present invention the original data is not limited to one form of data source: it may come from one data source or from multiple sources. Nor is the number of pieces of data limited: there may be one piece of data or several.

Specifically, the original data to be compressed is loaded into memory and divided into blocks to obtain M data blocks. If there are multiple pieces of data, they are grouped by data source and only the large ones are split into blocks. The block size is adjustable according to the memory of the machine; the default is 10 MB. For example, given 10 pieces of data, nine of 1 MB and one of 10 GB, only the 10 GB piece is split into blocks, and the other nine pieces are processed together as one block.
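A minimal sketch of this block division, assuming each data source arrives as a bytes object. Sources larger than the block size are cut into block-size pieces, and consecutive small sources are merged into one block, which reproduces the "nine 1 MB pieces plus one 10 GB piece" example at toy scale. The function name split_into_blocks and the exact merge rule (merge small sources until the next one would exceed the block size) are assumptions for illustration.

```python
DEFAULT_BLOCK_SIZE = 10 * 1024 * 1024  # the 10 MB default mentioned in the text


def split_into_blocks(sources: list[bytes], block_size: int = DEFAULT_BLOCK_SIZE) -> list[bytes]:
    """Return blocks D0..D(M-1); joining them gives the sources back in input order."""
    blocks: list[bytes] = []
    pending = bytearray()  # consecutive small sources collected into one block
    for src in sources:
        if len(src) > block_size:
            # Large source: flush any pending small data first to keep the order.
            if pending:
                blocks.append(bytes(pending))
                pending.clear()
            for start in range(0, len(src), block_size):
                blocks.append(src[start:start + block_size])
        else:
            # Small source: merge it unless that would overflow the block.
            if len(pending) + len(src) > block_size:
                blocks.append(bytes(pending))
                pending.clear()
            pending += src
    if pending:
        blocks.append(bytes(pending))
    return blocks


if __name__ == "__main__":
    # Toy scale: nine 1-byte sources and one 100-byte source, block size 10.
    srcs = [b"a"] * 9 + [b"b" * 100]
    print(len(split_into_blocks(srcs, block_size=10)))
```

In the toy run the nine small sources end up in one block and the large one in ten, so M is 11.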

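Step S12, compressing the M data blocks respectively with N₁ threads of a preset first thread pool, and reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and their sizes size(gzDi);

Specifically, when the first thread pool is preset, a reasonable number of threads N₁ should be chosen according to the performance of the machine. In general, N₁ is less than or equal to the maximum number of threads the machine supports.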

Specifically, the parallel compression proceeds as follows: each thread processes one data block, and the threads in the pool are reused until all the data is compressed. Because different blocks take different times to compress, the pool must be scheduled flexibly to keep every thread busy. More specifically, the N₁ threads of the preset first thread pool first compress N₁ of the M data blocks, one block per thread; as soon as any thread finishes its block, it goes on to process one of the remaining uncompressed blocks, until all M data blocks are compressed.
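In Python, concurrent.futures.ThreadPoolExecutor already behaves like the pool described here: whenever a worker becomes idle, it takes the next pending block. The sketch below (hypothetical function compress_blocks, with the standard gzip module standing in for zlib) compresses each block into its own gzip member and returns both gzDi and size(gzDi). CPython's zlib module releases the GIL while deflating, which is what lets the threads overlap.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(blocks: list[bytes], n1: int) -> tuple[list[bytes], list[int]]:
    """Compress M blocks with a pool of n1 threads; return (gzDi list, size(gzDi) list)."""
    with ThreadPoolExecutor(max_workers=n1) as pool:
        # Idle workers pick up the next uncompressed block; map() still
        # yields the results in block order i = 0..M-1.
        gz_blocks = list(pool.map(lambda d: gzip.compress(d, mtime=0), blocks))
    sizes = [len(g) for g in gz_blocks]  # size(gzDi), later stored in the header
    return gz_blocks, sizes


if __name__ == "__main__":
    gz, sizes = compress_blocks([b"x" * 1000, b"y" * 500], n1=2)
    print(sizes)
```

Keeping results in block order matters, because step S13 writes the members and their sizes sequentially.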

Specifically, the preset space is reserved in the header portion of the gz format to record the sizes size(gzDi) of the compressed blocks. These sizes serve as a fast address index during subsequent decompression and thereby make multi-thread decompression possible.

Step S13, writing the M pieces of compressed data gzDi to disk in sequence, and writing the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

Specifically, the compressed data gzDi is written to the hard disk, and the corresponding size(gzDi) is recorded as required. On a single-disk system, writing uses a single thread; on a multi-machine distributed system, that is, a multi-disk system, the number of writing threads is set according to the actual number of hard disks.

Recording size(gzDi) is necessary for multi-thread decompression. In the embodiment of the present invention, size(gzDi) is recorded in the header of the gz compression format; depending on the needs of the software system, it could also be kept in memory or in a separate index list. With this specially designed compressed data structure, the data can be decompressed either single-threaded with zlib or pigz, or multi-threaded using size(gzDi).

The decompression step S2 includes:

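Step S21, inputting the compressed data, reading the written list of sizes size(gzDi), and segmenting the compressed data according to that list to obtain M data blocks gzDi;

Specifically, the list of sizes size(gzDi) serves as a fast address index for the segmentation, which enables the subsequent multi-thread decompression.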
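Because the compressed blocks are stored back to back, the size list becomes a list of byte offsets through a running sum. A minimal sketch, with the hypothetical name segment, assuming the size list has already been read from the header:

```python
from itertools import accumulate


def segment(data: bytes, sizes: list[int]) -> list[bytes]:
    """Cut the compressed file back into gzD0..gzD(M-1) using the size(gzDi) list."""
    if sum(sizes) != len(data):
        raise ValueError("size list does not match the compressed data")
    offsets = [0, *accumulate(sizes)]  # start offset of each block, plus the end
    return [data[offsets[i]:offsets[i + 1]] for i in range(len(sizes))]


if __name__ == "__main__":
    print(segment(b"aaabbc", [3, 2, 1]))
```

No decompression is needed to find the block boundaries, which is why the size list acts as a fast address index.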

Step S22, decompressing the M data blocks gzDi respectively with N₂ threads of a preset second thread pool, to obtain M pieces of decompressed original data Di;

Specifically, when the second thread pool is preset, a reasonable number of threads N₂ should be chosen according to the performance of the machine and the number of blocks. In general, N₂ is less than or equal to the maximum number of threads the machine supports.

Specifically, the parallel decompression proceeds as follows: each thread processes one compressed data block, and the threads in the pool are reused until all the data is decompressed. More specifically, the N₂ threads of the preset second thread pool first decompress N₂ of the M data blocks gzDi, one block per thread; as soon as any thread finishes its block, it goes on to process one of the remaining compressed blocks, until all M data blocks are decompressed and the M pieces of decompressed original data Di are obtained.
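Steps S22 and S23 together fit in a few lines. The sketch below uses a hypothetical function decompress_blocks, Python's ThreadPoolExecutor as the second thread pool, and gzip.decompress on each member; pool.map returns results in input order, so joining them performs the in-order concatenation of step S23.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def decompress_blocks(gz_blocks: list[bytes], n2: int) -> bytes:
    """Decompress gzD0..gzD(M-1) with n2 threads and concatenate D0..D(M-1)."""
    with ThreadPoolExecutor(max_workers=n2) as pool:
        parts = list(pool.map(gzip.decompress, gz_blocks))  # Di, in block order
    return b"".join(parts)  # step S23: concatenation in series


if __name__ == "__main__":
    blocks = [b"first ", b"second ", b"third"]
    print(decompress_blocks([gzip.compress(b) for b in blocks], n2=2))
```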

Specifically, in the embodiment of the present invention the number N₁ of threads in the first thread pool equals the number N₂ of threads in the second thread pool. Note that when the data volume is large and there are enough blocks, M is preferably an integer multiple of N₁ and N₂, so that no computing resources sit idle. In practice, N₁ need not equal N₂.
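The reason for choosing M as a multiple of the thread count can be made concrete under a simplifying assumption that is not stated in the text: every block takes the same time t. With N threads, the M blocks are processed in ⌈M/N⌉ rounds, and the last round may leave some threads idle:

```latex
T \approx \left\lceil \frac{M}{N} \right\rceil t,
\qquad
\text{idle thread-slots} = N \left\lceil \frac{M}{N} \right\rceil - M .
```

The idle count is zero exactly when N divides M. For example, M = 10 and N = 4 need 3 rounds with 4·3 − 10 = 2 idle slots, whereas M = 12 and N = 4 also need 3 rounds but leave none.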

Step S23, concatenating the decompressed original data Di in series according to the list of sizes size(gzDi) to obtain the complete original data.

Note that both compression and decompression use thread-pool technology to implement multithreading. The compression process takes the original data as input and produces the compressed data and the block information; the multi-thread decompression process takes the compressed data and the block information as input and produces the decompressed original data, as shown in FIG. 2.

The invention uses a thread pool to schedule multi-thread compression and decompression, which lets it make better use of the hardware on different systems. Its multi-threaded scheduling pseudocode is as follows:
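As an illustration only (this is not the original pseudocode), the following Python sketch implements with explicit threads the scheduling pattern that steps S12 and S22 describe: a shared queue holds the indices of unprocessed blocks, and each worker keeps taking the next index until the queue is empty. The function run_pool is hypothetical; passing a compression or a decompression function as task covers both directions.

```python
import queue
import threading
from typing import Callable


def run_pool(task: Callable[[bytes], bytes], items: list[bytes], n_threads: int) -> list:
    """Apply task to every item with n_threads workers; results keep item order."""
    todo = queue.Queue()
    for i in range(len(items)):
        todo.put(i)
    results: list = [None] * len(items)
    errors: list = []

    def worker() -> None:
        while True:
            try:
                i = todo.get_nowait()  # claim the next unprocessed block
            except queue.Empty:
                return  # no blocks left, so this thread finishes
            try:
                # Each index is claimed by exactly one thread, so writes never collide.
                results[i] = task(items[i])
            except Exception as exc:  # remember the failure for the caller
                errors.append(exc)
                return

    threads = [threading.Thread(target=worker) for _ in range(max(1, n_threads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    print(run_pool(lambda b: b.upper(), [b"a", b"b", b"c"], 2))
```

A thread that finishes early simply claims another index, so faster blocks do not leave threads waiting while work remains.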

Note that a major innovation of the present invention is that the block information is stored in the header of the gz compression format, which ensures compatibility. Files produced by the bgz compression method can still be decompressed with the original single-thread decompression of zlib or pigz without any modification, while bgz multi-thread decompression can extract the block information from the gz file itself and thus decompress quickly with multiple threads.

The data structure of the gz compression format header is as follows:

This data structure shows that the gz header contains an extra field. Under normal circumstances the extra field does not take part in decompression, and in files stored on disk it is usually empty. The bgz method provided by the invention allocates a fixed-length extra field in advance, large enough to hold the block information of 100 blocks, and keeps updating it as compression proceeds. If there are fewer than 100 data blocks, the remaining space is padded with zeros. If there is a lot of data and the number of blocks exceeds 100, another extra field with space for 100 blocks is allocated when the 101st block is compressed, and so on.

Currently, the information for one block occupies 8 bytes and mainly records the size of the block before compression, size(Di), and its compressed size, size(gzDi). Because each block is compressed independently, every compressed block gzDi has its own gz header, but not every header carries an extra field with block information: depending on the number of blocks, the extra field exists only in the headers of gzD(100*i+1), where i = 0, 1, 2, ....
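A small sketch of the bookkeeping this implies. It assumes 0-based member numbering, so gzD1, gzD101, ... in the 1-based numbering of the sentence above become members 0, 100, ..., and it assumes the 8-byte entry is two 4-byte little-endian unsigned integers; the text fixes only the 8-byte total, not the split or byte order. The helper names are hypothetical. One table of 100 entries then takes 800 bytes, consistent with the 800 bytes quoted for XLEN further on.

```python
import struct

ENTRY = struct.Struct("<II")  # assumed layout: size(Di), size(gzDi), 4 bytes each
TABLE_ENTRIES = 100           # blocks described by one extra field
TABLE_BYTES = TABLE_ENTRIES * ENTRY.size  # 100 * 8 = 800 bytes


def table_carriers(m: int) -> list[int]:
    """0-based indices of the members whose header holds a block table."""
    return list(range(0, m, TABLE_ENTRIES))


def table_slot(j: int) -> tuple[int, int]:
    """(member holding the entry for block j, slot within that member's table)."""
    return (j // TABLE_ENTRIES) * TABLE_ENTRIES, j % TABLE_ENTRIES


def pack_table(entries: list[tuple[int, int]]) -> bytes:
    """Pack up to 100 (size(Di), size(gzDi)) pairs and zero-pad the unused slots."""
    if len(entries) > TABLE_ENTRIES:
        raise ValueError("one table holds at most 100 entries")
    packed = b"".join(ENTRY.pack(raw, comp) for raw, comp in entries)
    return packed.ljust(TABLE_BYTES, b"\x00")


if __name__ == "__main__":
    print(table_carriers(250), table_slot(101))
```

With 4-byte sizes each block must stay below 4 GiB, which the 10 MB default block size easily satisfies.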

The bgz file stored on the hard disk follows the gzip file format and consists of multiple data blocks (gzip members). Each data block is laid out as follows:

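+---+---+---+---+---+---+---+---+---+---+==========//===========+========//=========+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS |  extra header fields  |  compressed data  |     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+---+---+==========//===========+========//=========+---+---+---+---+---+---+---+---+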

Each data block consists of three parts: a header, a data part, and a trailer. The header runs from ID1 through the extra header fields, and CRC32 and ISIZE form the trailer. Apart from the extra fields, the content is identical to the normal gzip format and is defined as follows:

ID1 and ID2: 1 byte each. Fixed values ID1 = 31 (0x1F) and ID2 = 139 (0x8B), identifying the gzip format.

CM: 1 byte. Compression method. Currently only one value is used: CM = 8, the DEFLATE method.

FLG: 1 byte. Flags:

Bit 0, FTEXT: indicates text data

Bit 1, FHCRC: indicates the presence of a CRC16 header check field

Bit 2, FEXTRA: indicates the presence of optional extra fields

Bit 3, FNAME: indicates the presence of the original file name field

Bit 4, FCOMMENT: indicates the presence of a comment field

Bits 5-7: reserved

MTIME: 4 bytes. Modification time, in Unix format.

XFL: 1 byte. Extra flags. When CM = 8, XFL = 2 means the maximum-compression (slowest) algorithm and XFL = 4 means the fastest algorithm.

OS: 1 byte. Identifies the operating system, specifically the type of file system on which compression took place. The following values are defined:

0–FAT file system (MS-DOS, OS/2, NT/Win32)

1–Amiga

2–VMS/OpenVMS

3–Unix

4–VM/CMS

5–Atari TOS

6–HPFS file system (OS/2, NT)

7–Macintosh

8–Z-System

9–CP/M

10–TOPS-20

11–NTFS File System (NT)

12–QDOS

13–Acorn RISCOS

255–unknown
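The fixed part of the header described above is exactly 10 bytes, so it can be decoded with a single struct format. A minimal sketch following RFC 1952, the gzip specification; the function name parse_fixed_header is hypothetical.

```python
import struct

FLAG_NAMES = {0x01: "FTEXT", 0x02: "FHCRC", 0x04: "FEXTRA", 0x08: "FNAME", 0x10: "FCOMMENT"}


def parse_fixed_header(buf: bytes) -> dict:
    """Decode the 10 fixed bytes that start every gzip member (all fields little-endian)."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip member")
    if cm != 8:
        raise ValueError("unsupported compression method")
    return {
        "flags": [name for bit, name in FLAG_NAMES.items() if flg & bit],
        "mtime": mtime,   # seconds since the Unix epoch, 0 if not recorded
        "xfl": xfl,       # 2 = maximum compression, 4 = fastest
        "os": os_code,    # 3 = Unix, 255 = unknown, and so on
    }


if __name__ == "__main__":
    # ID1, ID2, CM=8, FLG=FEXTRA, MTIME=0, XFL=2, OS=3 (Unix)
    print(parse_fixed_header(bytes([0x1F, 0x8B, 8, 0x04, 0, 0, 0, 0, 2, 3])))
```

A reader that finds FEXTRA among the flags knows that a 2-byte XLEN and the extra field follow these 10 bytes.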

Extra header fields:

FLG.FEXTRA = 1 means an extra field is present; XLEN gives the length of the extra field, here 800 bytes.

+---+---+---+---+==================================+
|SI1|SI2|  XLEN |   optional data of XLEN bytes    |
+---+---+---+---+==================================+

FLG.FNAME = 0 means there is no original file name.

FLG.FCOMMENT = 0 means there is no comment; if it is 1, a comment is included.

FLG.FHCRC = 0 means no CRC16 header check is present; if it is 1, a CRC16 of the header is also stored. The CRC32 of the uncompressed data in the trailer is present in either case.
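Putting these pieces together, the following sketch builds gzip members whose first header reserves a zero-filled 800-byte table, fills the table entries in place after compression, and checks that the standard Python gzip reader still decompresses the concatenated members. Several details are assumptions rather than statements of the text: the subfield ID bytes "B" and "G" are hypothetical, each entry is taken to be two little-endian 4-byte sizes, and the extra field is laid out as RFC 1952 defines it (XLEN, then SI1, SI2, a 2-byte subfield length LEN, and the data), so here LEN is 800 and XLEN is 804. Patching the header does not invalidate the file, because CRC32 covers only the uncompressed data and FHCRC is not set.

```python
import gzip
import struct
import zlib

ENTRY = struct.Struct("<II")   # assumed: size(Di), size(gzDi), 4 bytes each
TABLE_ENTRIES = 100
TABLE_BYTES = TABLE_ENTRIES * ENTRY.size  # 800
SI1, SI2 = b"B", b"G"          # hypothetical subfield ID, not given by the text
TABLE_OFFSET = 16              # 10-byte fixed header + XLEN (2) + SI1, SI2, LEN (4)


def gzip_member(data: bytes, with_table: bool) -> bytearray:
    """Build one RFC 1952 gzip member, optionally reserving a zero-filled table."""
    flg = 0x04 if with_table else 0x00  # FLG.FEXTRA
    # ID1, ID2, CM=8, FLG, MTIME=0, XFL=0, OS=3 (Unix)
    member = bytearray(b"\x1f\x8b\x08" + bytes([flg]) + struct.pack("<I", 0) + b"\x00\x03")
    if with_table:
        sub = SI1 + SI2 + struct.pack("<H", TABLE_BYTES) + bytes(TABLE_BYTES)
        member += struct.pack("<H", len(sub)) + sub  # XLEN, then the subfield
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    member += comp.compress(data) + comp.flush()
    member += struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)  # CRC32, ISIZE
    return member


def patch_entry(member: bytearray, k: int, raw_size: int, gz_size: int) -> None:
    """Write entry k of a reserved table in place; the member length does not change."""
    ENTRY.pack_into(member, TABLE_OFFSET + k * ENTRY.size, raw_size, gz_size)


if __name__ == "__main__":
    blocks = [b"alpha" * 200, b"beta" * 300, b"gamma" * 100]
    members = [gzip_member(b, with_table=(i % TABLE_ENTRIES == 0)) for i, b in enumerate(blocks)]
    for k, (b, m) in enumerate(zip(blocks, members)):
        patch_entry(members[0], k, len(b), len(m))  # all three entries live in member 0
    data = b"".join(members)
    assert gzip.decompress(data) == b"".join(blocks)  # plain single-threaded reader
    print(ENTRY.unpack_from(members[0], TABLE_OFFSET))
```

A bgz-aware reader would instead read the table from member 0, cut the file with the stored gzip sizes, and decompress the members in parallel.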

The zlib-based multi-thread compression and decompression algorithm described above serves DNA sequence analysis. In the era of big data bioinformatics, efficient compression and storage of base-pair information files is extremely important, and for massive data, improving compression and decompression through multithreading alone is still not enough. The multi-threaded approach could also be applied to file reading, writing and transmission. In addition, part of compressed storage could be implemented in hardware to make file compression and storage more efficient.

In addition, in this implementation of the zlib multi-thread compression and decompression algorithm, the block information of the file can be written into the extra field of the gz header instead of into a separate index file. This integrates multi-thread compression and decompression more closely with the zlib library, makes it simpler and more convenient to call, and saves the time needed to read and write an index file.

The multi-thread compression and decompression method for the universal gz data format provided by the invention uses a parallel multi-thread gz compression method that is separate from read/write IO. Reading, writing, compression and decompression are separated from one another, so compression and decompression can be matched to the resources of the actual computing platform. The compressing side and the decompressing side need not be on the same machine, and read/write threads and compression/decompression threads can be allocated as needed to get the most out of the machine. In addition, the multi-threaded gz decompression method lets software use more of the computer's CPUs for decompression, so it can take in more data and a single program can reach higher CPU utilization. For software compatibility, the compressed data structure is specially designed to store size(gzDi) in the gz header, so that the existing zlib and pigz tools can decompress it without modification. This greatly reduces users' concerns about version updates; in particular, when the data producer and the data user are different parties, no version incompatibility arises. The cost of adopting the method is therefore extremely low: users need to switch software only when they actually have the corresponding decompression requirements.

Referring to FIG. 3, a multi-thread compression and decompression device of a general data gz format according to an embodiment of the present invention includes: a compression module 1 and a decompression module 2, wherein the compression module 1 includes:

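The blocking module 11 is configured to input original data and divide it into blocks to obtain M data blocks, wherein each data block is represented as Di, i ∈ [0, M-1];

The compression module 12 is configured to compress the M data blocks respectively with N₁ threads of the preset first thread pool, reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and their sizes size(gzDi);

The writing module 13 is configured to write the M pieces of compressed data gzDi to disk in sequence, and to write the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

The decompression module 2 includes:

The segmentation module 21 is configured to input the compressed data, read the written list of sizes size(gzDi), and segment the compressed data according to that list to obtain M data blocks gzDi;

The decompression module 22 is configured to decompress the M data blocks gzDi with N₂ threads of the preset second thread pool to obtain M pieces of decompressed original data Di;

The serial module 23 is configured to concatenate the decompressed original data Di in series according to the list of sizes size(gzDi) to obtain the complete original data.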

Regarding application fields, the gz compression format is used for archival storage of Internet text data, for network transmission, and for general storage of FASTQ data (some network transmission uses bz2 compression). Software platforms that handle huge volumes of data widely use multi-threaded gz compression: on Linux mainly pigz, and on other platforms such as Windows, multi-thread compression built by further development of the zlib library. These are the currently known application areas and methods.

In high-throughput DNA sequencing, as bioinformatics computing develops further and high-performance computers are used for bioinformatics analysis, FASTQ data may be read many times for statistics or computation, so multi-threaded decompression of the fastq.gz file format will also become an important potential application area.

With the further development of cloud computing, centralized data processing and storage will become increasingly common, and multi-thread compression and decompression can be widely used for both storage and transmission.

The above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

A multi-thread compression and decompression method of the general data gz format, the method comprising: a compression step S1 and a decompression step S2, wherein the compressing step S1 comprises:

Step S11, inputting original data and dividing the original data into blocks to obtain M data blocks;

Wherein, each data block is represented as Di, i ∈ [0, M-1];

Step S12, compressing the M data blocks respectively with N₁ threads of a preset first thread pool, and reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and the size of each, size(gzDi);

Step S13, writing the M pieces of compressed data gzDi to disk in sequence, and writing the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

The decompression step S2 includes:

Step S21, inputting the compressed data, reading the written list of sizes size(gzDi), and segmenting the compressed data according to that list to obtain M data blocks gzDi;

Step S22, decompressing the M data blocks gzDi respectively with N₂ threads of a preset second thread pool, to obtain M pieces of decompressed original data Di;

Step S23, concatenating the decompressed original data Di in series according to the list of sizes size(gzDi) to obtain the complete original data.
The multi-thread compression and decompression method according to claim 1, wherein the original data is multi-source data.
The multi-thread compression and decompression method according to claim 1, wherein compressing the M data blocks respectively with the N₁ threads of the preset first thread pool comprises:

Compressing N₁ of the M data blocks with the N₁ threads of the preset first thread pool, one block per thread, and, after any one of the N₁ threads finishes compressing its data block, continuing to use that thread to process the remaining uncompressed data blocks until all M data blocks are compressed.
The multi-thread compression and decompression method according to claim 1, wherein the step S22 is specifically:

Decompressing N₂ of the M data blocks gzDi with the N₂ threads of the preset second thread pool, one block per thread, and, after any one of the N₂ threads finishes decompressing its data block, continuing to use that thread to process the remaining compressed data blocks until all M data blocks are decompressed, to obtain M pieces of decompressed original data Di.
The multi-thread compression and decompression method according to any one of claims 1 to 4, wherein the number N₁ of threads in the first thread pool equals the number N₂ of threads in the second thread pool.
A multi-thread compression and decompression device of the general-purpose data gz format, the device comprising: a compression module 1 and a decompression module 2, wherein the compression module 1 comprises:

The blocking module 11 is configured to input original data and divide the original data into blocks to obtain M data blocks;

Wherein, each data block is represented as Di, i ∈ [0, M-1];

The compression module 12 is configured to compress the M data blocks respectively with N₁ threads of the preset first thread pool, reserving a preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and their sizes size(gzDi);

The writing module 13 is configured to write the M pieces of compressed data gzDi to disk in sequence, and to write the corresponding M sizes size(gzDi) into the preset space in sequence, to obtain the compressed data;

The decompression module 2 includes:

The segmentation module 21 is configured to input the compressed data, read the written list of sizes size(gzDi), and segment the compressed data according to that list to obtain M data blocks gzDi;

The decompression module 22 is configured to decompress the M data blocks gzDi with N₂ threads of the preset second thread pool to obtain M pieces of decompressed original data Di;

The serial module 23 is configured to concatenate the decompressed original data Di in series according to the list of sizes size(gzDi) to obtain the complete original data.
The multi-thread compression and decompression apparatus according to claim 6, wherein said raw data is multi-source data.
The multi-thread compression and decompression device according to claim 6, wherein the compression module 12 is specifically configured to: compress N₁ of the M data blocks with the N₁ threads of the preset first thread pool, one block per thread; after any one of the N₁ threads finishes compressing its data block, continue to use that thread to process the remaining uncompressed data blocks until all M data blocks are compressed; and reserve the preset space in the header portion of the gz format during compression, to obtain M pieces of compressed data gzDi and their sizes size(gzDi).
The multi-thread compression and decompression apparatus according to claim 6, wherein the decompression module 22 is specifically configured to: decompress N₂ of the M data blocks gzDi with the N₂ threads of the preset second thread pool, one block per thread; after any one of the N₂ threads finishes decompressing its data block, continue to use that thread to process the remaining compressed data blocks until all M data blocks are decompressed, to obtain M pieces of decompressed original data Di.
The multi-thread compression and decompression apparatus according to any one of claims 6 to 9, wherein the number N₁ of threads in the first thread pool equals the number N₂ of threads in the second thread pool.