CN108363719B - Configurable transparent compression method in distributed file system - Google Patents

Configurable transparent compression method in distributed file system

Info

Publication number
CN108363719B
CN108363719B (application CN201810002379.8A)
Authority
CN
China
Prior art keywords
compression
file
compressor
block
independent process
Prior art date
Legal status
Active
Application number
CN201810002379.8A
Other languages
Chinese (zh)
Other versions
CN108363719A (en)
Inventor
李新明
刘斌
Current Assignee
Edge Intelligence Of Cas Co ltd
Original Assignee
Edge Intelligence Of Cas Co ltd
Priority date
Filing date
Publication date
Application filed by Edge Intelligence Of Cas Co ltd filed Critical Edge Intelligence Of Cas Co ltd
Priority to CN201810002379.8A priority Critical patent/CN108363719B/en
Publication of CN108363719A publication Critical patent/CN108363719A/en
Application granted
Publication of CN108363719B publication Critical patent/CN108363719B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a configurable transparent compression method in a distributed file system, which comprises the following steps: checking whether the DataNode independent process has an idle compression worker thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule; calling a compression worker thread to trial-compress the block file and estimate its expected compression ratio; and, in response to the expected compression ratio being larger than a set threshold, calling the compression worker thread to compress the block file according to a set compression rule. The compression method provided by the invention is asynchronous: the cluster performs data compression when the CPU and IO are relatively idle and sets priorities for the data files to be compressed, thereby smoothing CPU/IO peaks (peak clipping and valley filling) while preserving the compression benefit.

Description

Configurable transparent compression method in distributed file system
Technical Field
The invention relates to the field of compression methods, and in particular to a configurable transparent compression method in a distributed file system.
Background
Distributed storage technology is one of the common means by which enterprises deal with large-scale data storage. However, as cluster size grows, the problem of cluster storage space becomes increasingly prominent. How to reduce the cost of data ownership and improve the storage capacity of existing clusters has become a problem that enterprise technical departments must consider and solve.
The present method improves the design of the data storage node of a traditional distributed file system by adding a transparent compression feature, so as to save storage space while keeping compression from affecting computation in the layers above storage and remaining transparent to the user.
Most transparent compression methods in file systems are synchronous: data is compressed before it flows, which improves network and disk transfer efficiency and reduces disk consumption to a certain extent, at the cost of spending computing resources on compression and decompression.
On-line clusters often have high CPU utilization, and synchronous compression while writing data to the distributed file system occupies CPU resources, slowing computation tasks and adversely affecting the data write speed.
Disclosure of Invention
The invention aims to provide a configurable transparent compression method in a distributed file system. The method is asynchronous: the cluster performs data compression when the CPU and IO are relatively idle and sets priorities for the data files to be compressed, thereby smoothing CPU/IO peaks (peak clipping and valley filling) while preserving the compression benefit.
In order to achieve the purpose, the invention provides the following technical scheme:
a configurable transparent compression method in a distributed file system, comprising:
step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk;
step 2, periodically sending a heartbeat report from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state;
step 3, in response to the heartbeat report sent by the Compressor Worker independent process, checking whether the DataNode independent process has an idle Compressor Worker Thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeding the compression task corresponding to that block file back to the Compressor Worker independent process as the return value;
step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigning the task to the Compressor Worker Thread of the corresponding disk, the Compressor Worker Thread trial-compressing the block file and estimating its expected compression ratio:
1) in response to the expected compression ratio being larger than a set threshold, the Compressor Worker Thread compressing the block file according to a set compression rule;
2) in response to the expected compression ratio being smaller than or equal to the set threshold, judging that the block file does not need to be compressed, reporting the judgment to the DataNode independent process in the next heartbeat report, and no longer marking the block file as a to-be-compressed block file in the DataNode independent process.
The invention has the beneficial effects that:
the cluster can perform data compression when the CPU and the IO are relatively idle, and set the priority of the data file to be compressed, so that the effect of peak clipping and valley filling of the CPU/IO is achieved while the compression effect is ensured.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the operation of the Compressor Worker of the present invention.
Fig. 2 is a transparent compression architecture diagram of the present invention.
FIG. 3 is a flow chart of the Compressor Worker Thread work of the present invention.
FIG. 4 is a diagram illustrating a compressed block management method according to the present invention.
FIG. 5 is a schematic diagram of a compressed block.
Detailed Description
The following detailed description of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
For convenience of description, the terms that may be referred to hereinafter are defined first.
NameNode: the system comprises an independent process, distributed storage cluster metadata nodes, file system metadata management and database and file mapping maintenance.
A DataNode: and the independent process stores the cluster data nodes in a distributed manner, and stores file blocks (Block) in a local file form.
Compressor Worker (CW): and the independent process is deployed to the DataNode machine. The compression task is performed. The roles are a decider and a compression executor of whether to perform compression.
Compressor WorerThread (CWT): the worker threads in the Compressor worker, one processing thread per disk. For performing compression tasks.
Compressor admin: and the out-of-band tool is mainly used for manually interfering the priority of block compression and plays a role in suggesting the compression task per se.
Codec type: the method adopts various methods to test the pressure of the file, and adopts the algorithm with the best compression ratio to compress.
The asynchronous compression process of the present invention mainly involves three of the roles above: the NameNode, the DataNode, and the CW. So that asynchronous compression can be brought online across the cluster in batches, the method avoids modifying NameNode logic during asynchronous compression, keeping the compression transparent to the NameNode. The DataNode acts as the master of the asynchronous compression process: it decides which blocks should be compressed first and assigns block compression tasks. The CW is responsible for executing the compression tasks assigned to it by the DataNode.
The following is a detailed description of the asynchronous compression method referred to in the present invention.
With reference to fig. 1 and fig. 2, the present invention provides a configurable transparent compression method in a distributed file system, including:
Step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk.
Step 2, a heartbeat report is periodically sent from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state.
The CW periodically initiates a heartbeat RPC to the local DataNode, telling the DataNode how many block compression tasks are currently running, which blocks have finished compression, and which blocks were not compressed. After receiving the RPC, the DataNode assigns tasks to the CW; the return value of the RPC is the set of blocks the CW should compress.
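As a non-limiting sketch, the heartbeat exchange between the CW and the DataNode could be organized as below. All class, field, and interface names (CompressionHeartbeat, CompressionTask, CompressorHeartbeatProtocol) are illustrative assumptions; the patent does not publish the actual RPC schema.

    // Illustrative Java sketch of the CW -> DataNode heartbeat RPC described above.
    // Names are assumptions for illustration only.
    import java.util.List;

    // What the Compressor Worker reports in each heartbeat.
    class CompressionHeartbeat {
        List<String> runningBlockIds;    // block compression tasks currently running
        List<String> completedBlockIds;  // blocks whose compression finished since the last heartbeat
        List<String> skippedBlockIds;    // blocks judged not worth compressing (low trial ratio)
        int idleWorkerThreads;           // one Compressor Worker Thread per disk; how many are idle
    }

    // What the DataNode hands back: the next block compression tasks for the CW.
    class CompressionTask {
        String blockId;
        String diskId;          // disk whose Compressor Worker Thread should run the task
        double ratioThreshold;  // expected-compression-ratio threshold used in step 4
    }

    interface CompressorHeartbeatProtocol {
        // The RPC return value is the list of blocks the CW should compress next.
        List<CompressionTask> heartbeat(CompressionHeartbeat report);
    }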
Step 3, in response to the heartbeat report sent by the Compressor Worker independent process, the DataNode independent process checks whether there is an idle Compressor Worker Thread; if so, it searches the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeds the compression task for that block file back to the Compressor Worker independent process as the return value.
Step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigns the task to the Compressor Worker Thread of the corresponding disk, which trial-compresses the block file and estimates its expected compression ratio (a sketch of this decision follows the two cases below).
1) In response to the expected compression ratio being larger than the set threshold, the Compressor Worker Thread compresses the block file according to the set compression rule.
2) In response to the expected compression ratio being smaller than or equal to the set threshold, the block file is judged not to need compression, the judgment is reported to the DataNode independent process in the next heartbeat report, and the DataNode independent process no longer marks the block file as a to-be-compressed block file.
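A minimal sketch of the trial-compression decision in step 4 is given below, assuming the expected ratio is estimated by compressing a sample of the block with java.util.zip's Deflater; the sampling strategy, codec choice, and class names are assumptions, not the patent's implementation.

    // Sketch: estimate the expected compression ratio on a sample and compare it to the threshold.
    import java.util.zip.Deflater;

    class TrialCompression {
        /** Estimated ratio = uncompressed size / compressed size of the sample. */
        static double estimateRatio(byte[] sample) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(sample);
            deflater.finish();
            byte[] out = new byte[sample.length + 64];
            int compressedLen = 0;
            while (!deflater.finished() && compressedLen < out.length) {
                compressedLen += deflater.deflate(out, compressedLen, out.length - compressedLen);
            }
            deflater.end();
            return (double) sample.length / Math.max(compressedLen, 1);
        }

        /** Case 1): compress; case 2): skip and report back in the next heartbeat. */
        static boolean shouldCompress(byte[] sample, double threshold) {
            return estimateRatio(sample) > threshold;
        }
    }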
The purpose of compression is to let compression tasks and computation tasks run off-peak from each other while improving disk space utilization, so that compression does not hold back the progress of computation. Asynchronous compression has two core questions:
the first question, which data to press?
The second question, which data to press first?
The above two problems are discussed one after the other.
Question one: determining which data needs compression
Conventionally, a compressing file system compresses all data; compressing data indiscriminately in this way can waste resources.
To guarantee a higher compression payoff, the following rules are used:
1) Data with a high compression ratio is compressed.
2) Data with a low compression ratio is trial-compressed again later.
3) The last block of a file that is being appended to is not compressed.
For the first question, the method relies on trial compression: only a block file whose expected compression ratio exceeds the set threshold is compressed by a Compressor Worker Thread according to the set compression rule.
With reference to fig. 3, the set compression rule is:
and compressing the chunk blocks of the block files with the compression task, writing the compressed data into a corresponding folder under the tmp directory every time one chunk is read, and generating an Index file.
Question two: determining the compression priority
The compression priority depends mainly on two factors:
1) The atime of the Block.
2) The priority suggested by the Admin.
For factor 1, the set priority rule is as follows: a field access time, expressing the last access time, is added to each block file in the DataNode independent process; a block file whose time since last access exceeds a set time threshold is marked as a to-be-compressed block file, and the earlier its last access time, the higher the compression priority of the corresponding block file.
Factor 1 is based on the assumption that data which has not been accessed for a long time will not be accessed frequently in the future.
In this method, a field access time (atime for short) is added to each Block on the DataNode, and the DataNode updates it whenever there is a WRITE_BLOCK request. The atime of the local file system is not used because operations such as balancing, block checking, and block copying all touch the file's atime and would cause misjudgment.
A block is compressed only if it has not been accessed for a period of time and its compression ratio is high (above a certain threshold).
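The atime rule above could be sketched as follows; the class and field names, the use of a priority queue, and the millisecond threshold are assumptions for illustration.

    // Sketch: blocks not accessed for longer than the threshold are queued, oldest atime first.
    import java.util.Comparator;
    import java.util.Optional;
    import java.util.PriorityQueue;

    class BlockMeta {
        String blockId;
        long accessTimeMillis;               // "atime", updated by the DataNode on WRITE_BLOCK requests
        boolean isLastBlockOfAppendedFile;   // rule 3: such blocks are never compressed
    }

    class CompressionScheduler {
        private final long notAccessedForMillis;  // the set time threshold
        private final PriorityQueue<BlockMeta> toCompress =
                new PriorityQueue<>(Comparator.comparingLong((BlockMeta b) -> b.accessTimeMillis));

        CompressionScheduler(long notAccessedForMillis) {
            this.notAccessedForMillis = notAccessedForMillis;
        }

        void offer(BlockMeta block, long nowMillis) {
            boolean coldEnough = nowMillis - block.accessTimeMillis > notAccessedForMillis;
            if (coldEnough && !block.isLastBlockOfAppendedFile) {
                toCompress.add(block);  // earlier atime = higher compression priority
            }
        }

        Optional<BlockMeta> nextHighestPriority() {
            return Optional.ofNullable(toCompress.poll());
        }
    }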
In some examples, the method further comprises:
an out-of-band tool for human intervention, defined as the Compressor admin, is provided with policy suggestions for the Compressor Worker independent process to change the priority of any one compression task.
In addition to the Block access time, a user may want compression driven at the file/directory level, that is, certain blocks should be compressed with higher priority. There are two kinds of policies: suggestion policies and decision policies. A suggestion policy provides advice to the Compressor Worker from the outside; each role influences the priority of compression tasks according to the information it holds. A decision policy is what actually decides whether a block is compressed.
Suggestion policy:
the priority of the Block compression task is adjusted by the Compressor Admin.
Decision policy:
DataNode: records the access time of each Block and adjusts the Block's compression priority accordingly. If the time since last access is below a certain threshold, the block is not compressed.
Compressor Worker: trial-compresses the Block and decides, according to the achieved compression ratio, whether the Block file is finally compressed.
The third rule mentioned under question one, namely that the last block of a file being appended to is not compressed, is explained as follows:
if the last block of the additionally written object file is compressed, when the object file is applied, the last block is decompressed to a tmp directory, then data is additionally written into the block, and when the file is closed, the file mv is returned to the current directory. After the file is additionally written, the database node is notified by the heartbeat from CW to the database node, and the flag is not compressed (update atime).
In further examples, with reference to fig. 4, the method further includes:
the following 3 kinds of information are added to the compressed block file name: the pre-compression length, the compression algorithm employed, and the file suffix cdata.
The original checksum length and the file suffix cmeta are added to the file name of the meta file corresponding to the compressed file.
The DataNode stores the data with the following structure:
The method adds the following three kinds of information to the compressed data file name: the pre-compression length, the compression algorithm used, and the file suffix cdata. The file name of the meta file corresponding to the compressed file (the cmeta file) additionally carries the original checksum length and the file suffix cmeta. The content of the compressed data file is the compressed data. The cmeta file structure is as follows:
checksum (original meta file content)
1st chunk post-compression offset (int)
2nd chunk post-compression offset (int)
...
nth chunk post-compression offset (int)
This design saves file-related information (codec type and pre-compression length) in the file path, which avoids the extra random disk access that would be caused when reading compressed data if the codec were saved in the meta file or data file. In addition, when the file data and the meta do not match, the data file can be decompressed on its own and a checksum verification performed against the currently available meta.
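The naming convention could be encoded as in the sketch below. The patent only states which fields are embedded in the file names (pre-compression length, codec, cdata suffix; original checksum length, cmeta suffix); the separator and field order used here are assumptions.

    // Sketch: embedding and reading back block metadata from the file names.
    class CompressedBlockNames {
        static String dataFileName(String blockName, long preCompressionLength, String codec) {
            return blockName + "_" + preCompressionLength + "_" + codec + ".cdata";
        }

        static String metaFileName(String blockName, long originalChecksumLength) {
            return blockName + "_" + originalChecksumLength + ".cmeta";
        }

        /** Reading the pre-compression length from the path avoids an extra disk access. */
        static long preCompressionLength(String dataFileName) {
            String stem = dataFileName.substring(0, dataFileName.length() - ".cdata".length());
            String[] parts = stem.split("_");
            return Long.parseLong(parts[parts.length - 2]);
        }
    }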
Further, the method further comprises:
a READ BLOCK independent process is provided for reading the BLOCK file.
In response to a request from a client to read any block file, the READ_BLOCK independent process checks whether the block file is compressed:
1) If the block file is not compressed, its original data is sent directly to the client.
2) If the block file is compressed, it is decompressed on the DataNode side, and the decompressed original data and the corresponding checksum are returned to the client.
By modifying the READ_BLOCK RPC method, when a client reads a data block using READ_BLOCK and the corresponding block is already compressed, the block is decompressed on the DataNode side and the decompressed data is returned; from the client's point of view, it reads the original data and the corresponding checksum.
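A sketch of the branching in the modified READ_BLOCK path is shown below; the helper method names stand in for the DataNode's existing block I/O and are assumptions, as is returning the checksum after the data on the same stream.

    // Sketch: compressed blocks are decompressed on the DataNode side before being returned.
    import java.io.IOException;
    import java.io.OutputStream;

    class ReadBlockHandler {
        void readBlock(String blockId, OutputStream toClient) throws IOException {
            if (!isCompressed(blockId)) {
                streamRawBlock(blockId, toClient);             // case 1): send the original data directly
            } else {
                byte[] original = decompressLocally(blockId);  // case 2): decompress at the DataNode end
                byte[] checksum = checksumFromCmeta(blockId);  // original checksum kept in the cmeta file
                toClient.write(original);
                toClient.write(checksum);
            }
        }

        // Placeholders for the DataNode's existing block I/O; not implemented in this sketch.
        boolean isCompressed(String blockId) { return false; }
        void streamRawBlock(String blockId, OutputStream out) throws IOException { }
        byte[] decompressLocally(String blockId) { return new byte[0]; }
        byte[] checksumFromCmeta(String blockId) { return new byte[0]; }
    }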
The method further comprises the following steps:
in the data reading process, each chunk is read, the data corresponding to the chunk is decompressed, the decompressed data is written into a corresponding file under a tmp directory, and an Index file is generated and stores the compressed offset corresponding to each chunk according to a fixed length.
The Index file is located at the end of the cmeta file and has the following structure:
checksum (original meta file content)
1st chunk post-compression offset (int)
2nd chunk post-compression offset (int)
...
nth chunk post-compression offset (int)
The Index is stored with fixed-length fields, one field per 8 B; it stores the post-compression offset of each chunk.
The random-read process is illustrated with an example:
FIG. 5 shows a block after compression. The chunk size is 256 K, and the offset of each chunk is stored in the Index file. To seek to pre-compression offset 1.1 M, the fixed-length storage lets us find the nearest aligned offset, 1 M; we then seek to offset 16 B in the Index file, read an int giving the compressed offset at which the 1 M chunk starts, and use that as the starting offset for reading.
The compressed data is read sequentially from that starting offset and decompressed until the 1.1 M position is reached, where the requested data is read out.
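The offset lookup of this example could be sketched as below. The 4-byte entry width mirrors the "read an int" wording of the example; treat the exact Index entry width and the indexStart parameter as assumptions.

    // Sketch: map a pre-compression seek target to a starting offset in the compressed data.
    import java.io.IOException;
    import java.io.RandomAccessFile;

    class RandomReadIndex {
        static final int CHUNK_SIZE = 256 * 1024;

        /** Compressed-file offset at which to start decompressing for targetOffset. */
        static long startOffsetFor(long targetOffset, RandomAccessFile indexFile, long indexStart)
                throws IOException {
            long chunkIndex = targetOffset / CHUNK_SIZE;   // e.g. 1.1 M falls in the chunk starting at 1 M
            indexFile.seek(indexStart + chunkIndex * 4);   // fixed-length entries, one per chunk
            return indexFile.readInt();                    // post-compression offset of that chunk
        }
        // The reader then decompresses sequentially from this offset and skips
        // (targetOffset % CHUNK_SIZE) bytes, i.e. the 0.1 M between 1 M and 1.1 M.
    }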
In other embodiments, the method further comprises:
In step 3, in response to the heartbeat report sent by the Compressor Worker independent process, the number of local data read-write connections is checked; if the number of connections exceeds a set threshold, compression task allocation is reduced or stopped according to a set rule, and the read/write disk speed of some or all Compressor Worker Threads is limited.
Further, the method further comprises:
the Compressor worker independent process is set to idle level using cpu nic and io nic.
Resource control of the compression task must ensure as far as possible that compression runs during the cluster's relatively idle periods, and that, when a computation task is running on the node, compression does not hold back its progress.
To achieve better resource isolation, the method places the executor of compression tasks in its own process, namely the Compressor Worker. The resource usage of the CW is limited in two ways:
when the DataNode distributes the block compression task, the number of local data read-write connections is considered, and when the number of the connections is large, the distribution speed of the block compression task is properly reduced.
The read and write disk speed of the Compressor Worker Thread is limited by the Compressor Worker process.
cpu nice and ionice set the Compressor Worker process to idle priority, preventing the CW from seizing CPU/IO at the operating-system level when computation is busy.
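At the operating-system level this can be done with the standard Linux nice and ionice tools, as in the sketch below; the java command line used to start the Compressor Worker is an assumption, since the patent only states that the process is set to an idle level.

    // Sketch: launch the Compressor Worker at idle CPU and IO priority.
    import java.io.IOException;

    class CompressorWorkerLauncher {
        static Process launch(String compressorWorkerJar) throws IOException {
            return new ProcessBuilder(
                    "nice", "-n", "19",   // lowest CPU scheduling priority
                    "ionice", "-c", "3",  // idle IO scheduling class
                    "java", "-jar", compressorWorkerJar)
                .inheritIO()
                .start();
        }
    }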
The technical features of the embodiments described above may be combined arbitrarily; for brevity, not all possible combinations of these features are described, but any combination that contains no contradiction should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present invention; their description is specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A configurable transparent compression method in a distributed file system, comprising:
step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk;
step 2, periodically sending a heartbeat report from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state;
step 3, in response to the heartbeat report sent by the Compressor Worker independent process, checking whether the DataNode independent process has an idle Compressor Worker Thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeding the compression task corresponding to that block file back to the Compressor Worker independent process as the return value;
step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigning the task to the Compressor Worker Thread of the corresponding disk, the Compressor Worker Thread trial-compressing the block file and estimating its expected compression ratio:
1) in response to the expected compression ratio being larger than a set threshold, the Compressor Worker Thread compressing the block file according to a set compression rule;
2) in response to the expected compression ratio being smaller than or equal to the set threshold, judging that the block file does not need to be compressed, reporting the judgment to the DataNode independent process in the next heartbeat report, and no longer marking the block file as a to-be-compressed block file in the DataNode independent process;
wherein the Compressor Worker independent process is set to an idle level using cpu nice and ionice, so that the Compressor Worker is prevented from seizing CPU/IO at the operating-system level when computation is busy; the resource control of the compression task ensures as far as possible that compression runs during the cluster's relatively idle periods and that, when a computation task runs on the node, compression does not hold back its progress; when the DataNode assigns block compression tasks, the number of local data read-write connections is taken into account, and the assignment of block compression tasks is appropriately slowed when the number of connections is high; and the read/write disk speed of the Compressor Worker Threads is limited by the Compressor Worker independent process.
2. The configurable transparent compression method in a distributed file system according to claim 1, wherein said set compression rule is,
the block file carrying the compression task is compressed chunk by chunk: each time a chunk is read, its compressed data is written into the corresponding folder under the tmp directory, and an Index file is generated.
3. The configurable transparent compression method in a distributed file system according to claim 1, wherein said set priority rule is,
a field access time expressing the last access time is added to each block file in the DataNode independent process, a block file whose time since last access exceeds a set time threshold is marked as a to-be-compressed block file, and
the earlier the last access time indicated by the access time, the higher the compression priority of the corresponding block file.
4. A method of configurable transparent compression in a distributed file system as claimed in claim 1 or 3, wherein said set priority rules further comprise:
the last block file corresponding to the additional file is not compressed.
5. A method of configurable transparent compression in a distributed file system as claimed in claim 1 or 3, the method further comprising:
an out-of-band tool for human intervention, called the Compressor Admin, is provided; it gives policy suggestions to the Compressor Worker independent process to change the priority of any compression task.
6. The configurable transparent compression method in a distributed file system as claimed in claim 1, wherein the method further comprises:
the following 3 kinds of information are added to the compressed block file name: the pre-compression length, the compression algorithm used, and the file suffix cdata;
and the original checksum length and the file suffix cmeta are added to the file name of the meta file corresponding to the compressed file.
7. The configurable transparent compression method in a distributed file system as claimed in claim 6, wherein the method further comprises:
providing a READ_BLOCK independent process for reading block files;
in response to a request from a client to read any block file, using the READ_BLOCK independent process to check whether the block file is compressed,
1) if the block file is not compressed, sending its original data directly to the client;
2) if the block file is compressed, decompressing it on the DataNode side and returning the decompressed original data and the corresponding checksum to the client.
8. The configurable transparent compression method in a distributed file system of claim 2, the method further comprising:
in the data reading process, each chunk that is read has its data decompressed, the decompressed data is written into the corresponding file under the tmp directory, and an Index file is generated that stores the post-compression offset corresponding to each chunk in fixed-length fields.
CN201810002379.8A 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system Active CN108363719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810002379.8A CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810002379.8A CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Publications (2)

Publication Number Publication Date
CN108363719A CN108363719A (en) 2018-08-03
CN108363719B true CN108363719B (en) 2022-10-21

Family

ID=63011018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810002379.8A Active CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Country Status (1)

Country Link
CN (1) CN108363719B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309269B (en) * 2020-02-28 2021-12-17 苏州浪潮智能科技有限公司 Method, system and equipment for dropping compressed data and readable storage medium
CN114333965B (en) * 2020-09-30 2023-09-08 长鑫存储技术有限公司 Memory and test method thereof
CN113270120B (en) * 2021-07-16 2022-02-18 北京金山云网络技术有限公司 Data compression method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009100149A (en) * 2007-10-16 2009-05-07 Brother Ind Ltd Data file compressing apparatus and data file compressing method
CN101901275A (en) * 2010-08-23 2010-12-01 华中科技大学 Distributed storage system and method thereof
CN101957836A (en) * 2010-09-03 2011-01-26 清华大学 Configurable real-time transparent compressing method in file system
CN103020205A (en) * 2012-12-05 2013-04-03 北京普泽天玑数据技术有限公司 Compression and decompression method based on hardware accelerator card on distributive-type file system
CN103559020A (en) * 2013-11-07 2014-02-05 中国科学院软件研究所 Method for realizing parallel compression and parallel decompression on FASTQ file containing DNA (deoxyribonucleic acid) sequence read data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tianji Big Data Engine and Its Applications; Zha Li et al.; Journal of Integration Technology; 2014-07-15; Vol. 3, No. 4; pp. 18-30 *

Also Published As

Publication number Publication date
CN108363719A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN102385554B (en) Method for optimizing duplicated data deletion system
CN108363719B (en) Configurable transparent compression method in distributed file system
CN111522636B (en) Application container adjusting method, application container adjusting system, computer readable medium and terminal device
WO2023050712A1 (en) Task scheduling method for deep learning service, and related apparatus
CN108134609A (en) Multithreading compression and decompressing method and the device of a kind of conventional data gz forms
CN103412884A (en) Method for managing embedded database in isomerism storage media
CN101673271A (en) Distributed file system and file sharding method thereof
CN114968566A (en) Container scheduling method and device under shared GPU cluster
US20110202733A1 (en) System and/or method for reducing disk space usage and improving input/output performance of computer systems
CN111061752A (en) Data processing method and device and electronic equipment
CN110083306A (en) A kind of distributed objects storage system and storage method
CN107423425B (en) Method for quickly storing and inquiring data in K/V format
CN109213745B (en) Distributed file storage method, device, processor and storage medium
CN110096339B (en) System load-based capacity expansion and contraction configuration recommendation system and method
CN111949681A (en) Data aggregation processing device and method and storage medium
US20240070120A1 (en) Data processing method and apparatus
WO2021179170A1 (en) Data pushing method and device, server, and storage medium
CN116089414B (en) Time sequence database writing performance optimization method and device based on mass data scene
CN110990340A (en) Big data multi-level storage framework
CN114860449A (en) Data processing method, device, equipment and storage medium
CN114238481A (en) Distributed real-time data importing device
CN114625474A (en) Container migration method and device, electronic equipment and storage medium
CN112433889B (en) Log generation method and device based on FTL table
CN116383290B (en) Data generalization and analysis method
CN112783896B (en) Method for reducing memory usage rate by loading files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant