CN108363719B - Configurable transparent compression method in distributed file system - Google Patents

Configurable transparent compression method in distributed file system

Info

Publication number
CN108363719B
CN108363719B (application CN201810002379.8A)
Authority
CN
China
Prior art keywords
compression
file
compressor
block
independent process
Prior art date
Legal status
Active
Application number
CN201810002379.8A
Other languages
Chinese (zh)
Other versions
CN108363719A (en)
Inventor
李新明
刘斌
Current Assignee
Edge Intelligence Of Cas Co ltd
Original Assignee
Edge Intelligence Of Cas Co ltd
Priority date
Filing date
Publication date
Application filed by Edge Intelligence Of Cas Co ltd filed Critical Edge Intelligence Of Cas Co ltd
Priority to CN201810002379.8A priority Critical patent/CN108363719B/en
Publication of CN108363719A publication Critical patent/CN108363719A/en
Application granted
Publication of CN108363719B publication Critical patent/CN108363719B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a configurable transparent compression method in a distributed file system, which comprises the following steps: checking whether the DataNode independent process has an idle compression worker thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule; calling a compression worker thread to trial-compress the block file and estimate its expected compression ratio; and, in response to the expected compression ratio being larger than a set threshold, calling the compression worker thread to compress the block file according to a set compression rule. The compression method provided by the invention is asynchronous: the cluster performs data compression when the CPU and IO are relatively idle and sets priorities for the data files to be compressed, thereby smoothing CPU/IO peaks (peak clipping and valley filling) while preserving the compression benefit.

Description

Configurable transparent compression method in distributed file system
Technical Field
The invention relates to the field of compression methods, and in particular to a configurable transparent compression method in a distributed file system.
Background
Distributed storage technology is one of the common means by which enterprises deal with large-scale data storage. However, as cluster size grows, the problem of cluster storage space becomes increasingly prominent. How to reduce the cost of data ownership and improve the storage capacity of existing clusters has become a problem that enterprise technical departments must consider and solve.
The present method improves the design of the data storage node of a traditional distributed file system by adding a transparent compression feature, so as to save storage space while keeping compression from affecting computation in the layers above storage and remaining transparent to the user.
Most transparent compression methods in file systems are synchronous: data is compressed before it flows, which improves network and disk transfer efficiency and reduces disk consumption to a certain extent, at the cost of spending computing resources on compression and decompression.
On-line clusters often have high CPU utilization, and synchronous compression while writing data to the distributed file system occupies CPU resources, slowing computation tasks and adversely affecting the data write speed.
Disclosure of Invention
The invention aims to provide a configurable transparent compression method in a distributed file system. The method is asynchronous: the cluster performs data compression when the CPU and IO are relatively idle and sets priorities for the data files to be compressed, thereby smoothing CPU/IO peaks (peak clipping and valley filling) while preserving the compression benefit.
In order to achieve the purpose, the invention provides the following technical scheme:
a configurable transparent compression method in a distributed file system, comprising:
step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk;
step 2, periodically sending a heartbeat report from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state;
step 3, in response to the heartbeat report sent by the Compressor Worker independent process, checking whether the DataNode independent process has an idle Compressor Worker Thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeding the compression task corresponding to that block file back to the Compressor Worker independent process as the return value;
step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigning the task to the Compressor Worker Thread of the corresponding disk, the Compressor Worker Thread trial-compressing the block file and estimating its expected compression ratio:
1) in response to the expected compression ratio being larger than a set threshold, the Compressor Worker Thread compressing the block file according to a set compression rule;
2) in response to the expected compression ratio being smaller than or equal to the set threshold, judging that the block file does not need to be compressed, reporting the judgment to the DataNode independent process in the next heartbeat report, and no longer marking the block file as a to-be-compressed block file in the DataNode independent process.
The invention has the beneficial effects that:
the cluster can perform data compression when the CPU and the IO are relatively idle, and set the priority of the data file to be compressed, so that the effect of peak clipping and valley filling of the CPU/IO is achieved while the compression effect is ensured.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the operation of the Compressor Worker of the present invention.
Fig. 2 is a transparent compression architecture diagram of the present invention.
FIG. 3 is a flow chart of the Compressor Worker Thread work of the present invention.
FIG. 4 is a diagram illustrating a compressed block management method according to the present invention.
FIG. 5 is a schematic diagram of a compressed block.
Detailed Description
The following detailed description of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
For convenience of description, the terms that may be referred to hereinafter are defined first.
NameNode: the system comprises an independent process, distributed storage cluster metadata nodes, file system metadata management and database and file mapping maintenance.
A DataNode: and the independent process stores the cluster data nodes in a distributed manner, and stores file blocks (Block) in a local file form.
Compressor Worker (CW): and the independent process is deployed to the DataNode machine. The compression task is performed. The roles are a decider and a compression executor of whether to perform compression.
Compressor WorerThread (CWT): the worker threads in the Compressor worker, one processing thread per disk. For performing compression tasks.
Compressor admin: and the out-of-band tool is mainly used for manually interfering the priority of block compression and plays a role in suggesting the compression task per se.
Codec type: the method adopts various methods to test the pressure of the file, and adopts the algorithm with the best compression ratio to compress.
The asynchronous compression process of the present invention mainly involves three of the roles above: the NameNode, the DataNode, and the CW. So that asynchronous compression can be brought online across the cluster in batches, the method avoids modifying NameNode logic during asynchronous compression, keeping the compression transparent to the NameNode. The DataNode acts as the master of the asynchronous compression process: it decides which blocks should be compressed first and assigns block compression tasks. The CW is responsible for executing the compression tasks assigned to it by the DataNode.
The following is a detailed description of the asynchronous compression method referred to in the present invention.
With reference to fig. 1 and fig. 2, the present invention provides a configurable transparent compression method in a distributed file system, including:
Step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk.
Step 2, a heartbeat report is periodically sent from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state.
The CW periodically initiates a heartbeat RPC to the local DataNode, telling the DataNode how many block compression tasks are currently running, which blocks have finished compression, and which blocks were not compressed. After receiving the RPC, the DataNode assigns tasks to the CW; the return value of the RPC is the set of blocks the CW should compress.
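As a non-limiting sketch, the heartbeat exchange between the CW and the DataNode could be organized as below. All class, field, and interface names (CompressionHeartbeat, CompressionTask, CompressorHeartbeatProtocol) are illustrative assumptions; the patent does not publish the actual RPC schema.

    // Illustrative Java sketch of the CW -> DataNode heartbeat RPC described above.
    // Names are assumptions for illustration only.
    import java.util.List;

    // What the Compressor Worker reports in each heartbeat.
    class CompressionHeartbeat {
        List<String> runningBlockIds;    // block compression tasks currently running
        List<String> completedBlockIds;  // blocks whose compression finished since the last heartbeat
        List<String> skippedBlockIds;    // blocks judged not worth compressing (low trial ratio)
        int idleWorkerThreads;           // one Compressor Worker Thread per disk; how many are idle
    }

    // What the DataNode hands back: the next block compression tasks for the CW.
    class CompressionTask {
        String blockId;
        String diskId;          // disk whose Compressor Worker Thread should run the task
        double ratioThreshold;  // expected-compression-ratio threshold used in step 4
    }

    interface CompressorHeartbeatProtocol {
        // The RPC return value is the list of blocks the CW should compress next.
        List<CompressionTask> heartbeat(CompressionHeartbeat report);
    }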
Step 3, in response to the heartbeat report sent by the Compressor Worker independent process, the DataNode independent process checks whether there is an idle Compressor Worker Thread; if so, it searches the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeds the compression task for that block file back to the Compressor Worker independent process as the return value.
Step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigns the task to the Compressor Worker Thread of the corresponding disk, which trial-compresses the block file and estimates its expected compression ratio (a sketch of this decision follows the two cases below).
1) In response to the expected compression ratio being larger than the set threshold, the Compressor Worker Thread compresses the block file according to the set compression rule.
2) In response to the expected compression ratio being smaller than or equal to the set threshold, the block file is judged not to need compression, the judgment is reported to the DataNode independent process in the next heartbeat report, and the DataNode independent process no longer marks the block file as a to-be-compressed block file.
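A minimal sketch of the trial-compression decision in step 4 is given below, assuming the expected ratio is estimated by compressing a sample of the block with java.util.zip's Deflater; the sampling strategy, codec choice, and class names are assumptions, not the patent's implementation.

    // Sketch: estimate the expected compression ratio on a sample and compare it to the threshold.
    import java.util.zip.Deflater;

    class TrialCompression {
        /** Estimated ratio = uncompressed size / compressed size of the sample. */
        static double estimateRatio(byte[] sample) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(sample);
            deflater.finish();
            byte[] out = new byte[sample.length + 64];
            int compressedLen = 0;
            while (!deflater.finished() && compressedLen < out.length) {
                compressedLen += deflater.deflate(out, compressedLen, out.length - compressedLen);
            }
            deflater.end();
            return (double) sample.length / Math.max(compressedLen, 1);
        }

        /** Case 1): compress; case 2): skip and report back in the next heartbeat. */
        static boolean shouldCompress(byte[] sample, double threshold) {
            return estimateRatio(sample) > threshold;
        }
    }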
The purpose of compression is to let compression tasks and computation tasks run off-peak from each other while improving disk space utilization, so that compression does not hold back the progress of computation. Asynchronous compression has two core questions:
the first question, which data to press?
The second question, which data to press first?
The above two problems are discussed one after the other.
Question one: determining which data needs compression
Conventionally, a compressing file system compresses all data; compressing data indiscriminately in this way can waste resources.
To guarantee a higher compression payoff, the following rules are used:
1) Data with a high compression ratio is compressed.
2) Data with a low compression ratio is trial-compressed again later.
3) The last block of a file that is being appended to is not compressed.
For the first question, the method relies on trial compression: only a block file whose expected compression ratio exceeds the set threshold is compressed by a Compressor Worker Thread according to the set compression rule.
With reference to fig. 3, the set compression rule is:
and compressing the chunk blocks of the block files with the compression task, writing the compressed data into a corresponding folder under the tmp directory every time one chunk is read, and generating an Index file.
Question two: determining the compression priority
The compression priority depends mainly on two factors:
1) The atime of the Block.
2) The priority suggested by the Admin.
For factor 1, the set priority rule is as follows: a field access time, expressing the last access time, is added to each block file in the DataNode independent process; a block file whose time since last access exceeds a set time threshold is marked as a to-be-compressed block file, and the earlier its last access time, the higher the compression priority of the corresponding block file.
Factor 1 is based on the assumption that data which has not been accessed for a long time will not be accessed frequently in the future.
In this method, a field access time (atime for short) is added to each Block on the DataNode, and the DataNode updates it whenever there is a WRITE_BLOCK request. The atime of the local file system is not used because operations such as balancing, block checking, and block copying all touch the file's atime and would cause misjudgment.
A block is compressed only if it has not been accessed for a period of time and its compression ratio is high (above a certain threshold).
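The atime rule above could be sketched as follows; the class and field names, the use of a priority queue, and the millisecond threshold are assumptions for illustration.

    // Sketch: blocks not accessed for longer than the threshold are queued, oldest atime first.
    import java.util.Comparator;
    import java.util.Optional;
    import java.util.PriorityQueue;

    class BlockMeta {
        String blockId;
        long accessTimeMillis;               // "atime", updated by the DataNode on WRITE_BLOCK requests
        boolean isLastBlockOfAppendedFile;   // rule 3: such blocks are never compressed
    }

    class CompressionScheduler {
        private final long notAccessedForMillis;  // the set time threshold
        private final PriorityQueue<BlockMeta> toCompress =
                new PriorityQueue<>(Comparator.comparingLong((BlockMeta b) -> b.accessTimeMillis));

        CompressionScheduler(long notAccessedForMillis) {
            this.notAccessedForMillis = notAccessedForMillis;
        }

        void offer(BlockMeta block, long nowMillis) {
            boolean coldEnough = nowMillis - block.accessTimeMillis > notAccessedForMillis;
            if (coldEnough && !block.isLastBlockOfAppendedFile) {
                toCompress.add(block);  // earlier atime = higher compression priority
            }
        }

        Optional<BlockMeta> nextHighestPriority() {
            return Optional.ofNullable(toCompress.poll());
        }
    }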
In some examples, the method further comprises:
an out-of-band tool for human intervention, defined as the Compressor admin, is provided with policy suggestions for the Compressor Worker independent process to change the priority of any one compression task.
In addition to the Block access time, a user may want compression driven at the file/directory level, that is, certain blocks should be compressed with higher priority. There are two kinds of policies: suggestion policies and decision policies. A suggestion policy provides advice to the Compressor Worker from the outside; each role influences the priority of compression tasks according to the information it holds. A decision policy is what actually decides whether a block is compressed.
Suggestion policy:
the priority of the Block compression task is adjusted by the Compressor Admin.
Decision policy:
DataNode: records the access time of each Block and adjusts the Block's compression priority accordingly. If the time since last access is below a certain threshold, the block is not compressed.
Compressor Worker: trial-compresses the Block and decides, according to the achieved compression ratio, whether the Block file is finally compressed.
The third rule mentioned under question one, namely that the last block of a file being appended to is not compressed, is explained as follows:
if the last block of the additionally written object file is compressed, when the object file is applied, the last block is decompressed to a tmp directory, then data is additionally written into the block, and when the file is closed, the file mv is returned to the current directory. After the file is additionally written, the database node is notified by the heartbeat from CW to the database node, and the flag is not compressed (update atime).
In further examples, with reference to fig. 4, the method further includes:
the following 3 kinds of information are added to the compressed block file name: the pre-compression length, the compression algorithm employed, and the file suffix cdata.
The original checksum length and the file suffix cmeta are added to the file name of the meta file corresponding to the compressed file.
The DataNode stores the data with the following structure:
The method adds the following three kinds of information to the compressed data file name: the pre-compression length, the compression algorithm used, and the file suffix cdata. The file name of the meta file corresponding to the compressed file (the cmeta file) additionally carries the original checksum length and the file suffix cmeta. The content of the compressed data file is the compressed data. The cmeta file structure is as follows:
checksum (original meta file content)
1st chunk post-compression offset (int)
2nd chunk post-compression offset (int)
...
nth chunk post-compression offset (int)
This design saves file-related information (codec type and pre-compression length) in the file path, which avoids the extra random disk access that would be caused when reading compressed data if the codec were saved in the meta file or data file. In addition, when the file data and the meta do not match, the data file can be decompressed on its own and a checksum verification performed against the currently available meta.
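The naming convention could be encoded as in the sketch below. The patent only states which fields are embedded in the file names (pre-compression length, codec, cdata suffix; original checksum length, cmeta suffix); the separator and field order used here are assumptions.

    // Sketch: embedding and reading back block metadata from the file names.
    class CompressedBlockNames {
        static String dataFileName(String blockName, long preCompressionLength, String codec) {
            return blockName + "_" + preCompressionLength + "_" + codec + ".cdata";
        }

        static String metaFileName(String blockName, long originalChecksumLength) {
            return blockName + "_" + originalChecksumLength + ".cmeta";
        }

        /** Reading the pre-compression length from the path avoids an extra disk access. */
        static long preCompressionLength(String dataFileName) {
            String stem = dataFileName.substring(0, dataFileName.length() - ".cdata".length());
            String[] parts = stem.split("_");
            return Long.parseLong(parts[parts.length - 2]);
        }
    }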
Further, the method further comprises:
a READ BLOCK independent process is provided for reading the BLOCK file.
In response to a request from a client to read any block file, the READ_BLOCK independent process checks whether the block file is compressed:
1) If the block file is not compressed, its original data is sent directly to the client.
2) If the block file is compressed, it is decompressed on the DataNode side, and the decompressed original data and the corresponding checksum are returned to the client.
By modifying the READ_BLOCK RPC method, when a client reads a data block using READ_BLOCK and the corresponding block is already compressed, the block is decompressed on the DataNode side and the decompressed data is returned; from the client's point of view, it reads the original data and the corresponding checksum.
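A sketch of the branching in the modified READ_BLOCK path is shown below; the helper method names stand in for the DataNode's existing block I/O and are assumptions, as is returning the checksum after the data on the same stream.

    // Sketch: compressed blocks are decompressed on the DataNode side before being returned.
    import java.io.IOException;
    import java.io.OutputStream;

    class ReadBlockHandler {
        void readBlock(String blockId, OutputStream toClient) throws IOException {
            if (!isCompressed(blockId)) {
                streamRawBlock(blockId, toClient);             // case 1): send the original data directly
            } else {
                byte[] original = decompressLocally(blockId);  // case 2): decompress at the DataNode end
                byte[] checksum = checksumFromCmeta(blockId);  // original checksum kept in the cmeta file
                toClient.write(original);
                toClient.write(checksum);
            }
        }

        // Placeholders for the DataNode's existing block I/O; not implemented in this sketch.
        boolean isCompressed(String blockId) { return false; }
        void streamRawBlock(String blockId, OutputStream out) throws IOException { }
        byte[] decompressLocally(String blockId) { return new byte[0]; }
        byte[] checksumFromCmeta(String blockId) { return new byte[0]; }
    }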
The method further comprises the following steps:
in the data reading process, each chunk is read, the data corresponding to the chunk is decompressed, the decompressed data is written into a corresponding file under a tmp directory, and an Index file is generated and stores the compressed offset corresponding to each chunk according to a fixed length.
The Index file is located at the end of the cmeta file and has the following structure:
checksum (original meta file content)
1st chunk post-compression offset (int)
2nd chunk post-compression offset (int)
...
nth chunk post-compression offset (int)
The Index is stored with fixed-length fields, one field per 8 B; it stores the post-compression offset of each chunk.
The random-read process is illustrated with an example:
FIG. 5 shows a block after compression. The chunk size is 256 K, and the offset of each chunk is stored in the Index file. To seek to pre-compression offset 1.1 M, the fixed-length storage lets us find the nearest aligned offset, 1 M; we then seek to offset 16 B in the Index file, read an int giving the compressed offset at which the 1 M chunk starts, and use that as the starting offset for reading.
The compressed data is read sequentially from that starting offset and decompressed until the 1.1 M position is reached, where the requested data is read out.
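The offset lookup of this example could be sketched as below. The 4-byte entry width mirrors the "read an int" wording of the example; treat the exact Index entry width and the indexStart parameter as assumptions.

    // Sketch: map a pre-compression seek target to a starting offset in the compressed data.
    import java.io.IOException;
    import java.io.RandomAccessFile;

    class RandomReadIndex {
        static final int CHUNK_SIZE = 256 * 1024;

        /** Compressed-file offset at which to start decompressing for targetOffset. */
        static long startOffsetFor(long targetOffset, RandomAccessFile indexFile, long indexStart)
                throws IOException {
            long chunkIndex = targetOffset / CHUNK_SIZE;   // e.g. 1.1 M falls in the chunk starting at 1 M
            indexFile.seek(indexStart + chunkIndex * 4);   // fixed-length entries, one per chunk
            return indexFile.readInt();                    // post-compression offset of that chunk
        }
        // The reader then decompresses sequentially from this offset and skips
        // (targetOffset % CHUNK_SIZE) bytes, i.e. the 0.1 M between 1 M and 1.1 M.
    }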
In other embodiments, the method further comprises:
In step 3, in response to the heartbeat report sent by the Compressor Worker independent process, the number of local data read-write connections is checked; if the number of connections exceeds a set threshold, compression task allocation is reduced or stopped according to a set rule, and the read/write disk speed of some or all Compressor Worker Threads is limited.
Further, the method further comprises:
the Compressor worker independent process is set to idle level using cpu nic and io nic.
Resource control of the compression task must ensure as far as possible that compression runs during the cluster's relatively idle periods, and that, when a computation task is running on the node, compression does not hold back its progress.
To achieve better resource isolation, the method places the executor of compression tasks in its own process, namely the Compressor Worker. The resource usage of the CW is limited in two ways:
when the DataNode distributes the block compression task, the number of local data read-write connections is considered, and when the number of the connections is large, the distribution speed of the block compression task is properly reduced.
The read and write disk speed of the Compressor Worker Thread is limited by the Compressor Worker process.
cpu nice and ionice set the Compressor Worker process to idle priority, preventing the CW from seizing CPU/IO at the operating-system level when computation is busy.
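At the operating-system level this can be done with the standard Linux nice and ionice tools, as in the sketch below; the java command line used to start the Compressor Worker is an assumption, since the patent only states that the process is set to an idle level.

    // Sketch: launch the Compressor Worker at idle CPU and IO priority.
    import java.io.IOException;

    class CompressorWorkerLauncher {
        static Process launch(String compressorWorkerJar) throws IOException {
            return new ProcessBuilder(
                    "nice", "-n", "19",   // lowest CPU scheduling priority
                    "ionice", "-c", "3",  // idle IO scheduling class
                    "java", "-jar", compressorWorkerJar)
                .inheritIO()
                .start();
        }
    }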
The technical features of the embodiments described above may be combined arbitrarily; for brevity, not all possible combinations of these features are described, but any combination that contains no contradiction should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present invention; their description is specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A configurable transparent compression method in a distributed file system, comprising:
step 1, providing a DataNode independent process, which serves as a data node of the distributed storage cluster and stores file blocks as local files, and providing a Compressor Worker independent process for executing compression tasks, wherein the Compressor Worker independent process contains a plurality of Compressor Worker Threads, one independent Compressor Worker Thread being allocated to each disk to execute the compression tasks assigned to that disk;
step 2, periodically sending a heartbeat report from the Compressor Worker independent process to the local DataNode independent process to inform the DataNode independent process of the current compression task state;
step 3, in response to the heartbeat report sent by the Compressor Worker independent process, checking whether the DataNode independent process has an idle Compressor Worker Thread, and if so, searching the corresponding disk for the to-be-compressed block file with the highest compression priority according to a set priority rule and feeding the compression task corresponding to that block file back to the Compressor Worker independent process as the return value;
step 4, in response to receiving the compression task for any block file, the Compressor Worker independent process assigning the task to the Compressor Worker Thread of the corresponding disk, the Compressor Worker Thread trial-compressing the block file and estimating its expected compression ratio:
1) in response to the expected compression ratio being larger than a set threshold, the Compressor Worker Thread compressing the block file according to a set compression rule;
2) in response to the expected compression ratio being smaller than or equal to the set threshold, judging that the block file does not need to be compressed, reporting the judgment to the DataNode independent process in the next heartbeat report, and no longer marking the block file as a to-be-compressed block file in the DataNode independent process;
wherein the Compressor Worker independent process is set to an idle level using cpu nice and ionice, so that the Compressor Worker is prevented from seizing CPU/IO at the operating-system level when computation is busy; the resource control of the compression task ensures as far as possible that compression runs during the cluster's relatively idle periods and that, when a computation task runs on the node, compression does not hold back its progress; when the DataNode assigns block compression tasks, the number of local data read-write connections is taken into account, and the assignment of block compression tasks is appropriately slowed when the number of connections is high; and the read/write disk speed of the Compressor Worker Threads is limited by the Compressor Worker independent process.
2. The configurable transparent compression method in a distributed file system according to claim 1, wherein said set compression rule is,
the block file carrying the compression task is compressed chunk by chunk: each time a chunk is read, its compressed data is written into the corresponding folder under the tmp directory, and an Index file is generated.
3. The configurable transparent compression method in a distributed file system according to claim 1, wherein said set priority rule is,
a field access time expressing the last access time is added to each block file in the DataNode independent process, a block file whose time since last access exceeds a set time threshold is marked as a to-be-compressed block file, and
the earlier the last access time indicated by the access time, the higher the compression priority of the corresponding block file.
4. A method of configurable transparent compression in a distributed file system as claimed in claim 1 or 3, wherein said set priority rules further comprise:
the last block file corresponding to the additional file is not compressed.
5. A method of configurable transparent compression in a distributed file system as claimed in claim 1 or 3, the method further comprising:
an out-of-band tool for human intervention, called the Compressor Admin, is provided; it gives policy suggestions to the Compressor Worker independent process to change the priority of any compression task.
6. The configurable transparent compression method in a distributed file system as claimed in claim 1, wherein the method further comprises:
the following 3 kinds of information are added to the compressed block file name: the pre-compression length, the compression algorithm used, and the file suffix cdata;
and the original checksum length and the file suffix cmeta are added to the file name of the meta file corresponding to the compressed file.
7. The configurable transparent compression method in a distributed file system as claimed in claim 6, wherein the method further comprises:
providing a READ_BLOCK independent process for reading block files;
in response to a request from a client to read any block file, using the READ_BLOCK independent process to check whether the block file is compressed,
1) if the block file is not compressed, sending its original data directly to the client;
2) if the block file is compressed, decompressing it on the DataNode side and returning the decompressed original data and the corresponding checksum to the client.
8. The configurable transparent compression method in a distributed file system of claim 2, the method further comprising:
in the data reading process, each chunk that is read has its data decompressed, the decompressed data is written into the corresponding file under the tmp directory, and an Index file is generated that stores the post-compression offset corresponding to each chunk in fixed-length fields.
CN201810002379.8A 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system Active CN108363719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810002379.8A CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810002379.8A CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Publications (2)

Publication Number Publication Date
CN108363719A CN108363719A (en) 2018-08-03
CN108363719B true CN108363719B (en) 2022-10-21

Family

ID=63011018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810002379.8A Active CN108363719B (en) 2018-01-02 2018-01-02 Configurable transparent compression method in distributed file system

Country Status (1)

Country Link
CN (1) CN108363719B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309269B (en) * 2020-02-28 2021-12-17 苏州浪潮智能科技有限公司 Method, system and equipment for dropping compressed data and readable storage medium
CN114333965B (en) * 2020-09-30 2023-09-08 长鑫存储技术有限公司 Memory and test method thereof
CN113270120B (en) * 2021-07-16 2022-02-18 北京金山云网络技术有限公司 Data compression method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009100149A (en) * 2007-10-16 2009-05-07 Brother Ind Ltd Data file compressing apparatus and data file compressing method
CN101901275A (en) * 2010-08-23 2010-12-01 华中科技大学 Distributed storage system and method thereof
CN101957836A (en) * 2010-09-03 2011-01-26 清华大学 Configurable real-time transparent compressing method in file system
CN103020205A (en) * 2012-12-05 2013-04-03 北京普泽天玑数据技术有限公司 Compression and decompression method based on hardware accelerator card on distributive-type file system
CN103559020A (en) * 2013-11-07 2014-02-05 中国科学院软件研究所 Method for realizing parallel compression and parallel decompression on FASTQ file containing DNA (deoxyribonucleic acid) sequence read data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tianji Big Data Engine and Its Applications; Zha Li et al.; Journal of Integration Technology; 2014-07-15; Vol. 3, No. 4; pp. 18-30 *

Also Published As

Publication number Publication date
CN108363719A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN102385554B (en) Method for optimizing duplicated data deletion system
CN108363719B (en) Configurable transparent compression method in distributed file system
CN111522636B (en) Application container adjusting method, application container adjusting system, computer readable medium and terminal device
WO2023050712A1 (en) Task scheduling method for deep learning service, and related apparatus
CN108134609A (en) Multithreading compression and decompressing method and the device of a kind of conventional data gz forms
CN103412884A (en) Method for managing embedded database in isomerism storage media
CN101673271A (en) Distributed file system and file sharding method thereof
CN114968566A (en) Container scheduling method and device under shared GPU cluster
US20110202733A1 (en) System and/or method for reducing disk space usage and improving input/output performance of computer systems
CN111061752A (en) Data processing method and device and electronic equipment
CN110083306A (en) A kind of distributed objects storage system and storage method
CN107423425B (en) Method for quickly storing and inquiring data in K/V format
CN109213745B (en) Distributed file storage method, device, processor and storage medium
CN110096339B (en) System load-based capacity expansion and contraction configuration recommendation system and method
CN111949681A (en) Data aggregation processing device and method and storage medium
US20240070120A1 (en) Data processing method and apparatus
WO2021179170A1 (en) Data pushing method and device, server, and storage medium
CN116089414B (en) Time sequence database writing performance optimization method and device based on mass data scene
CN110990340A (en) Big data multi-level storage framework
CN114860449A (en) Data processing method, device, equipment and storage medium
CN114238481A (en) Distributed real-time data importing device
CN114625474A (en) Container migration method and device, electronic equipment and storage medium
CN112433889B (en) Log generation method and device based on FTL table
CN116383290B (en) Data generalization and analysis method
CN112783896B (en) Method for reducing memory usage rate by loading files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant