KR101727508B1

KR101727508B1 - Apparatus and method for accelerating hardware compression based on Hadoop

Info

Publication number: KR101727508B1
Application number: KR1020150106457A
Authority: KR
Inventors: 장지훈; 이승은; 이현화; 한재용; 임동일
Original assignee: 서울과학기술대학교 산학협력단
Priority date: 2015-07-28
Filing date: 2015-07-28
Publication date: 2017-04-18
Also published as: WO2017018567A1; KR20170014042A

Abstract

The present invention relates to a Hadoop-based hardware compression acceleration apparatus. The invention supplements the performance of a low-power CPU by performing in hardware the compression and decompression that Hadoop middleware would otherwise perform in a low-power Hadoop storage appliance. To this end, the Hadoop-based hardware compression acceleration apparatus according to the present invention includes: an input buffer that receives a data block to be compressed or decompressed; a dictionary module that performs dictionary registration and search and compresses the data block through a window; an output buffer that outputs the result of the compression; and a control module that controls the input buffer, the dictionary module, and the output buffer based on Hadoop storage appliance information.

Description

TECHNICAL FIELD The present invention relates to a Hadoop-based hardware compression acceleration apparatus and method, and more specifically to an apparatus and method for accelerating hardware compression for high-speed processing in low-power Hadoop storage appliances.

Recently, Hadoop clusters have been used as a method for efficiently processing big data in a distributed manner.

Even with a Hadoop cluster, as the amount of data to be processed increases, a larger number of servers is required for data storage and analysis. Adding servers in this way increases the power consumed in operating the cluster and raises the cost of cluster management.

Therefore, the need for low-power Hadoop storage appliances is emerging. These low-power Hadoop storage appliances require low-power CPUs. However, no apparatus or method has been provided for compensating for the limited computing power of such a low-power CPU.

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and an object of the present invention is to provide a Hadoop-based hardware compression acceleration apparatus for a Hadoop storage appliance.

More specifically, the present invention provides an apparatus and method for minimizing the time consumed in the distributed processing and analysis of big data by performing the data compression of the Hadoop system through the hardware compression acceleration apparatus.

The Hadoop-based hardware compression acceleration apparatus according to an embodiment of the present invention includes: an input buffer that receives a data block to be compressed or decompressed; a dictionary module that performs dictionary registration and search and compresses the data block through a window; an output buffer that outputs the result of the compression; and a control module that controls the input buffer, the dictionary module, and the output buffer based on Hadoop storage appliance information.

The Hadoop-based hardware compression acceleration apparatus and method according to the present invention can use hardware parallelism to accelerate the dictionary search and registration process of the compression algorithm.

More specifically, by accelerating the dictionary search and registration process, the Hadoop-based hardware compression acceleration apparatus and method improve throughput over existing software compression.

In addition, the Hadoop-based hardware compression acceleration apparatus and method according to the present invention supplement the insufficient computing power of a low-power CPU, making it possible to operate a Hadoop cluster of low-power Hadoop storage appliances at low cost.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram illustrating the structure of a Hadoop-based hardware compression accelerator according to the present invention.
FIG. 2 is a flowchart illustrating the operation of a Hadoop-based hardware compression accelerator according to the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described hereinafter in conjunction with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless defined otherwise, all terms (including technical and scientific terms) used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs. Also, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly defined otherwise.

The terminology used herein is for the purpose of describing embodiments and is not intended to limit the present invention. In the present specification, the singular form includes the plural form unless otherwise specified. The terms "comprises" and/or "comprising" used in the specification do not exclude the presence or addition of one or more other elements in addition to the stated element.

In this specification, an appliance means hardware such as a server or storage. The appliance may be an information device that is sold with software pre-installed and optimized for a specific task. The user can use the appliance simply by connecting the power supply after purchase, without separately installing or configuring an operating system or application software.

In particular, the Hadoop storage appliance refers to an appliance that performs distributed data storage based on Hadoop.

FIG. 1 is a diagram illustrating the structure of a Hadoop-based hardware compression acceleration apparatus according to the present invention. Referring to FIG. 1, the Hadoop-based hardware compression acceleration apparatus 100 may include an input buffer 10, a dictionary module 20, an output buffer 30, and a control module 40.

The hardware compression acceleration apparatus 100 must be connected to the Hadoop storage appliance through an interface that ensures sufficient bandwidth. For example, the interface may be PCIe 2.0 x4 (four lanes). The Hadoop-based hardware compression acceleration apparatus 100 can be implemented on, for example, an FPGA (Field Programmable Gate Array) or an SoC (System on Chip).

In addition, the Hadoop-based hardware compression acceleration apparatus 100 may include a compression algorithm operation circuit. The compression algorithm may be a dictionary-based lossless compression algorithm, for example the LZ4 compression algorithm. The size of a block processed by the LZ4 compression algorithm operation circuit may be 256 KB.
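As a non-limiting illustration, the 256 KB block size mentioned above can be modeled in software as a simple splitting step performed before data is handed to the compression circuit. The function name `split_into_blocks` is an assumption introduced here for illustration and does not appear in the specification.

```python
BLOCK_SIZE = 256 * 1024  # 256 KB, the block size given in the text


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte stream into consecutive blocks of at most block_size bytes.

    Hypothetical helper: the specification only states the block size,
    not how the blocks are produced.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # The final block may be shorter than block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Under this sketch, a 600 KB input yields two full 256 KB blocks followed by one 88 KB block.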

When the Hadoop-based hardware compression acceleration apparatus 100 includes a compression algorithm operation circuit, the input buffer 10, the dictionary module 20, the output buffer 30, and the control module 40 may be the components that perform the compression algorithm operation and constitute that circuit.

The input buffer 10 receives a block of data to be compressed or decompressed from a MapReduce task. The width of the incoming data is equal to the width of the bus used by the Hadoop storage appliance, and the input buffer 10 aligns the data to the endianness and width of the window of the dictionary module 20 so that the hardware compression acceleration apparatus can process it. For example, when 32-bit data arrives at the input buffer from the Hadoop storage appliance, the input buffer may align it into 128-bit big-endian data.
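The alignment in the example above (32-bit bus data into 128-bit big-endian data) can be sketched in software as follows. This is only an illustrative model of the input buffer's behavior: the function name `align_to_window`, the zero-padding of an incomplete final word, and the `byteorder` parameter are assumptions that the specification does not state.

```python
BUS_WIDTH_BITS = 32       # bus width used in the example in the text
WINDOW_WIDTH_BITS = 128   # window width used in the example in the text


def align_to_window(bus_words, byteorder="big"):
    """Pack 32-bit bus words into 128-bit window words, returned as bytes.

    bus_words: iterable of unsigned 32-bit integers in arrival order.
    byteorder: "big" or "little"; how each bus word is laid out in bytes.
    """
    raw = b"".join(w.to_bytes(BUS_WIDTH_BITS // 8, byteorder) for w in bus_words)
    step = WINDOW_WIDTH_BITS // 8
    # Zero-pad so the last window word is complete (padding is an assumption).
    if len(raw) % step:
        raw += b"\x00" * (step - len(raw) % step)
    return [raw[i:i + step] for i in range(0, len(raw), step)]
```

Four 32-bit words fill exactly one 128-bit window word; a fifth word would start a second, padded window word.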

The endianness may be determined according to the usage environment of the hardware compression acceleration apparatus. In other words, it can be determined according to the kind of CPU used in the Hadoop cluster. Alternatively, the endianness may be determined according to a setting of a user or a manufacturer.

The input buffer 10 may provide the aligned data to the dictionary module 20.

The dictionary module 20 may include a memory for storing dictionary values such as offsets, and a hash function for controlling memory addresses. The dictionary module 20 may store at least one program that controls the memory and its addresses using the hash function.
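A minimal software sketch of such a dictionary is given below, assuming an LZ-style scheme in which a hash of a 4-byte sequence selects the memory address and the stored value is the offset where that sequence was last seen. The table size, the multiplicative hash constant, and the names `hash4` and `Dictionary` are assumptions for illustration; the specification does not fix any of them.

```python
HASH_BITS = 12                  # assumed: 4096-entry dictionary memory
TABLE_SIZE = 1 << HASH_BITS


def hash4(data, pos):
    """Map the 4 bytes at data[pos:pos+4] to a memory address.

    Uses a Knuth-style multiplicative hash (an assumed choice).
    """
    value = int.from_bytes(data[pos:pos + 4], "little")
    return ((value * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


class Dictionary:
    """Dictionary memory holding one offset per hash address (-1 means empty)."""

    def __init__(self):
        self.table = [-1] * TABLE_SIZE

    def search_and_register(self, data, pos):
        """Search for an earlier occurrence of the 4 bytes at pos, then register pos."""
        addr = hash4(data, pos)
        candidate = self.table[addr]     # search
        self.table[addr] = pos           # registration
        # Compare the bytes themselves to reject hash collisions.
        if candidate >= 0 and data[candidate:candidate + 4] == data[pos:pos + 4]:
            return candidate
        return -1
```

In this sketch the first occurrence of a sequence returns -1 and a later occurrence returns the offset of the earlier one.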

The dictionary module 20 may also include a window and logic for performing dictionary registration and search. The window is the size at which compression is performed and is the unit of parallel processing. The size and number of memories used can vary according to the hash function used. The dictionary module 20 can perform dictionary registration and search simultaneously, in parallel, over the window size. The dictionary module 20 may perform the compression operation on the aligned data provided from the input buffer 10 using the window.
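The window-wide parallel registration and search can be modeled in software by handling every position of a window in one batch: all searches read the dictionary state as it was before the window, and all registrations are applied afterwards. The sketch below is one possible reading of that behavior, not the circuit itself; the window size of 4 positions, the hash, and the name `process_window` are assumptions.

```python
WINDOW = 4        # assumed window size, in byte positions
HASH_BITS = 10    # assumed dictionary memory of 1024 entries


def _hash(seq):
    """Assumed multiplicative hash of a 4-byte sequence."""
    return ((int.from_bytes(seq, "little") * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def process_window(table, data, start, window=WINDOW):
    """Search, then register, every position of one window as a single batch.

    Returns one match offset per position (-1 when nothing was found).
    In hardware the list comprehensions below would run as parallel lanes.
    """
    end = min(start + window, len(data) - 3)   # a position needs 4 bytes
    positions = list(range(start, max(start, end)))
    addrs = [_hash(data[p:p + 4]) for p in positions]
    candidates = [table[a] for a in addrs]      # search (pre-window state)
    matches = [
        c if c >= 0 and data[c:c + 4] == data[p:p + 4] else -1
        for p, c in zip(positions, candidates)
    ]
    for p, a in zip(positions, addrs):          # registration
        table[a] = p
    return matches
```

One consequence of this batch model is that repeats inside the same window are not found, since the search sees only what earlier windows registered.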

The output buffer 30 outputs the result of the compression, aligning the data according to the bus width and endianness of the Hadoop storage appliance. A FIFO (First In First Out) can be used to prevent the compression process from being delayed by the overhead of outputting the compressed data.
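The decoupling role of the FIFO can be illustrated with a small bounded queue: the compressor pushes results without waiting for the bus, and the bus side drains them in arrival order. The class name `OutputFifo`, the depth, the 4-byte bus word, and the zero-padding are assumptions for illustration; endianness conversion is omitted from this sketch.

```python
from collections import deque


class OutputFifo:
    """Bounded FIFO of bus-width words between the compressor and the bus."""

    def __init__(self, depth, bus_bytes=4):
        self.depth = depth            # maximum number of queued words
        self.bus_bytes = bus_bytes    # assumed bus width in bytes
        self.words = deque()

    def push(self, compressed):
        """Queue compressed bytes as bus-width words.

        Returns False (back-pressure) when the FIFO lacks room; nothing is queued then.
        """
        padded_len = -(-len(compressed) // self.bus_bytes) * self.bus_bytes
        padded = compressed.ljust(padded_len, b"\x00")
        new_words = [padded[i:i + self.bus_bytes]
                     for i in range(0, padded_len, self.bus_bytes)]
        if len(self.words) + len(new_words) > self.depth:
            return False
        self.words.extend(new_words)
        return True

    def pop(self):
        """Return the oldest queued word, or None when the FIFO is empty."""
        return self.words.popleft() if self.words else None
```

As long as the FIFO has room, the compression side never stalls on output; only a full FIFO propagates back-pressure.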

The control module 40 can control the operation of each component of the Hadoop-based hardware compression acceleration apparatus 100. That is, the control module 40 can control the signal processing of each component constituting the Hadoop-based hardware compression acceleration apparatus 100 and the transmission and reception of data between the components.

Since the steps of the compression process are sequential, the control module 40 can control them sequentially, state by state, using an FSM (Finite State Machine).
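A sequential FSM of this kind might be sketched as follows. The four state names and the single `stage_done` handshake are assumptions chosen to mirror the input buffer, dictionary module, and output buffer; the specification does not enumerate the states.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting for a data block
    RECEIVE = auto()    # input buffer is receiving and aligning data
    COMPRESS = auto()   # dictionary module is compressing
    OUTPUT = auto()     # output buffer is emitting the result


# Each state has exactly one successor because the steps are sequential.
NEXT_STATE = {
    State.IDLE: State.RECEIVE,
    State.RECEIVE: State.COMPRESS,
    State.COMPRESS: State.OUTPUT,
    State.OUTPUT: State.IDLE,
}


def step(state, stage_done):
    """Advance to the next state only when the current stage reports completion."""
    return NEXT_STATE[state] if stage_done else state
```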

The control module 40 may pre-store at least one compression algorithm for data compression and may select any one of the pre-stored compression algorithms. The control module 40 can make this selection based on the Hadoop storage appliance information. The Hadoop storage appliance information can include the number of Hadoop storage appliances in a Hadoop cluster, the available storage space of each Hadoop storage appliance, and the total storage space of all Hadoop storage appliances.
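The appliance information listed above can be represented as a small record, as in the sketch below. The selection rule shown is purely hypothetical: the specification says only that the choice is based on this information, so the free-space threshold, the placeholder algorithm name `higher_ratio`, and the names `ApplianceInfo` and `choose_algorithm` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ApplianceInfo:
    appliance_count: int      # number of Hadoop storage appliances in the cluster
    available_bytes: list     # available storage space of each appliance
    total_bytes: int          # total storage space of all appliances

    def free_fraction(self):
        """Fraction of the cluster's total storage that is still available."""
        if self.total_bytes <= 0:
            return 0.0
        return sum(self.available_bytes) / self.total_bytes


def choose_algorithm(info, stored=("lz4", "higher_ratio"), low_space=0.2):
    """Hypothetical rule: use the fast algorithm unless free space is scarce."""
    return stored[1] if info.free_fraction() < low_space else stored[0]
```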

In addition, the control module 40 may measure the amount of computation performed by the Hadoop-based hardware compression acceleration apparatus 100 during the data compression operation. Here, the amount of computation may include information on the operation speed, that is, the speed at which the compression or decompression operation is performed. The control module 40 may determine the compression algorithm based on the measured computation amount and the Hadoop storage appliance information.
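One way to read "amount of computation" as an operation speed is bytes processed per second. The sketch below measures that for any compression callable; in the usage note, `zlib.compress` from the Python standard library is only a software stand-in for the hardware circuit, and the function name `measure_throughput` is an assumption.

```python
import time
import zlib


def measure_throughput(compress, block):
    """Return (input bytes per second, compressed size) for one block."""
    start = time.perf_counter()
    compressed = compress(block)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer.
    rate = len(block) / elapsed if elapsed > 0 else float("inf")
    return rate, len(compressed)
```

For example, `measure_throughput(zlib.compress, bytes(256 * 1024))` reports the speed of the stand-in compressor on one 256 KB block of zeros.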

That is, after determining a compression algorithm, the control module 40 can change it when the computation amount measured while compressing with that algorithm falls below a predetermined operation speed and throughput. In addition, when there is a change in the Hadoop storage appliance information, the control module 40 may change the compression algorithm.

In addition, the control module 40 may determine the size of the data block that the input buffer 10 receives from the MapReduce task for compression or decompression. The control module 40 measures the computation amount of the data compression operation and determines the size of the block based on the measured computation amount. The control module 40 may also use the Hadoop storage appliance information to determine the block size.

The control module 40 can measure the computation amount by measuring at least one of the compression result provided from the dictionary module 20 to the output buffer 30 and the degree of data alignment output from the output buffer 30.

In addition, the control module 40 may determine the size of the window based on the Hadoop storage appliance information and the measured computation amount.

FIG. 2 is a flowchart illustrating the operation of a Hadoop-based hardware compression accelerator according to the present invention. The compression operation of the Hadoop-based hardware compression acceleration apparatus 100 will be described in detail with reference to FIG. 2.

Referring to FIG. 2, the Hadoop-based hardware compression acceleration apparatus 100 receives a data block for compression or decompression through the input buffer 10 (S210). Next, the Hadoop-based hardware compression acceleration apparatus 100 may perform compression or decompression on the data block through the window of the dictionary module 20 (S220).

The Hadoop-based hardware compression acceleration apparatus 100 may output the result of the compression or decompression through the output buffer 30 (S230). At this time, the control module 40 may measure the computation amount of the data compression or decompression performed (S240). The control module 40 may change at least one of the size of the data block and the size of the window based on the measured computation amount and the Hadoop storage appliance information (S250).

Next, the Hadoop-based hardware compression acceleration apparatus 100 may perform compression or decompression with the changed data block size and/or window size. At this time, the control module 40 may determine whether the computation amount measured during compression with the changed values satisfies a predetermined computation amount (S260). If the predetermined computation amount is satisfied, the Hadoop-based hardware compression acceleration apparatus 100 can continue compression (S270).

On the other hand, if the measured computation amount does not satisfy the predetermined computation amount, the control module 40 returns to step S250 and can again change at least one of the size of the data block and the size of the window based on the measured computation amount and the Hadoop storage appliance information. In particular, the control module 40 may change the at least one such that the predetermined computation amount is satisfied. The predetermined computation amount may be determined according to a setting of the user or the manufacturer.

Through this process, the Hadoop-based hardware compression acceleration apparatus 100 can maintain an optimal data compression and decompression state.
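The measure, adjust, and re-check loop of steps S240 to S270 can be sketched as below. The specification does not say how the sizes are changed, so doubling the window on each round is an assumed policy, as are the round limit and the names `tune` and `measure`; the block size is passed through unchanged here for simplicity.

```python
def tune(measure, block_size, window, target, max_rounds=8):
    """Adjust the window until the measured rate meets the target.

    measure(block_size, window) -> measured computation amount (e.g. bytes/s).
    Returns (block_size, window, satisfied).
    """
    for _ in range(max_rounds):
        rate = measure(block_size, window)     # S240: measure
        if rate >= target:                      # S260: compare with the preset amount
            return block_size, window, True     # S270: continue compression
        window *= 2                             # S250: change the size (assumed policy)
    return block_size, window, False
```

With a toy model in which the rate is ten times the window size, a target of 80 is met after one adjustment from a window of 4 to a window of 8.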

Meanwhile, the elements shown in FIG. 1 are classified merely according to function or operation, and may be classified according to other criteria. In addition, the illustrated elements are not all essential: some may be omitted, and additional elements may be included.

While the present invention has been described with reference to exemplary embodiments thereof, it is to be understood that these embodiments are illustrative and are not intended to limit the scope of the present invention. It will be apparent to those skilled in the art that other modifications based on the technical idea of the present invention may be practiced without departing from the scope of the invention disclosed herein.

Claims

A first step in which an input buffer, which aligns data to the endianness and width of a window of a dictionary module, receives from a MapReduce task a data block to be compressed or decompressed, the data width being equal to the width of the bus used by a Hadoop storage appliance;
A second step in which a control module, which pre-stores a plurality of compression algorithms including an LZ4 compression algorithm, determines a compression algorithm for compressing the data block based on Hadoop storage appliance information of the Hadoop storage appliances included in a Hadoop cluster;
A third step in which the dictionary module, which includes a memory for storing dictionary values such as offsets and a hash function for controlling memory addresses, performs dictionary registration and search simultaneously in parallel by the size of the window, and compresses the data block received from the input buffer through the window according to the compression algorithm determined by the control module;
A fourth step in which the control module measures a computation amount for the compression performed by the dictionary module;
A fifth step in which the control module determines whether the measured computation amount satisfies a predetermined computation amount; and
A sixth step of outputting the result of the compression through a FIFO (First In First Out) in accordance with the bus width and endianness of the Hadoop storage appliance,
Wherein, when the measured computation amount is less than the predetermined computation amount, the control module changes the compression algorithm, the size of the data block, and the size of the window based on the Hadoop storage appliance information and the measured computation amount, and the dictionary module compresses the data block with the changed compression algorithm and the changed sizes so that the control module performs the fifth step again,
Wherein the sixth step is performed after the dictionary module completes the compression of the data block in the third step when the computation amount measured in the fifth step satisfies the predetermined computation amount,
Wherein the Hadoop storage appliance information includes the number of Hadoop storage appliances included in the Hadoop cluster, the available storage space of each of the Hadoop storage appliances, and the total storage space of the Hadoop storage appliances,
Wherein the control module determines the size of the data block that the input buffer receives from the MapReduce task for compression or decompression, and
Wherein the computation amount for compression is adjusted according to the change of the compression algorithm,
Hadoop-based hardware acceleration method.

(Deleted)

The method according to claim 1,
Wherein the dictionary module
Performs compression on the data block according to the LZ4 compression algorithm,
Hadoop-based hardware acceleration method.

The method according to claim 1,
Wherein the size of the data block received by the input buffer is 256 KB,
Hadoop based hardware acceleration method.

(Deleted)