KR101727508B1 - Apparatus and method for accelerating hardware compression based on hadoop - Google Patents

Apparatus and method for accelerating hardware compression based on hadoop

Info

Publication number
KR101727508B1
Authority
KR
South Korea
Prior art keywords
hadoop
compression
data block
control module
size
Prior art date
Application number
KR1020150106457A
Other languages
Korean (ko)
Other versions
KR20170014042A (en)
Inventor
장지훈
이승은
이현화
한재용
임동일
Original Assignee
서울과학기술대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-07-28
Filing date
2015-07-28
Publication date
2017-04-18
Application filed by 서울과학기술대학교 산학협력단
Priority to KR1020150106457A
Priority to PCT/KR2015/008449 (WO2017018567A1)
Publication of KR20170014042A
Application granted
Publication of KR101727508B1

Classifications

    • G06F17/30318
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30194
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Abstract

The present invention relates to a Hadoop-based hardware compression acceleration apparatus. The invention compensates for the limited performance of a low-power CPU by performing in hardware the compression and decompression that Hadoop middleware carries out on a low-power Hadoop storage appliance. To this end, the apparatus includes an input buffer that receives a data block to be compressed or decompressed; a dictionary module that performs dictionary registration and search and compresses the data block through a window; an output buffer that outputs the result of the compression; and a control module that controls the input buffer, the dictionary module, and the output buffer based on Hadoop storage appliance information.

Description

TECHNICAL FIELD The present invention relates to a Hadoop-based hardware compression acceleration apparatus and method, and more particularly to an apparatus and method for accelerating hardware compression for high-speed processing in low-power Hadoop storage appliances.

Recently, Hadoop clusters have come into use as a way to process big data efficiently in a distributed manner.

Even with a Hadoop cluster, as the amount of data to be processed grows, more servers are required for data storage and analysis. Adding servers increases the power consumed in operating the cluster and raises the cost of managing it.

Therefore, the need for low-power Hadoop storage appliances is emerging. Such appliances require low-power CPUs. However, no apparatus or method has yet been provided to compensate for the limited computing power of a low-power CPU.

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and an object of the present invention is to provide a Hadoop-based hardware compression acceleration apparatus for such an appliance.

More specifically, the present invention provides an apparatus and method that minimize the time consumed in distributing and analyzing big data by performing the data compression of the Hadoop system in the hardware compression acceleration apparatus.

The Hadoop-based hardware compression acceleration apparatus according to an embodiment of the present invention includes an input buffer that receives a data block to be compressed or decompressed; a dictionary module that performs dictionary registration and search and compresses the data block through a window; an output buffer that outputs the result of the compression; and a control module that controls the input buffer, the dictionary module, and the output buffer based on Hadoop storage appliance information.

The Hadoop-based hardware compression acceleration apparatus and method according to the present invention can use hardware parallelism to accelerate the dictionary search and registration steps of the compression algorithm.

More specifically, by accelerating dictionary search and registration, the apparatus and method improve throughput over existing software compression.

In addition, by supplementing the insufficient computing power of a low-power CPU, the apparatus and method make it possible to operate a Hadoop cluster of low-power Hadoop storage appliances at low cost.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 illustrates the structure of a Hadoop-based hardware compression acceleration apparatus according to the present invention.
FIG. 2 is a flowchart illustrating the operation of a Hadoop-based hardware compression acceleration apparatus according to the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described below in conjunction with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless defined otherwise, all terms (including technical and scientific terms) used herein may be used in a sense commonly understood by one of ordinary skill in the art to which this invention belongs. Also, commonly used predefined terms are not ideally or excessively interpreted unless explicitly defined otherwise.

The terminology used herein is for the purpose of illustrating embodiments and is not intended to limit the present invention. In the present specification, the singular form includes the plural unless otherwise specified. The terms "comprises" and/or "comprising" used in the specification do not exclude the presence or addition of one or more other elements in addition to the stated element.

In this specification, an appliance means hardware such as a server or storage. The appliance may be an information device that is sold with software pre-installed and optimized for a specific task. An appliance is integrated equipment that the user can put to use simply by connecting power after purchase, without separately installing or configuring an operating system or application software.

In particular, the Hadoop storage appliance refers to an appliance that performs distributed data storage based on Hadoop.

FIG. 1 illustrates a Hadoop-based hardware compression acceleration apparatus according to the present invention. Referring to FIG. 1, the Hadoop-based hardware compression acceleration apparatus 100 may include an input buffer 10, a dictionary module 20, an output buffer 30, and a control module 40.

The hardware compression acceleration apparatus 100 must be connected to the Hadoop storage appliance through an interface that guarantees sufficient bandwidth. For example, the interface may be a PCIe 2.0 x4 (four-lane) link. The apparatus 100 can be implemented on, for example, an FPGA (Field Programmable Gate Array) or an SoC (System on Chip).

In addition, the Hadoop-based hardware compression acceleration apparatus 100 may include a compression algorithm operation circuit. The compression algorithm may be a dictionary-based lossless compression algorithm, for example the LZ4 compression algorithm. The size of a block processed by the LZ4 compression algorithm operation circuit may be 256 KB.
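Block-wise processing of this kind can be illustrated in software with a minimal sketch. The 256 KB block size comes from the text above; everything else is an assumption for illustration. In particular, `zlib` from the Python standard library stands in for LZ4 (which is not in the standard library), and the function names `compress_blocks` and `decompress_blocks` are hypothetical.

```python
import zlib

BLOCK_SIZE = 256 * 1024  # example block size mentioned in the text


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data one block at a time; zlib is a stand-in for LZ4."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks) -> bytes:
    """Reverse compress_blocks by decompressing and concatenating blocks."""
    return b"".join(zlib.decompress(block) for block in blocks)
```

Because each block is compressed independently, blocks could in principle be handed to an accelerator one at a time, which is the granularity the description assumes.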

When the Hadoop-based hardware compression acceleration apparatus 100 includes a compression algorithm operation circuit, the input buffer 10, the dictionary module 20, the output buffer 30, and the control module 40 may be the components that perform the compression algorithm operation and constitute that circuit.

The input buffer 10 receives a block of data to be compressed or decompressed from a MapReduce task. The width of the incoming data equals the width of the bus used by the Hadoop storage appliance, and the input buffer 10 aligns the data to the width and endianness of the dictionary module's window so that the hardware compression acceleration apparatus can process it. For example, when 32-bit data arrives from the Hadoop storage appliance, the input buffer may align it into 128-bit big-endian words.
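The 32-bit to 128-bit alignment in the example above can be sketched as follows. This is a simplified illustration, not the patented circuit: it treats endianness at the granularity of 32-bit words, and the function name `pack_words_128` and the zero-padding of a short final group are assumptions.

```python
from typing import Iterable, List


def pack_words_128(words: Iterable[int], big_endian: bool = True) -> List[int]:
    """Group 32-bit bus words into 128-bit window words.

    Hypothetical helper: a short final group is zero-padded (an assumption).
    """
    buf = [w & 0xFFFFFFFF for w in words]
    # Pad to a multiple of four 32-bit words.
    while len(buf) % 4:
        buf.append(0)
    out = []
    for i in range(0, len(buf), 4):
        group = buf[i:i + 4]
        if not big_endian:
            group = group[::-1]  # first-arrived word moves to the low-order end
        value = 0
        for w in group:
            value = (value << 32) | w  # earlier words land in higher bits
        out.append(value)
    return out
```

With `big_endian=True`, the first word to arrive occupies the most significant 32 bits of the 128-bit result.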

The endianness may be determined according to the environment in which the hardware compression acceleration apparatus is used, that is, according to the kind of CPU used in the Hadoop cluster. Alternatively, the endianness may be determined according to a setting of a user or a manufacturer.

The input buffer 10 may provide the aligned data to the dictionary module 20.

The dictionary module 20 may include a memory for storing dictionary values such as offsets, and a hash function for controlling memory addresses. The dictionary module 20 may store at least one program that controls the memory and the memory addresses using the hash function.

The dictionary module 20 may also include a window and logic for performing dictionary registration and search. The window size is the size at which compression is performed and the unit of parallel processing. The size and number of memories used may vary with the hash function used. The dictionary module 20 can perform dictionary registration and search simultaneously, in parallel across the window. Using the window, the dictionary module 20 may compress the aligned data provided by the input buffer 10.
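As a rough software analogue of dictionary registration and search, the sketch below hashes each 4-byte sequence to a table slot, looks up that slot for an earlier occurrence (search), and then stores the current position there (registration). This mirrors how LZ4-style compressors commonly find matches. The table size, the multiplicative hash, and the function names are assumptions for illustration; the sketch runs sequentially and says nothing about how the patented circuit parallelizes these steps across the window.

```python
HASH_BITS = 12  # assumed table size: 4096 slots
MIN_MATCH = 4   # LZ4-style minimum match length


def hash4(data: bytes, pos: int) -> int:
    """Map the 4 bytes at pos to a table index (multiplicative hash)."""
    value = int.from_bytes(data[pos:pos + 4], "little")
    return ((value * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def find_matches(data: bytes):
    """Return (position, offset, length) for each back-reference found."""
    table = {}  # dictionary: hash index -> most recent position
    matches = []
    pos = 0
    while pos + MIN_MATCH <= len(data):
        h = hash4(data, pos)
        candidate = table.get(h)  # search
        table[h] = pos            # registration
        if (candidate is not None
                and data[candidate:candidate + MIN_MATCH] == data[pos:pos + MIN_MATCH]):
            length = MIN_MATCH
            # Extend the match as far as the bytes keep agreeing.
            while pos + length < len(data) and data[candidate + length] == data[pos + length]:
                length += 1
            matches.append((pos, pos - candidate, length))
            pos += length
        else:
            pos += 1
    return matches
```

Each reported match says that `length` bytes starting at `position` repeat the bytes found `offset` positions earlier, which is the information a dictionary-based encoder would emit in place of the literal bytes.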

The output buffer 30 outputs the compression result, aligning the data to the bus width and endianness of the Hadoop storage appliance. A first-in, first-out (FIFO) queue can be used so that the overhead of outputting compressed data does not delay the compression process.
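The decoupling role of the FIFO can be sketched as a bounded queue that accepts 128-bit results from the compressor and drains them as 32-bit bus words. The class name `OutputFifo`, the default depth of 16, and the 128-bit and 32-bit widths (borrowed from the input buffer example) are all assumptions for illustration.

```python
from collections import deque


class OutputFifo:
    """Hypothetical bounded FIFO between the compressor and the bus."""

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.queue = deque()

    def push(self, word128: int) -> bool:
        """Enqueue a 128-bit result; return False when full (back-pressure)."""
        if len(self.queue) >= self.depth:
            return False
        self.queue.append(word128 & ((1 << 128) - 1))
        return True

    def pop_bus_words(self, big_endian: bool = True):
        """Dequeue the oldest result as four 32-bit bus words, or None if empty."""
        if not self.queue:
            return None
        value = self.queue.popleft()
        words = [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
        return words if big_endian else words[::-1]
```

As long as the queue is not full, the producer can keep pushing results without waiting for the bus, which is the delay the text says the FIFO is meant to avoid.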

The control module 40 can control the operation of each component of the Hadoop-based hardware compression acceleration apparatus 100. That is, the control module 40 can control the signal processing of each component of the apparatus 100 and the transmission and reception of data between components.

Since the steps of the compression process are sequential, the control module 40 can control them in order using a finite state machine (FSM), according to the current state.
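A sequential control FSM of this kind might look like the sketch below. The patent does not enumerate the states, so the state names and the strictly linear transition table are assumptions chosen to match the input buffer, dictionary module, and output buffer described above.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD = auto()      # input buffer receives and aligns a block
    COMPRESS = auto()  # dictionary module registers, searches, and encodes
    OUTPUT = auto()    # output buffer drains the result
    DONE = auto()


# Assumed linear transition table; DONE has no successor.
NEXT_STATE = {
    State.IDLE: State.LOAD,
    State.LOAD: State.COMPRESS,
    State.COMPRESS: State.OUTPUT,
    State.OUTPUT: State.DONE,
}


def run_fsm(start: State = State.IDLE):
    """Step through the states in order and return the visited sequence."""
    state = start
    visited = [state]
    while state in NEXT_STATE:
        state = NEXT_STATE[state]
        visited.append(state)
    return visited
```

A real controller would advance on handshake signals from each component rather than unconditionally; the table form simply makes the fixed ordering explicit.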

The control module 40 may pre-store at least one compression algorithm for data compression and may select one of the pre-stored compression algorithms. The control module 40 can make this selection based on the Hadoop storage appliance information. Hadoop storage appliance information can include the number of Hadoop storage appliances in a Hadoop cluster, the available storage space of each Hadoop storage appliance, and the total storage space of all Hadoop storage appliances.

In addition, the control module 40 may measure the computation amount of the Hadoop-based hardware compression acceleration apparatus 100 during the data compression operation. Here, the computation amount may include information on the operation speed, that is, the speed at which the compression or decompression operation is performed. The control module 40 may determine the compression algorithm based on the measured computation amount and the Hadoop storage appliance information.

That is, after determining a compression algorithm, the control module 40 can change it when the computation amount measured while compressing with that algorithm falls below a predetermined operation speed and throughput. The control module 40 may also change the compression algorithm when the Hadoop storage appliance information changes.

In addition, the control module 40 may determine the size of the data block that the input buffer 10 receives from the MapReduce task for compression or decompression. The control module 40 measures the computation amount of the data compression operation and determines the block size based on the measured computation amount. The control module 40 may also use the Hadoop storage appliance information to determine the block size.

The control module 40 can measure the computation amount by measuring at least one of the compression result provided from the dictionary module 20 to the output buffer 30 and the degree of alignment of the data output from the output buffer 30.

In addition, the control module 40 may determine the size of the window based on the Hadoop storage appliance information and the measured computation amount.

FIG. 2 is a flowchart illustrating the operation of the Hadoop-based hardware compression acceleration apparatus according to the present invention. The compression operation of the apparatus 100 is described in detail below with reference to FIG. 2.

Referring to FIG. 2, the apparatus 100 receives a data block for compression or decompression through the input buffer 10 (S210). Next, the apparatus 100 may perform compression or decompression on the data block through the window of the dictionary module 20 (S220).

The apparatus 100 may output the result of the compression or decompression through the output buffer 30 (S230). At this time, the control module 40 may measure the computation amount of the compression or decompression performed (S240). The control module 40 may change at least one of the size of the data block and the size of the window based on the measured computation amount and the Hadoop storage appliance information (S250).

Next, the apparatus 100 may perform compression or decompression with the changed data block size, window size, or both. The control module 40 may then determine whether the computation amount measured under the changed setting satisfies a preset computation amount (S260). If it does, the apparatus 100 can continue compression (S270).

If the measured computation amount does not satisfy the preset computation amount, the control module 40 returns to step S250 and may again change at least one of the size of the data block and the size of the window based on the measured computation amount and the Hadoop storage appliance information. In particular, the control module 40 may change the at least one size so that the preset computation amount is satisfied. The preset computation amount may be determined by a user or manufacturer setting.

Through this process, the Hadoop-based hardware compression acceleration apparatus 100 can maintain an optimal data compression and decompression state.
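The feedback loop of steps S250 to S270 can be sketched in software as follows. This is a simplified illustration under stated assumptions: only the block size is varied (the text also allows the window size to change and the Hadoop storage appliance information to be consulted), the candidate sizes are invented, and the `measure` callback is a hypothetical stand-in for compressing with a given setting and measuring the resulting computation amount.

```python
from typing import Callable, Sequence, Tuple

# Assumed candidate sizes; the text only names 256 KB as an example.
BLOCK_SIZES = (64 * 1024, 128 * 1024, 256 * 1024)


def tune_block_size(
    measure: Callable[[int], float],
    target: float,
    sizes: Sequence[int] = BLOCK_SIZES,
) -> Tuple[int, float]:
    """Try block sizes until the measured computation amount meets the target.

    measure(size) stands in for S220-S240: compress with that size and
    return the measured value. Returns the chosen size and its measurement,
    falling back to the best size seen if none meets the target.
    """
    best = None
    for size in sizes:                # S250: change the block size
        achieved = measure(size)      # S220-S240: compress and measure
        if best is None or achieved > best[1]:
            best = (size, achieved)
        if achieved >= target:        # S260: preset amount satisfied?
            return size, achieved     # S270: continue with this setting
    return best
```

For example, `tune_block_size(lambda size: size / 1024, target=100)` stops at the 128 KB setting because that is the first candidate whose stand-in measurement reaches the target.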

Meanwhile, since the constituent elements in FIG. 1 are merely classified according to function or operation, they may be classified according to other criteria. In addition, since the illustrated elements are not all essential, some elements may be omitted or additional elements may be included.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, which are not intended to limit the scope of the present invention. It will be apparent to those skilled in the art that other modifications based on the technical idea of the present invention may be practiced without departing from the scope of the invention disclosed herein.

Claims (9)

1. A Hadoop-based hardware acceleration method comprising:
a first step in which an input buffer, which aligns data to the width and endianness of a window of a dictionary module, receives from a MapReduce task a data block to be compressed or decompressed, the data width being equal to the width of the bus used by a Hadoop storage appliance;
a second step in which a control module, which pre-stores a plurality of compression algorithms including the LZ4 compression algorithm, determines a compression algorithm for compressing the data block based on Hadoop storage appliance information of the Hadoop storage appliances included in a Hadoop cluster;
a third step in which the dictionary module, which includes a memory for storing dictionary values of offsets and a hash function for controlling memory addresses and which performs dictionary registration and search simultaneously in parallel across the window, compresses the data block received from the input buffer through the window according to the compression algorithm determined by the control module;
a fourth step in which the control module measures a computation amount for the compression performed by the dictionary module;
a fifth step in which the control module determines whether the measured computation amount satisfies a predetermined computation amount; and
a sixth step of outputting the result of the compression through a FIFO (First In First Out), aligned to the bus width and endianness of the Hadoop storage appliance,
wherein, when the measured computation amount is less than the predetermined computation amount, the control module changes the compression algorithm, the size of the data block, and the size of the window based on the Hadoop storage appliance information and the measured computation amount, and the dictionary module compresses the data block with the changed compression algorithm and the changed sizes so that the control module performs the fifth step again,
wherein the sixth step is performed after the dictionary module completes the compression of the data block in the third step, when the computation amount measured in the fifth step satisfies the predetermined computation amount,
wherein the Hadoop storage appliance information includes the number of Hadoop storage appliances included in the Hadoop cluster, the available storage space of each of the Hadoop storage appliances, and the total storage space of the Hadoop storage appliances,
wherein the control module determines the size of the data block that the input buffer receives from the MapReduce task for compression or decompression, and
wherein the computation amount for compression is adjusted according to the change of compression algorithm.
2. (deleted)
3. The method according to claim 1, wherein the dictionary module performs compression on the data block according to the LZ4 compression algorithm.
4. The method according to claim 1, wherein the size of the data block received by the input buffer is 256 KB.
5. to 9. (deleted)
KR1020150106457A 2015-07-28 2015-07-28 Apparatus and method for accelerating hardware compression based on hadoop KR101727508B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020150106457A KR101727508B1 (en) 2015-07-28 2015-07-28 Apparatus and method for accelerating hardware compression based on hadoop
PCT/KR2015/008449 WO2017018567A1 (en) 2015-07-28 2015-08-12 Hadoop-based hardware compression acceleration device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150106457A KR101727508B1 (en) 2015-07-28 2015-07-28 Apparatus and method for accelerating hardware compression based on hadoop

Publications (2)

Publication Number Publication Date
KR20170014042A KR20170014042A (en) 2017-02-08
KR101727508B1 KR101727508B1 (en) 2017-04-18

Family

ID=57886830

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150106457A KR101727508B1 (en) 2015-07-28 2015-07-28 Apparatus and method for accelerating hardware compression based on hadoop

Country Status (2)

Country Link
KR (1) KR101727508B1 (en)
WO (1) WO2017018567A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102195239B1 (en) 2019-11-29 2020-12-24 숭실대학교산학협력단 Method for data compression transmission considering bandwidth in hadoop cluster, recording medium and device for performing the method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102592785B1 (en) * 2021-06-02 2023-10-23 네이버 주식회사 Method, computer device, and computer program to provide individual data retrieval service

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014105323A1 (en) 2012-12-28 2014-07-03 Apple Inc. Methods and apparatus for compressed and compacted virtual memory
US20140258650A1 (en) 2013-03-06 2014-09-11 Ab Initio Technology Llc Managing operations on stored data units
US20150172209A1 (en) 2013-12-12 2015-06-18 International Business Machines Corporation Resource over-subscription

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020205B (en) * 2012-12-05 2018-07-31 中科天玑数据科技股份有限公司 Compression/decompression method based on hardware accelerator card in a kind of distributed file system
US9342557B2 (en) * 2013-03-13 2016-05-17 Cloudera, Inc. Low latency query engine for Apache Hadoop
CN103729429A (en) * 2013-12-26 2014-04-16 浪潮电子信息产业股份有限公司 Hbase based compression method

Also Published As

Publication number Publication date
WO2017018567A1 (en) 2017-02-02
KR20170014042A (en) 2017-02-08

Legal Events

Date Code Title Description
A201 Request for examination
N231 Notification of change of applicant
GRNT Written decision to grant