CN108804040B - Hadoop map-reduce calculation acceleration method based on kernel bypass technology - Google Patents


Info

Publication number
CN108804040B
CN108804040B (application CN201810568335.1A)
Authority
CN
China
Prior art keywords
hadoop
kernel bypass
method based
map
reduce
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810568335.1A
Other languages
Chinese (zh)
Other versions
CN108804040A (en)
Inventor
赵继胜 (Zhao Jisheng)
吴宇 (Wu Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
Shanghai Fudian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Fudian Intelligent Technology Co., Ltd.
Priority to CN201810568335.1A
Publication of CN108804040A
Application granted
Publication of CN108804040B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a Hadoop map-reduce computation acceleration method based on kernel bypass technology, which comprises the following steps: 1. improving the read/write speed of the solid-state disk (SSD) through the kernel bypass technique; and 2. reading and writing the network at high speed through the kernel bypass technique. These two high-speed read/write techniques accelerate, respectively, the cache reads/writes and the network I/O of Hadoop map-reduce. The shuffle process is the main consumer of cache resources and network bandwidth in map-reduce computation (see the left side of the abstract figure), and integrating the two high-speed read/write techniques effectively improves the performance of the data-processing process (see the right side of the abstract figure). Because a map-reduce computation consists of multiple iterations, each of which includes a shuffle process, tuning shuffle performance yields a significant performance improvement for the whole map-reduce computation.

Description

Hadoop map-reduce calculation acceleration method based on kernel bypass technology
Technical Field
The invention belongs to the field of information technology, and in particular relates to an I/O performance optimization method based on the operating-system kernel bypass technique, mainly used to improve the computational performance of Hadoop map-reduce.
Background
Apache Hadoop, as a computation engine for big-data processing, is widely used in enterprise, education, scientific research, and other fields. As a parallel-processing engine, Hadoop offers a simple, intuitive programming model and good fault tolerance, so applications can be developed quickly and deployed across massive numbers of compute nodes, greatly improving the productivity of big-data application development. Software frameworks that use Hadoop as their computation engine have also developed rapidly: Spark, Hive, Mahout, and others cover application areas ranging from distributed data warehouses to machine learning. Hadoop is becoming an increasingly important industry standard for big data and parallel processing.
Facing ever-expanding applications, Hadoop as a computation engine inevitably comes under pressure to improve performance, so performance-optimization techniques for the Hadoop computation model are continuously explored and researched in industry and academia. In this patent, we propose using the Intel NVMe [1] protocol in a kernel-bypass manner to improve the performance of cache and network I/O during Hadoop computation, thereby improving the overall efficiency of Hadoop map-reduce-based computation.
Disclosure of Invention
Aiming at the Hadoop map-reduce computation framework, this patent provides a method for improving the performance of the shuffle process within Hadoop map-reduce, thereby improving the overall performance of Hadoop map-reduce computation.
To achieve this, the invention provides a method for improving I/O performance based on kernel bypass and the Intel NVMe protocol. Big-data applications built on Hadoop mainly exploit its distributed parallel-processing capability, which in turn is based on the map-reduce computation model. Map-reduce consists of the following three steps:
1. Map process: the computing task is partitioned by data and placed on different distributed compute nodes (e.g. x86 server nodes), and the nodes compute in parallel;
2. Shuffle process: the result data of the map process is stored on a local storage medium (a mechanical disk or a solid-state disk, SSD), and the data is then sent to other nodes in shuffle form for the reduce process (see map shuffle in figure 1);
3. Reduce process: the data sent by each node is aggregated by a reduce formula (e.g. accumulation or multiplication), and the final result is output (see reduce in figure 1).
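The three steps above can be illustrated with a minimal, self-contained sketch; the word-count example and all function names below are hypothetical illustrations for clarity, not code from the patent:

```python
from collections import defaultdict
from functools import reduce

def map_phase(documents):
    # Map: each "node" emits (key, value) pairs from its own input split
    return [[(word, 1) for word in doc.split()] for doc in documents]

def shuffle_phase(mapped):
    # Shuffle: group intermediate pairs by key; in a real cluster this is
    # where results hit local storage and cross the network to other nodes
    groups = defaultdict(list)
    for partition in mapped:
        for key, value in partition:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's values with an aggregation formula (here, sum)
    return {key: reduce(lambda a, b: a + b, vals) for key, vals in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["a b a", "b c"])))
```

Every iteration of a larger map-reduce job repeats this pipeline, which is why the shuffle step in the middle dominates cache and network costs.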
The performance improvement made by this patent is: optimizing the two I/O operations in the shuffle process, namely writing to the local storage medium and distributing data to the different nodes for the reduce process.
For improving storage-medium read/write performance, the solid-state disk (SSD) is read and written efficiently using the kernel-bypass NVMe device access mode, avoiding the extra latency and memory overhead introduced by the operating-system kernel;
For improving network transmission performance, network data is transmitted efficiently using a kernel-bypass IP network communication mode based on the NVMe protocol, avoiding the operating-system kernel accesses of the traditional TCP protocol stack.
In the implementation section of the present invention, we describe in detail how to use the SPDK [2] and DPDK [3] function libraries (with their software packages and drivers) to achieve efficient NVMe-based read/write performance improvements for the SSD and the IP network, respectively.
Drawings
FIG. 1 is a basic illustration of the Hadoop map-reduce process, in which the storage medium is an SSD.
FIG. 2 compares storage I/O with and without kernel bypass (the left diagram shows the conventional storage-medium access mode; the right diagram shows the kernel-bypass access mode).
FIG. 3 compares network I/O with and without kernel bypass (the left diagram shows the conventional network access mode; the right diagram shows the kernel-bypass access mode).
Citations of documents
[1] http://nvmexpress.org/wp-content/uploads/2013/04/NVM_whitepaper.pdf
[2] https://software.intel.com/en-us/articles/introduction-to-the-storage-performance-development-kit-spdk
[3]https://software.intel.com/en-us/networking/dpdk
Detailed Description
As shown in fig. 1, the present invention is specifically implemented as follows:
1. The overall I/O optimization introduced by this patent covers, under the Hadoop framework, both the I/O optimization of writing map result data to the storage medium and the network I/O optimization of distributing shuffle data to the reduce process. By replacing part of Hadoop's original I/O functions, I/O acceleration based on kernel bypass and the NVMe driver is integrated into Hadoop, improving performance during map-reduce computation (see figures 2 and 3);
2. The I/O optimization of this patent comprises the following steps:
a. In the Hadoop operation initialization stage, detect whether the distributed system supports the NVMe driver; if not, use the standard Hadoop I/O mode. Pseudocode is as follows:
(pseudocode figure in the original publication; not reproduced in this text)
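The pseudocode for this detection step appears only as an image in the source. A hypothetical sketch of the control flow (the `/dev` scan heuristic and both backend names are assumptions, not the patent's actual logic) might look like:

```python
import os

def nvme_supported():
    # Heuristic: check whether any NVMe block device is visible to the OS.
    # (A real deployment would also verify that the user-space driver binds.)
    try:
        return any(name.startswith("nvme") for name in os.listdir("/dev"))
    except OSError:
        return False

def select_io_backend():
    # Fall back to the standard Hadoop I/O path when kernel-bypass NVMe
    # access is unavailable, as the initialization step describes.
    return "kernel-bypass-nvme" if nvme_supported() else "standard-hadoop-io"
```

The important property is the graceful fallback: the job runs either way, and only the I/O backend differs.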
b. In the process of writing map results to storage, adopt the kernel-bypass data read/write mechanism implemented by SPDK (see figure 2); the code does not need to be modified, only the storage I/O back-end module needs to be replaced with the SPDK implementation that supports kernel-bypass operation;
c. When data is distributed in the shuffle process, adopt the kernel-bypass network I/O mechanism implemented by DPDK (see figure 3);
d. In steps b and c, if an I/O error occurs, retry; the number of retries depends on a preset threshold, and once it is exceeded, the I/O exception handling provided by Hadoop takes over.
3. Data exchange processing: the Hadoop engine is Java-based, while the SPDK/DPDK function libraries are written in C/C++, so data must be exchanged between the Java virtual machine and the heap memory of the Linux process (through the Java Native Interface, JNI) so that Hadoop can write data into the SPDK/DPDK cache for further I/O; a dedicated data-conversion module is therefore provided for Hadoop. The pseudocode of the data read/write mechanism is as follows (taking an SSD write as an example):
(pseudocode figures in the original publication; not reproduced in this text)
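The data read/write pseudocode appears only as images in the source. The following hypothetical sketch illustrates the idea the text describes: copying data out of the managed (JVM) heap into a native buffer before the user-space driver consumes it. The buffer size, function names, and the list standing in for the device queue are all assumptions:

```python
NATIVE_BUFFER_SIZE = 4096  # assumed size of the driver's DMA-able buffer

def copy_to_native(jvm_data: bytes, native_buffer: bytearray) -> int:
    # Mimics the JNI step: move bytes from the JVM heap into a pinned
    # native buffer that the SPDK-style driver can read directly.
    n = min(len(jvm_data), len(native_buffer))
    native_buffer[:n] = jvm_data[:n]
    return n

def write_via_bypass(jvm_data: bytes, device_queue: list) -> int:
    # Stand-in for the kernel-bypass SSD write: "device_queue" is a plain
    # list collecting the chunks the driver would submit to the NVMe queue.
    buf = bytearray(NATIVE_BUFFER_SIZE)
    written = 0
    for off in range(0, len(jvm_data), NATIVE_BUFFER_SIZE):
        n = copy_to_native(jvm_data[off:off + NATIVE_BUFFER_SIZE], buf)
        device_queue.append(bytes(buf[:n]))
        written += n
    return written
```

The single copy from heap to native buffer is the cost the conversion module pays once, in exchange for skipping the kernel's own buffering on every subsequent write.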
4. Error detection and handling strategy: read/write operations on the SSD and the IP network are implemented in the SPDK/DPDK function-library layer, written in C. A two-layer handling strategy is adopted for possible I/O faults (such as read/write timeouts or device-state errors): multiple attempts are made at the C level that calls the SPDK/DPDK function libraries, and when the number of failed attempts exceeds a preset threshold, the error is returned to the Java operation layer as a Java exception. This avoids the extra overhead of returning to the Java layer repeatedly during retries. Pseudocode is as follows:
(pseudocode figure in the original publication; not reproduced in this text)
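The error-handling pseudocode appears only as an image in the source. A hypothetical sketch of the two-layer strategy (retry at the driver-call level, then surface a single Java-style exception), with all names and the threshold value assumed:

```python
MAX_RETRIES = 3  # assumed preset threshold

class HadoopIOException(Exception):
    """Stand-in for the Java-level exception raised once retries are exhausted."""

def submit_with_retry(io_op, retries=MAX_RETRIES):
    # First layer: retry at the (simulated) driver level, so transient
    # faults never cross back into the Java operation layer.
    last_error = None
    for _ in range(retries):
        try:
            return io_op()
        except IOError as err:
            last_error = err
    # Second layer: surface the failure once, as a single exception,
    # instead of bouncing each failed attempt up to the Java layer.
    raise HadoopIOException(f"I/O failed after {retries} attempts") from last_error
```

Keeping the retry loop below the language boundary is the point: only one crossing back to Java happens per logical I/O, whether it succeeds or finally fails.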

Claims (7)

1. A Hadoop map-reduce computation acceleration method based on kernel bypass technology, comprising the following steps:
step one: in the Hadoop operation initialization stage, detecting whether the distributed system supports the NVMe driver, and if not, using the standard Hadoop I/O mode;
step two: adopting the kernel-bypass data read/write mechanism implemented by SPDK in the process of writing map results to storage, wherein the code does not need to be modified; only the storage I/O back-end module needs to be replaced with the SPDK implementation supporting kernel-bypass operation;
step three: adopting the kernel-bypass network I/O mechanism implemented by DPDK when data is distributed in the shuffle process;
step four: in steps two and three, if an I/O error occurs, retrying, wherein the number of retries depends on a preset threshold, and the I/O exception handling provided by Hadoop is invoked after the threshold is exceeded;
wherein the I/O exception handling comprises: read/write operations on the SSD and the IP network are implemented in the SPDK/DPDK function-library layer written in C, and a two-layer handling strategy is adopted for I/O exceptions: multiple attempts are made at the C level that calls the SPDK/DPDK function libraries, and when the number of failed attempts exceeds the preset threshold, the error is returned to the Java operation layer in the form of a Java exception.
2. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 1, characterized by establishing an efficient storage and network I/O mechanism.
3. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 1, characterized in that a data caching and distribution mechanism for device I/O in kernel-bypass mode based on the Intel NVMe driver is established.
4. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 1, characterized in that Apache Hadoop is extended by means of a software plug-in.
5. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 1, characterized in that I/O error processing is divided into two layers: trial and error is first performed at the driver processing layer, and when the number of attempts exceeds a preset threshold, control returns to Hadoop's Java-based exception-handling layer.
6. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 5, characterized in that the I/O acceleration is implemented using the open-source SPDK and DPDK function libraries provided by Intel.
7. The Hadoop map-reduce computation acceleration method based on kernel bypass technology as claimed in claim 5, characterized in that the extension of Apache Hadoop is implemented by adopting an extended software interface to add support for device I/O in kernel-bypass mode based on the Intel NVMe driver.
CN201810568335.1A 2018-06-05 2018-06-05 Hadoop map-reduce calculation acceleration method based on kernel bypass technology Active CN108804040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810568335.1A CN108804040B (en) 2018-06-05 2018-06-05 Hadoop map-reduce calculation acceleration method based on kernel bypass technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810568335.1A CN108804040B (en) 2018-06-05 2018-06-05 Hadoop map-reduce calculation acceleration method based on kernel bypass technology

Publications (2)

Publication Number Publication Date
CN108804040A CN108804040A (en) 2018-11-13
CN108804040B true CN108804040B (en) 2020-07-07

Family

ID=64087152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810568335.1A Active CN108804040B (en) 2018-06-05 2018-06-05 Hadoop map-reduce calculation acceleration method based on kernel bypass technology

Country Status (1)

Country Link
CN (1) CN108804040B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653476A (en) * 2014-11-12 2016-06-08 华为技术有限公司 Communication method between data processor and memory equipment, and related device
CN106506513A (en) * 2016-11-21 2017-03-15 国网四川省电力公司信息通信公司 Firewall policy data analysis set-up and method based on network traffics
CN107480080A (en) * 2017-07-03 2017-12-15 香港红鸟科技股份有限公司 A kind of Zero-copy data stream based on RDMA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170180272A1 (en) * 2012-10-03 2017-06-22 Tracey Bernath System and method for accelerating network applications using an enhanced network interface and massively parallel distributed processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653476A (en) * 2014-11-12 2016-06-08 华为技术有限公司 Communication method between data processor and memory equipment, and related device
CN106506513A (en) * 2016-11-21 2017-03-15 国网四川省电力公司信息通信公司 Firewall policy data analysis set-up and method based on network traffics
CN107480080A (en) * 2017-07-03 2017-12-15 香港红鸟科技股份有限公司 A kind of Zero-copy data stream based on RDMA

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JVM-Bypass for Efficient Hadoop Shuffling; Yandong Wang et al.; 2013 IEEE 27th International Symposium on Parallel & Distributed Processing; 2013-12-31; pp. 569-578 *

Also Published As

Publication number Publication date
CN108804040A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
US10496597B2 (en) On-chip data partitioning read-write method, system, and device
KR102028252B1 (en) Autonomous memory architecture
US10025533B2 (en) Logical block addresses used for executing host commands
Qin et al. How to apply the geospatial data abstraction library (GDAL) properly to parallel geospatial raster I/O?
CN103761988A (en) SSD (solid state disk) and data movement method
CN103647850A (en) Data processing method, device and system of distributed version control system
CN116561051B (en) Hardware acceleration card and heterogeneous computing system
KR20110028212A (en) Autonomous subsystem architecture
CN111444134A (en) Parallel PME (pulse-modulated emission) accelerated optimization method and system of molecular dynamics simulation software
US9342564B2 (en) Distributed processing apparatus and method for processing large data through hardware acceleration
CN115033188A (en) Storage hardware acceleration module system based on ZNS solid state disk
CN112200310B (en) Intelligent processor, data processing method and storage medium
US10061747B2 (en) Storage of a matrix on a storage compute device
CN113869495A (en) Method, device and equipment for optimizing convolutional weight layout of neural network and readable medium
CN104598409A (en) Method and device for processing input and output requests
CN108804040B (en) Hadoop map-reduce calculation acceleration method based on kernel bypass technology
CN116074179B (en) High expansion node system based on CPU-NPU cooperation and training method
Li et al. Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing
US10289447B1 (en) Parallel process scheduling for efficient data access
Ali et al. A New Merging Numerous Small Files Approach for Hadoop Distributed File System
KR20210108487A (en) Storage Device Behavior Orchestration
US9569280B2 (en) Managing resource collisions in a storage compute device
US9183211B1 (en) Cooperative storage of shared files in a parallel computing system with dynamic block size
US11550718B2 (en) Method and system for condensed cache and acceleration layer integrated in servers
US20220107844A1 (en) Systems, methods, and devices for data propagation in graph processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200410

Address after: 200433, No. 15, No. 323, National Road, Shanghai, Yangpu District (centrally registered)

Applicant after: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 200082 the 15 level (323) of Guo Ding Road, Yangpu District, Shanghai.

Applicant before: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.

Applicant before: Zhao Jisheng

Applicant before: Wu Yu

GR01 Patent grant