CN108804040B - Hadoop map-reduce calculation acceleration method based on kernel bypass technology - Google Patents
- Publication number: CN108804040B (application CN201810568335.1A)
- Authority: CN (China)
- Prior art keywords: hadoop, kernel bypass, map-reduce
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F3/0611—Improving I/O performance in relation to response time
- G06F3/061—Improving I/O performance
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- H04L67/10—Protocols in which an application is distributed across nodes in the network

(The G06F codes fall under G06F3/06—Digital input from, or digital output to, record carriers, within G06F—Electric digital data processing; H04L67/10 falls under H04L67/00—Network arrangements or protocols for supporting network services or applications.)
Abstract
The invention provides a Hadoop map-reduce computation acceleration method based on kernel-bypass technology, comprising the following steps: 1) the read/write speed of the solid-state disk (SSD) is improved through kernel bypass, and 2) the network is read and written at high speed through kernel bypass. These two high-speed I/O techniques respectively accelerate the cache reads/writes and the network I/O of Hadoop map-reduce. The shuffle process is the main consumer of cache resources and network bandwidth in map-reduce computation (see the left side of the abstract figure), and integrating the two techniques effectively improves the performance of the data-processing process (see the right side of the abstract figure). Because a map-reduce computation consists of many iterations, each of which contains a shuffle process, tuning shuffle performance brings a significant performance improvement to the whole map-reduce computation.
Description
Technical Field
The invention belongs to the field of information technology and specifically relates to an I/O performance optimization method based on operating-system kernel-bypass technology, mainly used to improve the computational performance of Hadoop map-reduce.
Background
Apache Hadoop, as a computation engine for big-data processing, is widely used in enterprise, education, scientific research, and other fields. As a parallel-processing engine, Hadoop offers a simple and intuitive programming model and good fault tolerance, so applications can be developed quickly and deployed across large numbers of compute nodes, which greatly improves the productivity of big-data application development. Software frameworks that use Hadoop as their engine have also developed rapidly; Spark, Hive, Mahout, and others cover application areas ranging from distributed data warehouses to machine learning. Hadoop is becoming an increasingly important industry standard for big data and parallel processing.
Facing ever-expanding applications, Hadoop as a computing engine inevitably comes under pressure to improve performance, so industry and academia continue to explore performance-optimization techniques for the Hadoop computation model. In this patent, we propose to use the Intel NVMe [1] protocol in a kernel-bypass manner to improve the performance of cache and network I/O during Hadoop computation, thereby improving the overall efficiency of Hadoop map-reduce based computation.
Disclosure of Invention
For the Hadoop map-reduce computation framework, this patent aims to provide a method that improves the performance of the shuffle process within Hadoop map-reduce, and thereby the overall performance of Hadoop map-reduce computation.
To achieve this purpose, the invention provides a method for improving I/O performance based on kernel bypass and the Intel NVMe protocol. Big-data applications built on Hadoop mainly exploit its distributed parallel-processing capability, which in turn is based on the map-reduce computation model. Map-reduce consists of the following three steps:
1. Map process: the computing task is partitioned by data and placed on different distributed compute nodes (e.g., x86 server nodes), and the nodes compute in parallel;
2. Shuffle process: the result data of the map process is stored on a local storage medium (a mechanical disk or solid-state disk, SSD), and the data is then sent to other nodes in shuffle form for the reduce process (see the map shuffle in FIG. 1);
3. Reduce process: the data sent by each node is aggregated by a reduce formula (e.g., accumulation or multiplication), and the final result is output (see the reduce in FIG. 1).
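The three steps above can be mirrored in a minimal single-process sketch. This is plain C with no Hadoop involved; node parallelism is simulated with array partitions, and all names are illustrative:

```c
/* Minimal single-process illustration of the three map-reduce steps.
 * "Nodes" are simulated by array partitions; the real Hadoop flow is
 * distributed, and this sketch only mirrors the data-movement pattern. */
#include <assert.h>

#define NODES 3
#define PER_NODE 4

/* Map: each node transforms (here: squares) its own partition. */
static void map_phase(const int in[NODES][PER_NODE], int out[NODES][PER_NODE]) {
    for (int n = 0; n < NODES; n++)
        for (int i = 0; i < PER_NODE; i++)
            out[n][i] = in[n][i] * in[n][i];
}

/* Shuffle: map output is "written locally, then sent"; here we simply
 * gather every node's buffer into one flat array for the reducer. */
static int shuffle_phase(const int mapped[NODES][PER_NODE],
                         int gathered[NODES * PER_NODE]) {
    int k = 0;
    for (int n = 0; n < NODES; n++)
        for (int i = 0; i < PER_NODE; i++)
            gathered[k++] = mapped[n][i];
    return k;
}

/* Reduce: an accumulation formula (sum) over the shuffled data. */
static long reduce_phase(const int *data, int count) {
    long sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i];
    return sum;
}
```

The shuffle step is the only one that touches every record twice (local write, then network send), which is why the patent targets exactly that step.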
The performance improvement made by this patent is the optimization of the two I/O operations in the shuffle process: writing to the local storage medium, and distributing data to different nodes for the reduce process.
To improve the read/write performance of the storage medium, the solid-state disk (SSD) is read and written efficiently using the kernel-bypass read/write mode of NVMe devices, avoiding the additional latency and memory footprint imposed by the operating-system kernel;
to improve network transmission performance, network data is transmitted efficiently using an NVMe-protocol-based IP network communication mode with kernel bypass, avoiding the trips through the operating-system kernel required by the traditional TCP protocol stack.
In the implementation section of this invention, we describe in detail how to achieve efficient NVMe-based read/write performance improvement for the SSD and the IP network using the SPDK [2] and DPDK [3] function libraries (with their software packages and drivers), respectively.
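The implementation below replaces the storage I/O back end without touching calling code. That integration style can be sketched as a function-pointer back end in C; the struct and function names here are illustrative, and the "bypass" implementation is a stand-in that copies to a memory buffer where a real one would enqueue to an SPDK NVMe queue pair:

```c
/* Sketch of a swappable storage back end: callers write through the
 * io_backend interface, so swapping the struct (kernel path vs. a
 * kernel-bypass path such as one built on SPDK) requires no change to
 * calling code.  Both implementations here are simplified stand-ins. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    size_t (*write)(void *ctx, const void *buf, size_t len);
} io_backend;

/* Stand-in "kernel path": copies into a caller-supplied sink buffer. */
static size_t posix_like_write(void *ctx, const void *buf, size_t len) {
    memcpy(ctx, buf, len);
    return len;
}

/* Stand-in "kernel-bypass path": same contract, different implementation;
 * a real version would submit to an SPDK NVMe queue pair instead. */
static size_t bypass_write(void *ctx, const void *buf, size_t len) {
    memcpy(ctx, buf, len);
    return len;
}

static const io_backend POSIX_BACKEND  = { "posix",  posix_like_write };
static const io_backend BYPASS_BACKEND = { "bypass", bypass_write };

/* Caller code is written once against the interface. */
static size_t store_map_result(const io_backend *be, void *ctx,
                               const void *data, size_t len) {
    return be->write(ctx, data, len);
}
```

This is the design choice the patent relies on: because map-result writes go through one interface, the SPDK-backed module can be substituted at initialization time.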
Drawings
FIG. 1 is a basic illustration of the Hadoop map-reduce process, in which the storage medium is an SSD.
FIG. 2 compares storage I/O with and without kernel bypass (the left diagram shows the conventional storage-medium access mode; the right diagram shows the kernel-bypass access mode).
FIG. 3 compares network I/O with and without kernel bypass (the left diagram shows the conventional network access mode; the right diagram shows the kernel-bypass access mode).
Citations of documents
[1] http://nvmexpress.org/wp-content/uploads/2013/04/NVM_whitepaper.pdf
[2] https://software.intel.com/en-us/articles/introduction-to-the-storage-performance-development-kit-spdk
[3] https://software.intel.com/en-us/networking/dpdk
Detailed Description
As shown in FIG. 1, the present invention is implemented as follows:
1. The overall I/O optimization introduced by this patent covers, within the Hadoop framework, the I/O optimization for writing map result data to the storage medium and the network I/O optimization for distributing shuffle data to the reduce process. By replacing part of Hadoop's original I/O functionality, the I/O acceleration based on kernel bypass and the NVMe driver is integrated into Hadoop, improving performance during map-reduce computation (see FIGS. 2 and 3);
2. The I/O optimization process of this patent comprises the following steps:
a. In the Hadoop job initialization stage, detect whether the distributed system supports the NVMe driver; if not, use the standard Hadoop I/O mode;
b. When writing map results to storage, adopt the kernel-bypass data read/write mechanism implemented by SPDK (see FIG. 2); no code changes are needed, only the storage I/O back-end module is replaced with the SPDK implementation that supports kernel-bypass operation;
c. When distributing data in the shuffle process, adopt the kernel-bypass network I/O mechanism implemented by DPDK (see FIG. 3);
d. In steps b and c, if an I/O error occurs, retry; the number of retries depends on a preset threshold, and once it is exceeded, the I/O exception handling provided by Hadoop takes over.
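The NVMe-capability check of step (a) might be sketched as below. The probe path and the `MODE_*` names are assumptions for illustration (on Linux, `/sys/class/nvme` is a plausible location to probe); the real detection in a distributed system would be more involved:

```c
/* Sketch of step (a): probe for NVMe support at job initialization and
 * fall back to the standard Hadoop I/O mode when it is absent.
 * The probe path is parameterized so the policy is testable. */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

typedef enum { MODE_STANDARD_IO = 0, MODE_KERNEL_BYPASS = 1 } io_mode;

/* Returns bypass mode only when the probe path exists. */
static io_mode select_io_mode(const char *nvme_probe_path) {
    if (nvme_probe_path != NULL && access(nvme_probe_path, F_OK) == 0)
        return MODE_KERNEL_BYPASS;
    return MODE_STANDARD_IO;   /* standard Hadoop I/O fallback */
}
```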
3. Data exchange handling: the Hadoop engine is based on Java, while the SPDK/DPDK function libraries are based on C/C++. Data must therefore be exchanged between the Java virtual machine and the heap memory of the Linux process (through the Java Native Interface, JNI) so that Hadoop can write data into the SPDK/DPDK cache for further I/O; a dedicated data-conversion module is provided for Hadoop for this purpose.
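The native side of that conversion module must stage JVM-heap bytes into a buffer the bypass stack can use for DMA. A minimal sketch, with assumed names: `posix_memalign` stands in for SPDK's DMA allocator (`spdk_dma_malloc`), and in a real JNI bridge the source bytes would come from `GetByteArrayElements` or a direct `ByteBuffer`:

```c
/* Sketch of the data-conversion step: JVM heap data is copied into an
 * aligned, DMA-able staging buffer before SPDK/DPDK can issue I/O on it.
 * posix_memalign is a stand-in for a real DMA allocator. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DMA_ALIGN 4096  /* typical page/NVMe-block alignment assumption */

/* Copies len bytes from a "JVM heap" buffer into a newly allocated
 * aligned buffer; the caller frees it.  Returns NULL on failure. */
static void *stage_for_dma(const void *jvm_heap_src, size_t len) {
    void *dma_buf = NULL;
    if (posix_memalign(&dma_buf, DMA_ALIGN, len) != 0)
        return NULL;
    memcpy(dma_buf, jvm_heap_src, len);
    return dma_buf;
}
```

The extra copy is the price of crossing the JVM boundary; the patent accepts it because the kernel-bypass I/O savings dominate.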
4. error detection and handling strategy: the read-write operation of the SSD and the IP network is realized by the SPDK/DPDK function library layer realized by C language. Two layers of processing strategies are adopted for possible I/O faults (such as read-write timeout or device state errors): multiple attempts are made at the C language level of calling the SPDK/DPDK function library, when the trial and error exceeds a preset threshold, the error is returned to the Java operation layer in a Java exception mode, the mode avoids extra burden brought by returning the Java operation layer for multiple times in the trial and error process, and the pseudo code is as follows:
Claims (7)
1. A Hadoop map-reduce computation acceleration method based on kernel-bypass technology, comprising the following steps:
step one: in the Hadoop job initialization stage, detecting whether the distributed system supports the NVMe driver, and if not, using the standard Hadoop I/O mode;
step two: adopting the kernel-bypass data read/write mechanism implemented by SPDK when writing map results to storage, wherein no code changes are needed and only the storage I/O back-end module is replaced with the SPDK implementation supporting kernel-bypass operation;
step three: adopting the kernel-bypass network I/O mechanism implemented by DPDK when distributing data in the shuffle process;
step four: in step two and step three, if an I/O error occurs, retrying, wherein the number of retries depends on a preset threshold, and after it is exceeded, performing the I/O exception handling provided by Hadoop;
the I/O exception handling comprising: the read/write operations on the SSD and the IP network being implemented in the SPDK/DPDK function-library layer written in C, with two processing layers for I/O exceptions: multiple attempts made at the C level calling the SPDK/DPDK function libraries, and when the attempts exceed the preset threshold, the error returned to the Java layer in the form of a Java exception.
2. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 1, characterized in that an efficient storage and network I/O mechanism is established.
3. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 1, characterized in that a data caching and distribution mechanism for device I/O in kernel-bypass mode based on the Intel NVMe driver is established.
4. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 1, characterized in that Apache Hadoop is extended by means of software plug-ins.
5. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 1, characterized in that I/O error handling is divided into two layers: retries are first performed at the driver layer, and when the number of retries exceeds a preset threshold, control returns to Hadoop's Java-based exception-handling layer.
6. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 5, characterized in that the I/O acceleration is implemented using the open-source SPDK and DPDK function libraries provided by Intel.
7. The Hadoop map-reduce computation acceleration method based on kernel-bypass technology according to claim 5, characterized in that Apache Hadoop is extended through an extension software interface to add support for device I/O in kernel-bypass mode based on the Intel NVMe driver.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810568335.1A CN108804040B (en) | 2018-06-05 | 2018-06-05 | Hadoop map-reduce calculation acceleration method based on kernel bypass technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108804040A (en) | 2018-11-13 |
CN108804040B (en) | 2020-07-07 |
Family
ID=64087152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810568335.1A Active CN108804040B (en) | 2018-06-05 | 2018-06-05 | Hadoop map-reduce calculation acceleration method based on kernel bypass technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804040B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105653476A (en) * | 2014-11-12 | 2016-06-08 | 华为技术有限公司 | Communication method between data processor and memory equipment, and related device |
CN106506513A (en) * | 2016-11-21 | 2017-03-15 | 国网四川省电力公司信息通信公司 | Firewall policy data analysis set-up and method based on network traffics |
CN107480080A (en) * | 2017-07-03 | 2017-12-15 | 香港红鸟科技股份有限公司 | A kind of Zero-copy data stream based on RDMA |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170180272A1 (en) * | 2012-10-03 | 2017-06-22 | Tracey Bernath | System and method for accelerating network applications using an enhanced network interface and massively parallel distributed processing |
- 2018-06-05: application CN201810568335.1A granted as patent CN108804040B (active)
Non-Patent Citations (1)
Title |
---|
Yandong Wang et al., "JVM-Bypass for Efficient Hadoop Shuffling," 2013 IEEE 27th International Symposium on Parallel & Distributed Processing, 2013, pp. 569-578. *
Also Published As
Publication number | Publication date |
---|---|
CN108804040A (en) | 2018-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10496597B2 (en) | On-chip data partitioning read-write method, system, and device | |
KR102028252B1 (en) | Autonomous memory architecture | |
US10025533B2 (en) | Logical block addresses used for executing host commands | |
Qin et al. | How to apply the geospatial data abstraction library (GDAL) properly to parallel geospatial raster I/O? | |
CN103761988A (en) | SSD (solid state disk) and data movement method | |
CN103647850A (en) | Data processing method, device and system of distributed version control system | |
CN116561051B (en) | Hardware acceleration card and heterogeneous computing system | |
KR20110028212A (en) | Autonomous subsystem architecture | |
CN111444134A (en) | Parallel PME (pulse-modulated emission) accelerated optimization method and system of molecular dynamics simulation software | |
US9342564B2 (en) | Distributed processing apparatus and method for processing large data through hardware acceleration | |
CN115033188A (en) | Storage hardware acceleration module system based on ZNS solid state disk | |
CN112200310B (en) | Intelligent processor, data processing method and storage medium | |
US10061747B2 (en) | Storage of a matrix on a storage compute device | |
CN113869495A (en) | Method, device and equipment for optimizing convolutional weight layout of neural network and readable medium | |
CN104598409A (en) | Method and device for processing input and output requests | |
CN108804040B (en) | Hadoop map-reduce calculation acceleration method based on kernel bypass technology | |
CN116074179B (en) | High expansion node system based on CPU-NPU cooperation and training method | |
Li et al. | Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing | |
US10289447B1 (en) | Parallel process scheduling for efficient data access | |
Ali et al. | A New Merging Numerous Small Files Approach for Hadoop Distributed File System | |
KR20210108487A (en) | Storage Device Behavior Orchestration | |
US9569280B2 (en) | Managing resource collisions in a storage compute device | |
US9183211B1 (en) | Cooperative storage of shared files in a parallel computing system with dynamic block size | |
US11550718B2 (en) | Method and system for condensed cache and acceleration layer integrated in servers | |
US20220107844A1 (en) | Systems, methods, and devices for data propagation in graph processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
2020-04-10 | TA01 | Transfer of patent application right | Effective date of registration: 2020-04-10. Address after: No. 323, 15th floor, Guoding Road, Yangpu District, Shanghai 200433 (centralized registration). Applicant after: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: 15th floor (Room 323), Guoding Road, Yangpu District, Shanghai 200082. Applicants before: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.; Zhao Jisheng; Wu Yu |
| GR01 | Patent grant | |