CN105653708A - Hadoop matrix processing method and system of heterogeneous cluster - Google Patents

Hadoop matrix processing method and system of heterogeneous cluster

Info

Publication number
CN105653708A
Authority
CN
China
Legal status
Pending
Application number
CN201511028067.7A
Other languages
Chinese (zh)
Inventor
刘勇 (Liu Yong)
喻之斌 (Yu Zhibin)
须成忠 (Xu Chengzhong)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
2015-12-31
Filing date
2015-12-31
Publication date
2016-06-08
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201511028067.7A
Publication of CN105653708A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

A Hadoop matrix processing method for a heterogeneous cluster comprises the following steps: a physical cluster is built and one Master node and multiple Slaver nodes are set up; the programming environment based on the Java development environment is configured on the Master node and on each Slaver node, and CUDA versions of the Map and Reduce code for matrix multiplication are written in advance; the stored information about a first matrix A and a second matrix B is read into memory, and MapReduce matrix multiplication is performed on A and B using the prewritten code; the result is written directly into the distributed file system HDFS, where A = (a_ij) is an m × s matrix and B = (b_ij) is an s × n matrix. The method addresses the limited performance of Hadoop matrix multiplication from the algorithmic side, improves program performance at a deeper level, and effectively raises the efficiency of matrix multiplication.

Description

Hadoop matrix processing method and system for a heterogeneous cluster
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a Hadoop matrix processing method and system for a heterogeneous cluster.
Background technology
Matrix operations are widely used in key fields of industry, science and technology, from image processing and data mining to biological computation, and matrix multiplication is one of the most important of them. As matrices grow, however, multiplying them within a short time becomes difficult. Classical matrix multiplication relies either on serial processing on a single node or on parallel processing on a GPU; although these schemes improve performance to some extent, they are not suited to massive data. Hadoop is a distributed framework for big-data processing and the most popular open-source implementation of the MapReduce programming model. It simplifies data distribution, processing, computation and task scheduling, and offers high fault tolerance, high reliability, high scalability and high resource utilization. Programmers only need to write the Map and Reduce functions; Hadoop automatically assigns the tasks to the nodes of the cluster and executes them, achieving data parallelism. A paper by Sun Yuanshuai et al. on Hadoop-based large matrix multiplication proposes implementing MapReduce matrix multiplication with an inner-product method and an outer-product method.
However: (1) Hadoop's performance on massive data processing applications is unsatisfactory. Such applications are both computation-intensive and data-intensive, whereas Hadoop is mainly suited to data-intensive applications. (2) The inner-product MapReduce method completes in a single job, but the intermediate output of the Map stage is very large: the Hadoop framework must write the intermediate results to local disk in the Map stage, and the Shuffle stage must copy the intermediate results of each partition, so this scheme is rarely used in practice. The outer-product method gives up some parallel granularity and splits the computation into two jobs, which reduces the volume of intermediate results; however, the output of the first job is the input of the second, so the second job must wait for the first to complete.
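To make the two-job structure of the outer-product method concrete, the sketch below simulates it in memory. It is an illustration under stated assumptions, not the cited paper's implementation: the class and method names are hypothetical, no Hadoop API is used, matrices are dense `double[][]` arrays with 0-based indices, and job 1's reduce parallelism is limited to the s values of the inner index k.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * In-memory sketch of the two-job outer-product method. Names are hypothetical;
 * the two MapReduce jobs are simulated in one JVM without the Hadoop API.
 */
class OuterProductSketch {

    /** One partial product a_ik * b_kj that contributes to c_ij. */
    static final class Partial {
        final int i;
        final int j;
        final double value;

        Partial(int i, int j, double value) {
            this.i = i;
            this.j = j;
            this.value = value;
        }
    }

    /**
     * Job 1: its map phase keys column k of A and row k of B by k (no replication);
     * each reduce call, one per k and modelled by the outer loop, emits their outer product.
     */
    static List<Partial> job1(double[][] a, double[][] b) {
        int m = a.length;
        int s = b.length;
        int n = b[0].length;
        List<Partial> out = new ArrayList<>();
        for (int k = 0; k < s; k++) {
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < n; j++) {
                    out.add(new Partial(i, j, a[i][k] * b[k][j]));
                }
            }
        }
        return out;
    }

    /** Job 2: reads job 1's output and sums the partial products for each key (i, j). */
    static double[][] job2(List<Partial> partials, int m, int n) {
        double[][] c = new double[m][n];
        for (Partial p : partials) {
            c[p.i][p.j] += p.value;
        }
        return c;
    }

    /** Job 2 can only start after job 1 has produced all of its output. */
    static double[][] multiply(double[][] a, double[][] b) {
        return job2(job1(a, b), a.length, b[0].length);
    }
}
```

The explicit dependency in `multiply` is the serialization the text criticizes: no part of job 2 can run until `job1` returns.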
Summary of the invention
In view of the above deficiencies of the prior art, the present invention provides a Hadoop matrix processing method for a heterogeneous cluster that effectively improves the efficiency of Hadoop matrix multiplication.
An embodiment of the invention provides a Hadoop matrix processing method for a heterogeneous cluster, comprising the following steps:
building a physical cluster and setting up one Master node and multiple Slaver nodes;
configuring, on the Master node and each Slaver node, the programming environment based on the Java development environment, and writing in advance the CUDA versions of the Map and Reduce code for matrix multiplication;
reading the stored information about a first matrix A and a second matrix B into memory, and performing MapReduce matrix multiplication on the stored matrices A and B using the prewritten code;
writing the computation result directly into the distributed file system HDFS;
where A = (a_ij) is an m × s matrix and B = (b_ij) is an s × n matrix.
Preferably, the programming environment based on the Java development environment comprises the Java development kit JDK, Hadoop, the CUDA programming environment for NVIDIA GPUs, JCuda and Ganglia;
where JCuda provides an API through which Java accesses CUDA directly, and Ganglia monitors the CPU, memory, network and disk utilization of the cluster in real time.
Preferably, the first matrix A and the second matrix B are stored in a triple format in which each record contains the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix A and b_j is the j-th column of the second matrix B.
Preferably, the MapReduce matrix multiplication specifically comprises:
in the Map stage, emitting ((i, j), a_i^T · b_j) according to the prewritten code, where $c_{ij} = a_i^T \cdot b_j = \sum_{k=1}^{s} a_{ik} b_{kj}$; in the Reduce stage, taking the result of the Map stage directly.
Preferably, after the step of writing the computation result directly into HDFS, the method further comprises the step of:
building a Web server that displays the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
Preferably, if the number of Reduce tasks is zero, the intermediate output of the Map stage is written directly into HDFS.
Preferably, before the first matrix A and the second matrix B are stored in the triple format, they are preprocessed so that the required information from A and B is collected into triple records.
An embodiment of the invention also provides a Hadoop matrix processing system for a heterogeneous cluster, the system comprising:
an environment building unit, configured to build a physical cluster and set up one Master node and multiple Slaver nodes;
a configuration and code pre-writing unit, configured to configure the programming environment based on the Java development environment on the nodes, and to write in advance the CUDA versions of the Map and Reduce code for matrix multiplication;
a storage unit, configured to store the information of the matrices to be multiplied;
an execution unit, configured to read the matrix information stored in the storage unit and to perform MapReduce matrix multiplication on the stored matrices using the prewritten code;
an output unit, configured to write the computation result directly into the distributed file system HDFS.
Preferably, the system further comprises a performance monitoring and display unit, configured to display the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
Preferably, the two matrices on which MapReduce matrix multiplication is performed with the prewritten code are stored in the storage unit in the triple format, each record containing the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix and b_j is the j-th column of the second matrix.
In the above technical solution, the Master node and multiple Slaver nodes process the Hadoop matrix multiplication task in parallel, and the prewritten CUDA versions of the Map and Reduce code let the GPU accelerate the task. This improves the limited performance of Hadoop matrix multiplication from the algorithmic side, raises program performance at a deeper level, and effectively improves the efficiency of matrix multiplication.
Description of the drawings
Fig. 1 is a flow chart of the Hadoop matrix processing method for a heterogeneous cluster according to an embodiment of the invention.
Fig. 2 is a structural block diagram of the Hadoop matrix processing system for a heterogeneous cluster according to an embodiment of the invention.
Fig. 3 is an architecture diagram of the Hadoop matrix processing system for a heterogeneous cluster of the invention.
Embodiment
To make the technical problem solved, the technical solution and the beneficial effects of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
As shown in Fig. 1, an embodiment of the invention provides a Hadoop matrix multiplication algorithm for a heterogeneous cluster, comprising the following steps:
Step S100: build a physical cluster and set up one Master node and multiple Slaver nodes;
Step S200: configure the programming environment based on the Java development environment on the Master node and each Slaver node, and write in advance the CUDA versions of the Map and Reduce code for matrix multiplication;
Step S300: read the stored information about the first matrix A and the second matrix B into memory, and perform MapReduce matrix multiplication on A and B using the prewritten code;
Step S400: write the computation result directly into the distributed file system HDFS;
where A = (a_ij) is an m × s matrix and B = (b_ij) is an s × n matrix.
Preferably, in step S200, the Java development kit JDK, Hadoop, the CUDA programming environment for NVIDIA GPUs, JCuda and Ganglia are deployed, installed and configured on every Master and Slaver node. JCuda provides an API through which Java accesses CUDA directly, and Ganglia monitors the CPU, memory, network and disk utilization of the cluster in real time.
Hadoop is implemented in Java, whereas GPU code is written in CUDA (NVIDIA GPUs) or OpenCL (AMD GPUs). For Hadoop tasks to run seamlessly on the GPU, this language mismatch must be resolved. Hadoop provides two programming interfaces for other languages, Pipes and Streaming, and Java itself offers JNI. JCuda binds the CUDA runtime and driver APIs to Java, so that Java programs can use GPU resources.
Pipes wraps the external process and passes key-value pairs over a socket, which adds heavy network overhead and makes programs hard to debug. Streaming wraps the process and passes key-value pairs through standard I/O streams; data transfer again becomes the main performance bottleneck, although testing is simple. JNI programming is complex and hard to put into practice. Weighing programming difficulty, program performance and debugging difficulty, this scheme therefore adopts JCuda.
Because Hadoop's Map phase processes data line by line, storing a matrix as a two-dimensional table would require reading the whole matrix into memory and then extracting the needed column whenever a particular column is accessed, which would greatly degrade performance. Hadoop matrix computation therefore uses the triple storage format shown in Table 1.
Table 1: Triple storage format of a matrix
rowIndex colIndex value
… … …
i j a_ij
… … …
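Because each Table 1 line is self-contained, a Map task can turn any single line into one matrix element without seeing the rest of the file. A minimal sketch of reading such lines back into a dense matrix follows, assuming whitespace-separated fields and 1-based indices (the text specifies neither); `TripleParser` is a hypothetical name.

```java
import java.util.List;

/** Parses Table 1 style triples "rowIndex colIndex value" into a dense matrix (hypothetical helper). */
class TripleParser {

    /**
     * Each non-blank line holds one element; indices are assumed to be 1-based (i in [1, rows]).
     * Elements that never appear stay zero.
     */
    static double[][] parse(List<String> lines, int rows, int cols) {
        double[][] matrix = new double[rows][cols];
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] fields = trimmed.split("\\s+");
            int i = Integer.parseInt(fields[0]);
            int j = Integer.parseInt(fields[1]);
            matrix[i - 1][j - 1] = Double.parseDouble(fields[2]);
        }
        return matrix;
    }
}
```

Since every element is handled independently inside the loop, the same per-line logic could run in separate Map tasks over different splits of the file.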
MapReduce matrix multiplication:
When the inner-product method is used, the elements of the product matrix C are computed independently of one another, so a parallel granularity of m × n can be achieved. The MapReduce flow is as follows:
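Map:
For each element a_ij of matrix A, i.e. the triple (i, j, a_ij), emit ((i, k), a_ij) for every k ∈ [1, n];
For each element b_jk of matrix B, i.e. the triple (j, k, b_jk), emit ((i, k), b_jk) for every i ∈ [1, m].
Reduce:
For each key (i, k):
compute the value $c_{ik} = \sum_{j=1}^{s} a_{ij} b_{jk}$.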
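The inner-product flow can be simulated in memory to see where its intermediate volume comes from. This is a sketch, not the patent's code: it uses no Hadoop API, uses 0-based indices, `InnerProductSketch` and `Tagged` are hypothetical names, and it assumes each emitted value carries its source matrix and inner index j, which a reducer needs in order to pair a_ij with b_jk even though the pseudocode above leaves this implicit. The `emitted` counter records how many map outputs would be shuffled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory simulation of the single-job inner-product method (hypothetical names, no Hadoop API). */
class InnerProductSketch {

    /** A map output value tagged with its source matrix and the inner index j. */
    static final class Tagged {
        final char source; // 'A' or 'B'
        final int j;
        final double value;

        Tagged(char source, int j, double value) {
            this.source = source;
            this.j = j;
            this.value = value;
        }
    }

    /** Number of map output records emitted by the last call to multiply. */
    static long emitted;

    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        int s = b.length;
        int n = b[0].length;
        // Simulated shuffle: key (i, k) is encoded as i * n + k.
        Map<Long, List<Tagged>> shuffle = new HashMap<>();
        emitted = 0;

        // Map over A: a_ij is sent to every key (i, k), k in [0, n).
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < s; j++) {
                for (int k = 0; k < n; k++) {
                    emit(shuffle, (long) i * n + k, new Tagged('A', j, a[i][j]));
                }
            }
        }
        // Map over B: b_jk is sent to every key (i, k), i in [0, m).
        for (int j = 0; j < s; j++) {
            for (int k = 0; k < n; k++) {
                for (int i = 0; i < m; i++) {
                    emit(shuffle, (long) i * n + k, new Tagged('B', j, b[j][k]));
                }
            }
        }

        // Reduce: for each key (i, k), pair a_ij with b_jk by j and sum the products.
        double[][] c = new double[m][n];
        for (Map.Entry<Long, List<Tagged>> e : shuffle.entrySet()) {
            double[] rowA = new double[s];
            double[] colB = new double[s];
            for (Tagged t : e.getValue()) {
                if (t.source == 'A') {
                    rowA[t.j] = t.value;
                } else {
                    colB[t.j] = t.value;
                }
            }
            double sum = 0;
            for (int j = 0; j < s; j++) {
                sum += rowA[j] * colB[j];
            }
            long key = e.getKey();
            c[(int) (key / n)][(int) (key % n)] = sum;
        }
        return c;
    }

    private static void emit(Map<Long, List<Tagged>> shuffle, long key, Tagged value) {
        shuffle.computeIfAbsent(key, unused -> new ArrayList<>()).add(value);
        emitted++;
    }
}
```

For A of size 2 × 3 and B of size 3 × 1 the counter reaches 2 · 3 · 1 + 3 · 1 · 2 = 12 records, i.e. m × s × n + s × n × m.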
The flow shows that the intermediate output comprises m × s × n elements of A and s × n × m elements of B, because each element of A is emitted n times and each element of B m times. For square matrices (m = s = n) the intermediate output is therefore m times the size of the original matrices, which causes heavy network transfer overhead in the shuffle stage.
Therefore, as a preferred scheme, in step S300 the first matrix A and the second matrix B of the invention are stored in a triple format in which each record contains the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix A and b_j is the j-th column of the second matrix B.
Specifically, the first matrix A and the second matrix B are stored in the triple format shown in Table 2. Before they are stored in this format, A and B are preprocessed so that the required information from both matrices is collected into triple records.
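A minimal sketch of this preprocessing step, assuming dense in-memory inputs with 0-based indices; `Preprocessor` and `PretreatedRecord` are hypothetical names, and writing the records to HDFS is left out. It builds one record (i, j, a_i^T, b_j) for every output element.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds Table 2 style records (i, j, a_i^T, b_j) from dense A (m x s) and B (s x n). Hypothetical helper. */
class Preprocessor {

    /** One preprocessed record: output position (i, j), row i of A and column j of B. */
    static final class PretreatedRecord {
        final int i;
        final int j;
        final double[] rowA; // a_i^T
        final double[] colB; // b_j

        PretreatedRecord(int i, int j, double[] rowA, double[] colB) {
            this.i = i;
            this.j = j;
            this.rowA = rowA;
            this.colB = colB;
        }
    }

    static List<PretreatedRecord> pretreat(double[][] a, double[][] b) {
        int m = a.length;
        int s = b.length;
        int n = b[0].length;
        if (a[0].length != s) {
            throw new IllegalArgumentException("A must be m x s and B must be s x n");
        }
        // Transpose B once so that each column b_j is a contiguous array.
        double[][] cols = new double[n][s];
        for (int k = 0; k < s; k++) {
            for (int j = 0; j < n; j++) {
                cols[j][k] = b[k][j];
            }
        }
        // One record per output element c_ij. The arrays are shared in memory,
        // but a file on HDFS would store both vectors in every record.
        List<PretreatedRecord> out = new ArrayList<>(m * n);
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                out.add(new PretreatedRecord(i, j, a[i], cols[j]));
            }
        }
        return out;
    }
}
```

Each stored record carries 2s values, so the preprocessed input holds 2mns values in total; this is the large input volume that the text accepts in exchange for a small intermediate output.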
Table 2: Preprocessed data storage format
rowIndex colIndex value
… … …
i j a_i^T b_j
… … …
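Table 2 does not fix how the two vectors are laid out inside the value column. One possible text encoding, assumed here purely for illustration, is a tab-separated line with comma-separated vector components, which keeps every record on a single line so that Hadoop's line-oriented Map input can consume it; `Table2Format` is a hypothetical helper.

```java
/** Formats one Table 2 record as a single text line (assumed layout, hypothetical helper). */
class Table2Format {

    /** Produces "i<TAB>j<TAB>a_i1,...,a_is<TAB>b_1j,...,b_sj". */
    static String format(int i, int j, double[] rowA, double[] colB) {
        if (rowA.length != colB.length) {
            throw new IllegalArgumentException("a_i^T and b_j must both have length s");
        }
        return i + "\t" + j + "\t" + join(rowA) + "\t" + join(colB);
    }

    /** Joins vector components with commas, using Java's default double formatting. */
    private static String join(double[] v) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < v.length; k++) {
            if (k > 0) {
                sb.append(',');
            }
            sb.append(v[k]);
        }
        return sb.toString();
    }
}
```

For example, `format(0, 1, new double[]{1, 2}, new double[]{6, 8})` yields the line `0`, `1`, `1.0,2.0`, `6.0,8.0` separated by tabs.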
Further, in step S300, the MapReduce matrix multiplication specifically comprises:
in the Map stage, emitting ((i, j), a_i^T · b_j) according to the prewritten code, where $c_{ij} = a_i^T \cdot b_j = \sum_{k=1}^{s} a_{ik} b_{kj}$;
in the Reduce stage, taking the result of the Map stage directly, so the Reduce stage performs no operation.
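The Map-side computation reduces to one dot product per input line. The sketch assumes the line layout `i<TAB>j<TAB>a_i1,...,a_is<TAB>b_1j,...,b_sj` (an assumption; the text does not specify one) and runs on the CPU, whereas the patent hands this computation to the GPU through JCuda. In a real job this logic would sit inside a `Mapper.map` override; `DotProductMapper` is a hypothetical name.

```java
/** Map-side logic for one preprocessed line (assumed format, CPU only, no Hadoop API). */
class DotProductMapper {

    /**
     * Input:  "i<TAB>j<TAB>a_i1,...,a_is<TAB>b_1j,...,b_sj".
     * Output: "i<TAB>j<TAB>c_ij", i.e. key (i, j) and value a_i^T . b_j.
     */
    static String map(String line) {
        String[] fields = line.split("\t");
        String[] row = fields[2].split(",");
        String[] col = fields[3].split(",");
        if (row.length != col.length) {
            throw new IllegalArgumentException("a_i^T and b_j must both have length s");
        }
        double sum = 0;
        for (int k = 0; k < row.length; k++) {
            sum += Double.parseDouble(row[k]) * Double.parseDouble(col[k]);
        }
        return fields[0] + "\t" + fields[1] + "\t" + sum;
    }
}
```

Each call emits exactly one record, so the whole job produces m × n intermediate records regardless of s.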
The flow shows that although the preprocessed input is very large, this does not affect the program, while the intermediate output consists of only m × n matrix elements. The intermediate data is therefore only 1/(2s) of that of the inner-product scheme above (where s is the inner dimension shared by A and B), so both the disk I/O overhead of writing Map output and the network I/O overhead of the Reduce-side shuffle are markedly reduced.
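The 1/(2s) ratio follows directly from the intermediate record counts of the two schemes:

```latex
\frac{\text{intermediate records, preprocessed scheme}}{\text{intermediate records, inner-product method}}
  = \frac{mn}{mns + snm}
  = \frac{mn}{2mns}
  = \frac{1}{2s}
```

For example, with an inner dimension of s = 500 the preprocessed scheme produces 1/1000 as many intermediate records.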
Further, Hadoop optimizes MapReduce in that, if the number of Reduce tasks is zero, the intermediate output of the Map stage is written directly into HDFS, which improves performance further.
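In Hadoop's Java API a job becomes map-only when the driver calls `setNumReduceTasks(0)` on its `org.apache.hadoop.mapreduce.Job`; Hadoop then skips the shuffle and sort, and the output format writes map output straight to the output directory. The sketch below does not use Hadoop: it only simulates the two data paths in memory so that the difference in shuffled records is visible. `MapOnlyJobSketch` is a hypothetical name, and the `i<TAB>j<TAB>c_ij` line format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Simulates how the reduce-task count changes the data path (in memory, no Hadoop API). */
class MapOnlyJobSketch {

    /** Records that went through the simulated shuffle/sort in the last run. */
    static int shuffledRecords;

    /** Map output lines are assumed to be "i<TAB>j<TAB>c_ij"; the key is everything before the last tab. */
    static List<String> run(List<String> mapOutput, int numReduceTasks) {
        shuffledRecords = 0;
        if (numReduceTasks == 0) {
            // Map-only job: map output is written out as-is, with no shuffle or sort.
            return new ArrayList<>(mapOutput);
        }
        // Otherwise every record is grouped and sorted by key before reaching reduce.
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String rec : mapOutput) {
            String key = rec.substring(0, rec.lastIndexOf('\t'));
            groups.computeIfAbsent(key, unused -> new ArrayList<>()).add(rec);
            shuffledRecords++;
        }
        // Identity reduce, since in this scheme the Reduce stage performs no operation.
        List<String> out = new ArrayList<>();
        for (List<String> group : groups.values()) {
            out.addAll(group);
        }
        return out;
    }
}
```

Both paths yield the same records, but only the reduce path moves every record through the shuffle, which is the cost the map-only configuration avoids.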
More preferably, as shown in Fig. 2, after the computation result is written directly into HDFS in step S400, the method further comprises step S500: building a Web server that displays the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
Implementing matrix multiplication with the inner-product method produces a large volume of intermediate data. This scheme preprocesses the data at the source of the multiplication and thereby markedly reduces the task's intermediate data without reducing the degree of parallelism.
As shown in Fig. 3, an embodiment of the invention also provides a Hadoop matrix processing system for a heterogeneous cluster, comprising:
Environment building unit 001, configured to build a physical cluster and set up one Master node and multiple Slaver nodes.
Configuration and code pre-writing unit 002, configured to configure the programming environment based on the Java development environment on the nodes, and to write in advance the CUDA versions of the Map and Reduce code for matrix multiplication.
The Java development kit JDK, Hadoop, the CUDA programming environment for NVIDIA GPUs, JCuda and Ganglia are deployed, installed and configured on every Master and Slaver node. JCuda provides an API through which Java accesses CUDA directly, and Ganglia monitors the CPU, memory, network and disk utilization of the cluster in real time.
Hadoop is implemented in Java, whereas GPU code is written in CUDA (NVIDIA GPUs) or OpenCL (AMD GPUs). For Hadoop tasks to run seamlessly on the GPU, this language mismatch must be resolved. Hadoop provides two programming interfaces for other languages, Pipes and Streaming, and Java itself offers JNI. JCuda binds the CUDA runtime and driver APIs to Java, so that Java programs can use GPU resources.
Pipes wraps the external process and passes key-value pairs over a socket, which adds heavy network overhead and makes programs hard to debug. Streaming wraps the process and passes key-value pairs through standard I/O streams; data transfer again becomes the main performance bottleneck, although testing is simple. JNI programming is complex and hard to put into practice. Weighing programming difficulty, program performance and debugging difficulty, this scheme therefore adopts JCuda.
Storage unit 003, configured to store the information of the matrices to be multiplied;
Preferably, the first matrix A and the second matrix B of the invention are stored in a triple format in which each record contains the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix A and b_j is the j-th column of the second matrix B.
Execution unit 004, configured to read the matrix information stored in the storage unit and to perform MapReduce matrix multiplication on the stored matrices using the prewritten code;
Output unit 005, configured to write the computation result directly into the distributed file system HDFS.
Further, the system also comprises a performance monitoring and display unit 006, configured to display the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
The Hadoop matrix processing method and system for a heterogeneous cluster provided by the embodiments of the invention have the following advantages:
(1) The program adopts JCuda, which performs better than Pipes and Streaming, is simpler to program than JNI, and makes the program easy to debug and test.
(2) Matrix multiplication is optimized at the application level to address the performance bottlenecks of Hadoop's own framework, while the heavy computation is handed to the GPU, raising program performance at a deeper level.
(3) An acceleration scheme for Hadoop matrix multiplication is proposed from the perspective of system architecture.
The above are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A Hadoop matrix processing method for a heterogeneous cluster, characterized by comprising the following steps: building a physical cluster and setting up one Master node and multiple Slaver nodes;
configuring, on the Master node and each Slaver node, the programming environment based on the Java development environment, and writing in advance the CUDA versions of the Map and Reduce code for matrix multiplication;
reading the stored information about a first matrix A and a second matrix B into memory, and performing MapReduce matrix multiplication on the stored first matrix A and second matrix B using the prewritten code;
writing the computation result directly into the distributed file system HDFS;
where A = (a_ij) is an m × s matrix and B = (b_ij) is an s × n matrix.
2. The Hadoop matrix processing method for a heterogeneous cluster according to claim 1, characterized in that:
the programming environment based on the Java development environment comprises the Java development kit JDK, Hadoop, the CUDA programming environment for NVIDIA GPUs, JCuda and Ganglia;
where JCuda provides an API through which Java accesses CUDA directly, and Ganglia monitors the CPU, memory, network and disk utilization of the cluster in real time.
3. The Hadoop matrix processing method for a heterogeneous cluster according to claim 1, characterized in that:
the first matrix A and the second matrix B are stored in a triple format in which each record contains the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix A and b_j is the j-th column of the second matrix B.
4. The Hadoop matrix processing method for a heterogeneous cluster according to claim 3, characterized in that the MapReduce matrix multiplication specifically comprises:
in the Map stage, emitting ((i, j), a_i^T · b_j) according to the prewritten code, where $c_{ij} = a_i^T \cdot b_j = \sum_{k=1}^{s} a_{ik} b_{kj}$;
in the Reduce stage, taking the result of the Map stage directly.
5. The Hadoop matrix processing method for a heterogeneous cluster according to claim 1, characterized in that, after the step of writing the computation result directly into the distributed file system HDFS, the method further comprises the step of:
building a Web server that displays the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
6. The Hadoop matrix processing method for a heterogeneous cluster according to claim 4, characterized in that, if the number of Reduce tasks is zero, the intermediate output of the Map stage is written directly into the distributed file system HDFS.
7. The Hadoop matrix processing method for a heterogeneous cluster according to claim 3, characterized in that, before the first matrix A and the second matrix B are stored in the triple format, they are preprocessed so that the required information from A and B is collected into triple records.
8. A Hadoop matrix processing system for a heterogeneous cluster, characterized in that the system comprises:
an environment building unit, configured to build a physical cluster and set up one Master node and multiple Slaver nodes;
a configuration and code pre-writing unit, configured to configure the programming environment based on the Java development environment on the nodes, and to write in advance the CUDA versions of the Map and Reduce code for matrix multiplication;
a storage unit, configured to store the information of the matrices to be multiplied;
an execution unit, configured to read the matrix information stored in the storage unit and to perform MapReduce matrix multiplication on the stored matrices using the prewritten code;
an output unit, configured to write the computation result directly into the distributed file system HDFS.
9. The Hadoop matrix processing system for a heterogeneous cluster according to claim 8, characterized in that:
the system further comprises a performance monitoring and display unit, configured to display the program's acceleration on the physical cluster together with the cluster's hardware and software configuration.
10. The Hadoop matrix processing system for a heterogeneous cluster according to claim 8, characterized in that:
the two matrices on which MapReduce matrix multiplication is performed with the prewritten code are stored in the storage unit in the triple format, each record containing the fields i, j, a_i^T and b_j;
where a_i^T is the i-th row of the first matrix and b_j is the j-th column of the second matrix.
CN201511028067.7A 2015-12-31 2015-12-31 Hadoop matrix processing method and system of heterogeneous cluster Pending CN105653708A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511028067.7A CN105653708A (en) 2015-12-31 2015-12-31 Hadoop matrix processing method and system of heterogeneous cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511028067.7A CN105653708A (en) 2015-12-31 2015-12-31 Hadoop matrix processing method and system of heterogeneous cluster

Publications (1)

Publication Number Publication Date
CN105653708A true CN105653708A (en) 2016-06-08

Family

ID=56491106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511028067.7A Pending CN105653708A (en) 2015-12-31 2015-12-31 Hadoop matrix processing method and system of heterogeneous cluster

Country Status (1)

Country Link
CN (1) CN105653708A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246749A (en) * 2013-05-24 2013-08-14 北京立新盈企信息技术有限公司 (Beijing Lixin Yingqi Information Technology Co., Ltd.) Matrix data base system for distributed computing and query method thereof
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
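Hu Chengyu (胡成玉) et al.: "Research on a MapReduce-based distributed parallel algorithm for high-order matrix multiplication", Journal of Chinese Computer Systems (《小型微型计算机系统》) *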

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762915A (en) * 2018-04-19 2018-11-06 Shanghai Jiao Tong University A method of caching RDF data in GPU memories
CN108762915B (en) * 2018-04-19 2020-11-06 Shanghai Jiao Tong University Method for caching RDF data in GPU memory

Similar Documents

Publication Publication Date Title
Athlur et al. Varuna: scalable, low-cost training of massive deep learning models
Gunarathne et al. Scalable parallel computing on clouds using Twister4Azure iterative MapReduce
Li et al. MapReduce parallel programming model: a state-of-the-art survey
Gu et al. SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters
Chen et al. Flinkcl: An opencl-based in-memory computing architecture on heterogeneous cpu-gpu clusters for big data
CN105117286B (en) The dispatching method of task and streamlined perform method in MapReduce
US10846284B1 (en) View-based data mart management system
Puri et al. MapReduce algorithms for GIS polygonal overlay processing
CN102137125A (en) Method for processing cross task data in distributive network system
Song et al. Modulo based data placement algorithm for energy consumption optimization of MapReduce system
Miller et al. Open source big data analytics frameworks written in scala
Zhu et al. WolfGraph: The edge-centric graph processing on GPU
Mencagli et al. Harnessing sliding-window execution semantics for parallel stream processing
Cecilia et al. Enhancing GPU parallelism in nature-inspired algorithms
Al Farhan et al. Unstructured computational aerodynamics on many integrated core architecture
Brighen et al. Listing all maximal cliques in large graphs on vertex-centric model
Kang et al. An experimental analysis of limitations of MapReduce for iterative algorithms on Spark
Acevedo et al. A Critical Path File Location (CPFL) algorithm for data-aware multiworkflow scheduling on HPC clusters
Cunningham et al. Causal set generator and action computer
CN105653708A (en) Hadoop matrix processing method and system of heterogeneous cluster
Geng et al. The importance of efficient fine-grain synchronization for many-core systems
Zhang et al. Lightweight distributed execution engine for large-scale spatial join query processing
Piñeiro et al. A unified framework to improve the interoperability between HPC and Big Data languages and programming models
Huang et al. Survey of external memory large-scale graph processing on a multi-core system
Ruan et al. Hymr: a hybrid mapreduce workflow system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160608
