CN113326125A - Large-scale distributed graph calculation end-to-end acceleration method and device - Google Patents

Large-scale distributed graph calculation end-to-end acceleration method and device

Info

Publication number
CN113326125A
CN113326125A
Authority
CN
China
Prior art keywords
algorithm
graph
load balancing
task
radix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110552903.0A
Other languages
Chinese (zh)
Other versions
CN113326125B (en)
Inventor
李丹
刘天峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110552903.0A priority Critical patent/CN113326125B/en
Publication of CN113326125A publication Critical patent/CN113326125A/en
Application granted granted Critical
Publication of CN113326125B publication Critical patent/CN113326125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a large-scale distributed graph computation end-to-end acceleration method and device. The method comprises: performing task division on distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task; selecting the corresponding information-flow mode to complete the model selection task; dividing vertices into different graph partitions according to an end-to-end partitioning index, and then allocating the vertices through a streaming chunk partitioning algorithm with an optimal threshold; and extending a load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort. This acceleration scheme, which takes end-to-end time as the optimization target, can greatly accelerate end-to-end graph processing performance.

Description

Large-scale distributed graph calculation end-to-end acceleration method and device
Technical Field
The invention relates to the technical field of distributed computing, and in particular to a method and device for end-to-end acceleration of large-scale distributed graph computation.
Background
In the big data era, applications such as social networks, the Internet of Things and e-commerce generate enormous amounts of data. Such data is generally organized in a graph format, grows continuously, and has already reached the TB scale. To process such large-scale graph data efficiently, many distributed graph computing systems have been proposed.
The processing flow of a distributed graph computing system generally includes two phases. The first is the preprocessing phase: naturally occurring graphs are large and irregular, and must be preprocessed before a particular graph algorithm can be executed. In the preprocessing phase, the format of the input graph is converted and the graph is partitioned across different machines. The second is the algorithm execution phase: a specific graph algorithm is executed on the preprocessed graph. Most graph computing systems optimize only the efficiency of the algorithm execution phase and pay no attention to preprocessing performance, resulting in very long end-to-end processing times.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide an end-to-end acceleration method for large-scale distributed graph computation, an acceleration scheme that takes end-to-end time as the optimization target and can greatly accelerate end-to-end graph processing performance.
Another objective of the present invention is to provide an end-to-end acceleration apparatus for large-scale distributed graph computation.
In order to achieve the above object, an embodiment of an aspect of the present invention provides an end-to-end acceleration method for large-scale distributed graph computation, including:
performing task division on the distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task;
selecting the corresponding information-flow mode to complete the model selection task;
dividing vertices into different graph partitions according to an end-to-end partitioning index, and then allocating the vertices through a streaming chunk partitioning algorithm with an optimal threshold; and
extending the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort.
According to the large-scale distributed graph computation end-to-end acceleration method of the embodiment of the present invention, end-to-end time is taken as the optimization target: a static mode selection is proposed based on theoretical analysis to reduce the data preprocessing workload; a more balanced end-to-end partitioning index and a streaming chunk partitioning algorithm are provided; and a faster and more efficient distributed sorting algorithm is provided to accelerate the sorting process.
In addition, the large-scale distributed graph computation end-to-end acceleration method according to the above embodiment of the present invention may further have the following additional technical features:
further, the information flow mode comprises a push mode and a pull mode, wherein the push mode is that each vertex pushes the updated information to a target vertex through an outgoing edge; the pull mode is that each vertex pulls the updated information from the source vertex to itself through an incoming edge.
Further, the end-to-end partitioning index is:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
Further, the optimal threshold of the algorithm is found by binary search.
Further, extending the load-balanced radix sorting algorithm to obtain the NUMA-aware load-balanced radix sorting algorithm comprises:
using shared-memory communication, and performing memory allocation in the specific NUMA memory node of each thread.
In order to achieve the above object, another embodiment of the present invention provides an end-to-end acceleration apparatus for large-scale distributed graph computation, including:
a partitioning module for performing task division on the distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task;
a selection module for selecting the corresponding information-flow mode to complete the model selection task;
an allocation module for dividing vertices into different graph partitions according to the end-to-end partitioning index and allocating the vertices through the streaming chunk partitioning algorithm with an optimal threshold; and
a construction module for extending the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort.
The large-scale distributed graph computation end-to-end acceleration apparatus takes end-to-end time as the optimization target: a static mode selection is proposed based on theoretical analysis to reduce the data preprocessing workload; a more balanced end-to-end partitioning index and a streaming chunk partitioning algorithm are provided; and a faster and more efficient distributed sorting algorithm is provided to accelerate the sorting process.
In addition, the large-scale distributed graph computation end-to-end acceleration apparatus according to the above embodiment of the present invention may further have the following additional technical features:
further, the information flow mode comprises a push mode and a pull mode, wherein the push mode is that each vertex pushes the updated information to a target vertex through an outgoing edge; the pull mode is that each vertex pulls the updated information from the source vertex to itself through an incoming edge.
Further, the end-to-end partitioning index is:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
Further, the optimal threshold of the algorithm is found by binary search.
Further, extending the load-balanced radix sorting algorithm to obtain the NUMA-aware load-balanced radix sorting algorithm comprises:
using shared-memory communication, and performing memory allocation in the specific NUMA memory node of each thread.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a large-scale distributed graph computation end-to-end acceleration method according to one embodiment of the invention;
FIG. 2 is a schematic diagram of the optimal-threshold streaming chunk partitioning algorithm according to one embodiment of the invention;
FIG. 3 is a schematic diagram of the NUMA-aware load-balanced radix sorting algorithm according to one embodiment of the invention;
FIG. 4 is a schematic diagram of the distributed NUMA-aware load-balanced radix sorting algorithm according to one embodiment of the invention;
FIG. 5 is a block diagram of a large-scale distributed graph computation end-to-end acceleration apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting the present invention.
The following describes a method and an apparatus for end-to-end acceleration of large-scale distributed graph computation according to an embodiment of the present invention with reference to the accompanying drawings.
First, the large-scale distributed graph computation end-to-end acceleration method proposed according to an embodiment of the present invention is described with reference to the accompanying drawings.
FIG. 1 is a flow diagram of a large-scale distributed graph computation end-to-end acceleration method according to one embodiment of the invention.
As shown in fig. 1, the large-scale distributed graph computation end-to-end acceleration method includes the following steps:
and step S1, performing task division on the distributed graph calculation to obtain a model selection task, a vertex distribution task and an adjacency linked list construction task.
Distributed graph computation typically involves two phases of preprocessing and algorithm execution, and the lack of attention to the preprocessing phase in existing schemes results in very long end-to-end processing times. The pre-processing stage can be divided into three tasks: the method comprises a model selection task, a vertex distribution task and an adjacency linked list construction task.
Step S2: selecting the corresponding information-flow mode to complete the model selection task.
In the model selection task, the characteristics of the algorithms and of the information-flow modes are fully considered, and a static mode selection is proposed to reduce the preprocessing workload.
The first task of preprocessing is to select the information-flow mode used by the algorithm. There are two information-flow modes: one is the push mode, in which each vertex pushes its updated information to target vertices through its outgoing edges; the other is the pull mode, in which each vertex pulls updated information from source vertices to itself through its incoming edges. Existing graph algorithms can be divided into two types: always-active-style graph algorithms and traversal-style graph algorithms. It can be proved that for always-active-style algorithms the time of the push mode is strictly less than that of the pull mode, and that for traversal-style algorithms the time of the pull mode is strictly less than that of the push mode. Based on this result, a fixed mode can be chosen statically for each specific algorithm.
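A minimal Python sketch of the two modes on a toy graph, assuming a simple summing update rule; the helper names push_step and pull_step are illustrative, not the patent's implementation. Both modes compute the same updates, but push writes along outgoing edges while pull reads along incoming edges:

def push_step(out_edges, values):
    # Push mode: every vertex pushes its value to its target vertices
    # along its outgoing edges (writes are scattered over neighbours).
    updates = [0.0] * len(values)
    for u, targets in enumerate(out_edges):
        for v in targets:            # edge u -> v
            updates[v] += values[u]  # u pushes its value to target v
    return updates

def pull_step(in_edges, values):
    # Pull mode: every vertex pulls the values of its source vertices
    # along its incoming edges (reads are scattered, writes stay local).
    updates = [0.0] * len(values)
    for v, sources in enumerate(in_edges):
        for u in sources:            # edge u -> v
            updates[v] += values[u]  # v pulls the value of source u
    return updates

# Toy graph with edges 0->1, 0->2, 1->2.
out_edges = [[1, 2], [2], []]
in_edges = [[], [0], [0, 1]]
values = [1.0, 2.0, 3.0]
assert push_step(out_edges, values) == pull_step(in_edges, values)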
Step S3: dividing vertices into different graph partitions according to the end-to-end partitioning index, and then allocating the vertices through the streaming chunk partitioning algorithm with an optimal threshold.
In the vertex allocation task, a more representative partitioning index is proposed according to the characteristics of the end-to-end task; a streaming chunk partitioning algorithm with a theoretical guarantee is then provided to make the end-to-end task partitioning more balanced.
The second task is to divide the vertices onto different graph partitions so that the workload on each partition is as equal as possible. First, a more balanced partitioning formula is proposed:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
This formula takes into account the communication and computation loads of both the preprocessing phase and the algorithm execution phase. A streaming chunk partitioning algorithm with an optimal threshold is then proposed. Chunk partitioning (chunking) is the partitioning algorithm with the lowest known preprocessing cost, but existing chunk partitioning algorithms suffer from load imbalance. As shown in FIG. 2, the optimal partitioning strategy is found by searching for the optimal threshold. It can be shown that the function underlying this search is non-decreasing and that the optimal threshold is exactly the point where the function value changes; based on these two properties, the optimal threshold can be found efficiently with a binary search.
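A minimal Python sketch of such a threshold search, under the assumption that the streaming chunk partitioning greedily opens a new chunk whenever the running cost would exceed the threshold, with per-vertex costs taken from the end-to-end partitioning index above; the parameter values (eta, theta) and helper names are illustrative, not the patent's:

def vertex_cost(deg, K, eta, theta):
    # Per-vertex contribution to (1+eta+theta(K-1))*E(P_i) + eta(K-1)*V(P_i).
    return (1 + eta + theta * (K - 1)) * deg + eta * (K - 1)

def chunks_needed(costs, threshold):
    # One streaming pass: cut a new chunk when the next vertex would
    # overflow the threshold. The result is non-increasing in the
    # threshold, which makes the feasibility test below monotone.
    chunks, acc = 1, 0.0
    for c in costs:
        if acc + c > threshold:
            chunks, acc = chunks + 1, 0.0
        acc += c
    return chunks

def optimal_threshold(costs, K, iters=50):
    # Binary search over the threshold: feasibility (<= K chunks) flips
    # from False to True exactly once, at the optimal threshold.
    lo, hi = max(costs), sum(costs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chunks_needed(costs, mid) <= K:
            hi = mid  # feasible: the flip point is at or below mid
        else:
            lo = mid
    return hi

degrees = [5, 1, 8, 2, 2, 7, 3, 1]  # out-degrees in streaming (vertex) order
costs = [vertex_cost(d, K=3, eta=0.5, theta=0.1) for d in degrees]
print(optimal_threshold(costs, K=3))  # max per-chunk cost of the best 3-way split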
Step S4: extending the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on it.
In the adjacency list construction task, a load-balanced radix sorting algorithm is used. By exploiting the characteristics of graph computation, the algorithm is extended to the distributed scenario and the overhead of distributed communication is greatly reduced.
The third task is to convert the edge array of the underlying graph data format into adjacency lists using a distributed sorting algorithm; this is the most time-consuming task of the preprocessing phase. For intra-machine sorting, the existing load-balanced radix sorting algorithm is extended into a NUMA-aware load-balanced radix sorting algorithm, as shown in FIG. 3, making it better suited to the NUMA architecture of current servers. The extension covers two aspects: the first is to use shared-memory communication, and the second is to perform memory allocation in the specific NUMA memory node of each thread. For inter-machine sorting, the NUMA-aware load-balanced radix sorting algorithm is then extended to the distributed scenario, as shown in FIG. 4. A characteristic of graph computation, namely that part of the sorting result is known before sorting, is fully exploited: the data can be partially sorted before transmission to reduce the amount of data transmitted.
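A minimal single-machine Python sketch of the counting-and-scatter pass at the core of building adjacency lists by radix sorting edges on their source vertex; the NUMA-aware multi-threaded and distributed aspects are only indicated in comments, and the function name and CSR-style output are illustrative assumptions, not the patent's code:

def edges_to_adjacency(num_vertices, edges):
    # Convert an edge array [(src, dst), ...] into CSR-style adjacency
    # lists with one histogram pass and one scatter pass, i.e. a single
    # radix-sort digit pass keyed on the source vertex.
    counts = [0] * num_vertices
    for src, _ in edges:  # pass 1: out-degree histogram (per-thread
        counts[src] += 1  # histograms in the NUMA-aware parallel variant)
    # Exclusive prefix sum gives each vertex its write offset; in the
    # load-balanced variant it also assigns threads equal shares of writes.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    neighbours = [0] * len(edges)
    cursor = list(offsets[:num_vertices])
    for src, dst in edges:  # pass 2: stable scatter into final position
        neighbours[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbours

# Edges 2->0, 0->1, 1->2, 0->2 on a 3-vertex graph.
offsets, neighbours = edges_to_adjacency(3, [(2, 0), (0, 1), (1, 2), (0, 2)])
print(offsets)     # [0, 2, 3, 4]
print(neighbours)  # [1, 2, 2, 0]: vertex 0 -> [1, 2], 1 -> [2], 2 -> [0]

In the distributed variant described above, each machine would additionally bucket its edges by target partition before transmission, so that every machine receives pre-sorted runs and only a merge remains.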
According to the large-scale distributed graph computation end-to-end acceleration method provided by the embodiment of the invention, end-to-end time is taken as the optimization target: a static mode selection is proposed based on theoretical analysis to reduce the data preprocessing workload; a more balanced end-to-end partitioning index and a streaming chunk partitioning algorithm are provided; and a faster and more efficient distributed sorting algorithm is provided to accelerate the sorting process.
Next, the large-scale distributed graph computation end-to-end acceleration apparatus proposed according to an embodiment of the present invention is described with reference to the drawings.
FIG. 5 is a block diagram of a large-scale distributed graph computation end-to-end acceleration apparatus according to an embodiment of the present invention.
As shown in FIG. 5, the large-scale distributed graph computation end-to-end acceleration apparatus includes: a partitioning module 501, a selection module 502, an allocation module 503 and a construction module 504.
The partitioning module 501 is configured to perform task division on the distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task.
The selection module 502 is configured to select the corresponding information-flow mode to complete the model selection task.
The allocation module 503 is configured to divide vertices into different graph partitions according to the end-to-end partitioning index, and to allocate the vertices through the streaming chunk partitioning algorithm with an optimal threshold.
The construction module 504 is configured to extend the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and to convert the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort.
Further, the information-flow modes comprise a push mode and a pull mode, wherein in the push mode each vertex pushes its updated information to target vertices through its outgoing edges, and in the pull mode each vertex pulls updated information from source vertices to itself through its incoming edges.
Further, the end-to-end partitioning index is:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
Further, the optimal threshold of the algorithm is found by binary search.
Further, extending the load-balanced radix sorting algorithm to obtain the NUMA-aware load-balanced radix sorting algorithm comprises:
using shared-memory communication, and performing memory allocation in the specific NUMA memory node of each thread.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the large-scale distributed graph computation end-to-end acceleration apparatus provided by the embodiment of the invention, end-to-end time is taken as the optimization target: a static mode selection is proposed based on theoretical analysis to reduce the data preprocessing workload; a more balanced end-to-end partitioning index and a streaming chunk partitioning algorithm are provided; and a faster and more efficient distributed sorting algorithm is provided to accelerate the sorting process.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An end-to-end acceleration method for large-scale distributed graph computation, characterized by comprising the following steps:
performing task division on the distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task;
selecting the corresponding information-flow mode to complete the model selection task;
dividing vertices into different graph partitions according to an end-to-end partitioning index, and then allocating the vertices through a streaming chunk partitioning algorithm with an optimal threshold; and
extending the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort.
2. The method of claim 1, wherein the information-flow modes comprise a push mode and a pull mode, wherein in the push mode each vertex pushes its updated information to target vertices through its outgoing edges, and in the pull mode each vertex pulls updated information from source vertices to itself through its incoming edges.
3. The method of claim 1, wherein the end-to-end partitioning index is:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
4. The method of claim 1, wherein the optimal threshold of the algorithm is found by binary search.
5. The method of claim 1, wherein extending the load-balanced radix sorting algorithm to obtain the NUMA-aware load-balanced radix sorting algorithm comprises:
using shared-memory communication, and performing memory allocation in the specific NUMA memory node of each thread.
6. An end-to-end acceleration apparatus for large-scale distributed graph computation, characterized by comprising:
a partitioning module for performing task division on the distributed graph computation to obtain a model selection task, a vertex allocation task and an adjacency list construction task;
a selection module for selecting the corresponding information-flow mode to complete the model selection task;
an allocation module for dividing vertices into different graph partitions according to the end-to-end partitioning index and allocating the vertices through the streaming chunk partitioning algorithm with an optimal threshold; and
a construction module for extending the load-balanced radix sorting algorithm to obtain a NUMA-aware load-balanced radix sorting algorithm, and converting the edge array of the underlying graph data format into adjacency lists with a distributed sorting algorithm built on the NUMA-aware load-balanced radix sort.
7. The apparatus of claim 6, wherein the information-flow modes comprise a push mode and a pull mode, wherein in the push mode each vertex pushes its updated information to target vertices through its outgoing edges, and in the pull mode each vertex pulls updated information from source vertices to itself through its incoming edges.
8. The apparatus of claim 6, wherein the end-to-end partitioning index is:
(1 + η + θ(K-1)) * E(P_i) + η(K-1) * V(P_i)
where η is a tunable parameter balancing the weight of preprocessing against algorithm execution, θ is the communication ratio in the distributed sorting algorithm, K is the number of partitions the whole graph is divided into, E(P_i) is the number of edges of all vertices in partition P_i, and V(P_i) is the number of vertices in partition P_i.
9. The apparatus of claim 6, wherein the optimal threshold of the algorithm is found by binary search.
10. The apparatus of claim 6, wherein extending the load-balanced radix sorting algorithm to obtain the NUMA-aware load-balanced radix sorting algorithm comprises:
using shared-memory communication, and performing memory allocation in the specific NUMA memory node of each thread.
CN202110552903.0A 2021-05-20 2021-05-20 Large-scale distributed graph calculation end-to-end acceleration method and device Active CN113326125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110552903.0A CN113326125B (en) 2021-05-20 2021-05-20 Large-scale distributed graph calculation end-to-end acceleration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110552903.0A CN113326125B (en) 2021-05-20 2021-05-20 Large-scale distributed graph calculation end-to-end acceleration method and device

Publications (2)

Publication Number Publication Date
CN113326125A 2021-08-31
CN113326125B CN113326125B (en) 2023-03-24

Family

ID=77416134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110552903.0A Active CN113326125B (en) 2021-05-20 2021-05-20 Large-scale distributed graph calculation end-to-end acceleration method and device

Country Status (1)

Country Link
CN (1) CN113326125B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190278760A1 (en) * 2008-11-14 2019-09-12 Georgetown University Process and Framework For Facilitating Information Sharing Using a Distributed Hypergraph
US20130097321A1 (en) * 2011-10-17 2013-04-18 Yahoo! Inc. Method and system for work load balancing
CN104954823A (en) * 2014-03-31 2015-09-30 华为技术有限公司 Image calculation pretreatment device, method thereof and system thereof
CN104780213A (en) * 2015-04-17 2015-07-15 华中科技大学 Load dynamic optimization method for principal and subordinate distributed graph manipulation system
CN105787020A (en) * 2016-02-24 2016-07-20 鄞州浙江清华长三角研究院创新中心 Graph data partitioning method and device
CN109919826A (en) * 2019-02-02 2019-06-21 西安邮电大学 A kind of diagram data compression method and figure computation accelerator for figure computation accelerator
CN110245135A (en) * 2019-05-05 2019-09-17 华中科技大学 A kind of extensive streaming diagram data update method based on NUMA architecture
US20210081347A1 (en) * 2019-09-17 2021-03-18 Huazhong University Of Science And Technology Graph processing optimization method based on multi-fpga accelerator interconnection
CN111209106A (en) * 2019-12-25 2020-05-29 北京航空航天大学杭州创新研究院 Streaming graph partitioning method and system based on cache mechanism
CN111581443A (en) * 2020-04-16 2020-08-25 南方科技大学 Distributed graph calculation method, terminal, system and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yin Xiaobo et al., "A relaxed and optimized balanced streaming graph partitioning algorithm", Computer Science (《计算机科学》) *
Wang Tongtong et al., "Survey of distributed graph processing systems", Journal of Software (《软件学报》) *
Luo Dongmei, "Balanced graph partitioning algorithm for distributed graph computing", Information & Computer (Theory Edition) (《信息与电脑(理论版)》) *

Also Published As

Publication number Publication date
CN113326125B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN108566659B (en) 5G network slice online mapping method based on reliability
Nabi et al. Resource assignment in vehicular clouds
CN106250233B (en) MapReduce performance optimization system and optimization method
Schlag et al. Scalable edge partitioning
CN112148492A (en) Service deployment and resource allocation method considering multi-user mobility
CN114418127A (en) Machine learning calculation optimization method and platform
Xu et al. Computational experience with a software framework for parallel integer programming
Koh et al. MapReduce skyline query processing with partitioning and distributed dominance tests
CN111538867A (en) Method and system for dividing bounded incremental graph
Badri et al. A sample average approximation-based parallel algorithm for application placement in edge computing systems
CN113326125B (en) Large-scale distributed graph calculation end-to-end acceleration method and device
WO2015055502A2 (en) Method of partitioning storage in a distributed data storage system and corresponding device
Choo et al. Reliable vehicle selection algorithm with dynamic mobility of vehicle in vehicular cloud system
CN116303763A (en) Distributed graph database incremental graph partitioning method and system based on vertex degree
Abdolazimi et al. Connected components of big graphs in fixed mapreduce rounds
Guinand et al. Sensitivity analysis of tree scheduling on two machines with communication delays
Herrera et al. Dynamic and hierarchical load-balancing techniques applied to parallel branch-and-bound methods
Menouer et al. Towards a parallel constraint solver for cloud computing environments
CN113157431A (en) Computing task copy distribution method for edge network application environment
CN110188925A (en) A kind of time domain continuous type space crowdsourcing method for allocating tasks
CN111737531B (en) Application-driven graph division adjusting method and system
Karanik et al. Edge Service Allocation Based on Clustering Techniques
CN117349031B (en) Distributed super computing resource scheduling analysis method, system, terminal and medium
Shahin Using heavy clique base coarsening to enhance virtual network embedding
Cavallo et al. Fragmenting Big Data to boost the performance of MapReduce in geographical computing contexts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant