WO2021208174A1 - Distributed-type graph computation method, terminal, system, and storage medium - Google Patents

Distributed-type graph computation method, terminal, system, and storage medium

Info

Publication number
WO2021208174A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
data
distributed
preprocessing
algorithm
Prior art date
Application number
PCT/CN2020/090238
Other languages
French (fr)
Chinese (zh)
Inventor
华井雅俊
泽奥多洛保罗斯乔治斯
Original Assignee
南方科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南方科技大学
Publication of WO2021208174A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present application relate to, but are not limited to, the field of graph computing technology, and in particular, to a distributed graph computing method, terminal, system, and storage medium.
  • Graph is an abstract data structure used to represent the association relationship between objects. It is described by using vertices (Vertex) and edges (Edge). The vertices represent objects, and the edges represent the relationships between objects. Based on this, data that can be abstracted into graphs is graph data.
  • Graph computing is a process in which graphs are used as data models to express and solve problems.
  • distributed computing is used to analyze large-scale graph data.
  • the large-scale graph is divided into several subgraphs, and multiple slave nodes are used for calculation, which can effectively utilize multiple computing resources.
  • high-quality partitioning methods consume a lot of time during the calculation, which leads to higher energy consumption in the partitioning phase.
  • high-speed generation of partitions will result in low-quality partitions, causing serious performance losses.
  • the embodiment of the application provides a distributed graph calculation method, terminal, system and storage medium, which uses graph data preprocessing. When large-scale graph data analysis is performed, the graph data is transmitted only once, so the graph data can be partitioned with high quality and efficiency, increasing the speed of distributed graph calculation and reducing energy consumption.
  • an embodiment of the present application provides a distributed graph calculation method, including:
  • according to the graph division algorithm, the distributed architecture, and the first intermediate preprocessing graph, the first division graph is obtained;
  • first distributed graph analysis data is obtained.
  • the distributed graph calculation method further includes:
  • a second division graph is obtained according to the graph division algorithm, the distributed architecture, and the first intermediate preprocessing graph;
  • the distributed graph calculation method further includes:
  • if the first graph data is not the same as the second graph data, difference data between the second graph data and the first graph data is obtained;
  • a third distributed graph analysis data is obtained.
  • the difference map preprocessing algorithm includes:
  • a second intermediate preprocessed graph is obtained according to the difference data and the incremental graph preprocessing algorithm.
  • the incremental graph preprocessing algorithm further includes:
  • an incremental graph edge sort is obtained.
  • the incremental graph edge sorting algorithm is applied to the main computing node, and the incremental graph edge sorting algorithm includes:
  • the difference map preprocessing algorithm further includes:
  • a second intermediate preprocessed graph is obtained according to the first graph data, the second graph data after removing the decrement data, and the graph preprocessing algorithm.
  • the distributed graph preprocessing algorithm further includes preprocessing graph edge sorting
  • the preprocessing graph edge sorting includes:
  • the first intermediate preprocessed graph is obtained.
  • the obtaining the first intermediate preprocessing graph according to the edge data and the vertex data includes:
  • the priority queue is obtained
  • the first intermediate preprocessing diagram includes:
  • the starting vertex ID of the edge and the ending vertex ID of the edge are stored in binary format.
  • the graph division algorithm includes:
  • node configuration information includes one or more of the number of nodes, node specifications, and node performance
  • an embodiment of the present application provides a terminal, including: a first memory, a first processor, and a computer program stored on the first memory and executable on the first processor, wherein the first processor, when executing the program, implements:
  • an embodiment of the present application provides a distributed graph computing system, including a first distributed computing device and a second distributed computing device;
  • the first distributed computing device includes: a second memory, a second processor, and a first computer program that is stored on the second memory and can run on the second processor; the second processor, when executing the first computer program, implements the distributed graph calculation method described in the first aspect
  • the second distributed computing device includes: a third memory, a third processor, and a second computer program that is stored on the third memory and can run on the third processor; the third processor, when executing the second computer program, implements the distributed graph calculation method described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to:
  • the embodiment of the application converts the original graph data into computer-readable intermediate graph data based on graph data preprocessing, a graph division algorithm, and an incremental graph edge sorting algorithm. This enables the subsequent graph division to be carried out quickly and also provides high-quality graph partitions, which greatly reduce communication overhead and speed up the calculation and analysis of distributed graphs.
  • FIG. 1 is a schematic flowchart of a distributed graph calculation method provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a distributed graph calculation method provided by another embodiment of the application.
  • FIG. 3 is a schematic flowchart of a distributed graph computing method provided by another embodiment of this application.
  • FIG. 4 is a schematic structural diagram of a graph partition algorithm provided by an embodiment of this application.
  • FIG. 5 is a schematic structural diagram of an incremental graph edge sorting algorithm provided by an embodiment of the application.
  • In known distributed graph computing technology, the graph is a basic and ubiquitous abstract concept, which is widely used in modeling various problems in the real world.
  • the vertices in the graph represent users, and the edges represent friendship relationships between users; in e-commerce services, the vertices represent users and products, and the edges represent purchase history.
  • graph data has been growing naturally.
  • one of the world's largest online social network services already contains about one trillion friendships.
  • the embodiments of the present application provide a distributed graph computing method, terminal, system, and storage medium, which can convert original graph data into computer-readable intermediate graph data. This enables the subsequent graph division to be carried out quickly while also providing high-quality graph partitions; the generated high-quality partitions greatly reduce the communication overhead and accelerate the calculation and analysis of the distributed graph.
  • the terminal may be a mobile terminal device or a non-mobile terminal device.
  • Mobile terminal devices can be mobile phones, tablet computers, laptops, palmtop computers, vehicle-mounted terminal devices, wearable devices, ultra-mobile personal computers, netbooks, digital cameras, video cameras or personal digital assistants, etc.
  • non-mobile terminal devices can be personal computers, workstations, servers, televisions, teller machines, self-service machines, surveillance cameras or box cameras, etc.
  • An embodiment of the present application discloses a distributed graph calculation method.
  • Fig. 1 is a flowchart of a distributed graph calculation method.
  • the calculation method shown in Fig. 1 at least includes the following steps:
  • Step S100: obtain the first graph data
  • Step S101: apply the graph preprocessing algorithm
  • Step S102: obtain the first intermediate preprocessing graph
  • Step S103: obtain the distributed architecture information
  • Step S104: apply the graph partition algorithm
  • Step S105: obtain the first division graph
  • Step S106: perform distributed graph analysis.
  • the graph preprocessing algorithm is used to form the first intermediate preprocessing graph.
  • the graph preprocessing algorithm uses graph edge sorting and converts the first intermediate preprocessing graph into a computer-readable binary graph format.
  • the graph partition algorithm uses the first intermediate preprocessing graph directly, which lets it skip the redundant data scrambling of the graph elements, and combines it with the distributed architecture information to generate the first division graph. Performing distributed graph analysis according to the first division graph yields high-quality and high-efficiency distributed graph calculations.
  • the first intermediate preprocessing graph is in a computer-readable binary graph format.
  • Each box unit represents a 32-bit or 64-bit integer. Every two boxes store the starting vertex ID and ending vertex ID of the edge.
  • the graph data can be read from the machine without any communication overhead.
  • the graph preprocessing algorithm first converts the first graph data into the first intermediate preprocessing graph.
  • the first intermediate preprocessing graph is in a computer-readable binary graph format and is expressed as an edge sequence.
  • the expression of the conversion algorithm is:
  • $E_{\phi}$ is the edge sequence $E$ sorted by the sorting function $\phi : E \to \mathbb{N}$.
  • the expression of the graph preprocessing algorithm is:
  • V(E) is a set of vertices of edge E.
  • Fig. 2 is another flowchart of the distributed graph calculation method.
  • the calculation method shown in Fig. 2 includes at least the following steps:
  • Step S200: obtain the second graph data
  • Step S201: compare the second graph data with the first graph data
  • Step S202: the second graph data is the same as the first graph data
  • Step S203: obtain the distributed architecture information
  • Step S204: obtain the first intermediate preprocessing graph
  • Step S205: apply the graph partition algorithm
  • Step S206: obtain the second division graph
  • Step S207: perform distributed graph analysis.
  • the first intermediate preprocessing graph is used for analysis.
  • the first intermediate preprocessing graph may be in a computer-readable binary graph format.
  • the graph partition algorithm uses the first intermediate preprocessing graph directly, which lets it skip the redundant data scrambling of the graph elements, and combines it with the distributed architecture information to generate the second division graph.
  • Performing distributed graph analysis according to the second division graph can obtain high-quality and high-efficiency distributed graph calculations. In the calculation process, there is no need to repeat the data preprocessing process, which improves the efficiency of the calculation.
  • Fig. 3 is another flowchart of the distributed graph calculation method.
  • the calculation method shown in Fig. 3 includes at least the following steps:
  • Step S300: the second graph data is different from the first graph data
  • Step S301: obtain the first division graph
  • Step S302: apply the graph preprocessing algorithm
  • Step S303: obtain the second intermediate preprocessing graph
  • Step S304: obtain the distributed architecture information
  • Step S305: obtain the change data of the first graph data
  • Step S306: apply the graph partition algorithm
  • Step S307: obtain the third division graph.
  • the second intermediate preprocessing graph is obtained according to the first division graph, the change data of the first graph data and the graph preprocessing algorithm.
  • the second intermediate preprocessing graph is in a computer-readable binary graph format.
  • the graph partition algorithm uses the second intermediate preprocessing graph directly, which lets it skip the redundant data scrambling of the graph elements, and combines it with the distributed architecture information to generate the third division graph.
  • the first graph data includes users, products, and purchase history.
  • the users and products are represented by the vertices of the graph, and the purchase history is represented by the edges.
  • the graph preprocessing algorithm converts the first graph data into the second intermediate preprocessing graph, so that the graph division algorithm can immediately generate high-quality divisions. After that, distributed graph analysis is performed, for example discovering user preferences and predicting products that may be purchased, so as to make corresponding recommendations. Because of new purchase history, new users, and new product updates, the graph data changes periodically, so repeated analysis is required.
  • the vertex data of at least one first graph data is obtained, the priority queue is obtained according to the graph edge sorting algorithm, and the first intermediate preprocessing graph is obtained according to the breadth-first search (BFS) and the priority queue.
  • Priority queue sorting is required before breadth-first search.
  • the expression of the priority queue sorting on the edge of the graph is:
  • D[v] is the number of unvisited vertices of v in the breadth-first search process
  • M[v] is the order of the largest edge among the adjacent edges of v during BFS (if the edges are not already sorted, then M[v] is 0).
  • the vertices are sorted in ascending order.
  • the graph preprocessing algorithm includes an incremental graph preprocessing algorithm.
  • Incremental graph preprocessing algorithms include incremental graph edge sorting algorithms.
  • FIG. 4 is a structural diagram of the graph partitioning algorithm.
  • the computing nodes obtain the number of edges, which is broadcast from the distributed file system over the network, and obtain the node configuration from the infrastructure. According to the number of edges and the node configuration, each node finds a split pointer that determines the starting point and ending point for dividing the graph data. The pointers are transferred to the file system via the network. After that, the distributed file system divides the edges into multiple partitions and sends these partitions back to the computing nodes. Partitions are forwarded efficiently by dividing the data into blocks. Finally, each node obtains its partition before starting the distributed graph calculation. In the existing method, the entire huge graph data is transmitted twice via the network. The method of the present application transmits the graph data only once, because it can calculate the partitions using only metadata (i.e., the number of edges and the node configuration). Therefore, communication overhead can be saved and the work efficiency of each node in distributed graph calculation can be improved.
  • the graph partition algorithm involves a distributed file system, node configuration information, and computing nodes; it calculates split pointers and obtains partitions, which makes graph partitioning faster.
  • during calculation, the graph partition algorithm broadcasts the number of edges over the network, forwards the pointers, and forwards the chunks.
  • the node configuration information includes, for example, the number of CPUs, CPU specifications, memory size, network performance, node reliability, and so on.
  • the edge sequence is divided, so that the workload of each node in the process of distributed graph calculation and analysis is balanced.
  • the graph partitioning algorithm is executed on the cloud infrastructure.
  • the computing node is a virtual machine, and the network is a virtual network.
  • Distributed file systems are usually located in different clusters or data centers. Therefore, the delay and bandwidth of the network are usually limited.
  • the algorithm obtains the node configuration of the virtual host, and the node configuration of each virtual host may be different.
  • Each node takes into account the differences in specifications, and splits the data in such a way that in the process of distributed graph analysis, the workload among the virtual hosts becomes balanced. The efficiency of moving large graph data from the file system to the virtual host is improved.
  • since the computing power therein is delivered in the pay-as-you-go model, saving computing power directly reduces the payment cost of graph analysis.
  • when the distributed graph computing method is used on a private cluster, the graph data only needs to be transmitted once.
  • on a private cluster, the use of this distributed graph computing method can reduce energy costs, so that the graph data can be analyzed more economically.
  • the distributed graph calculation method can be used in page ranking (PageRank) calculation, because more iterations can be performed, so that a more accurate ranking can be obtained.
  • the distributed graph calculation method can be used in top-k type algorithms (such as top-k similarity analysis or top-k graph pattern matching), and more results can be obtained (k can be increased).
  • the distributed graph computing method can be used in graph-based machine learning. Since the distributed graph calculation method can obtain calculation results more quickly, more time can be used in the learning phase in the process of machine learning, and the prediction task will become more accurate.
  • the distributed graph computing method enables real-time analysis and data-driven analysis, making graph analysis more interactive.
  • FIG. 5 is a schematic diagram of the structure of the incremental graph edge sorting algorithm.
  • the incremental graph edge sorting algorithm is implemented in a distributed computing manner.
  • An embodiment of the incremental graph edge sorting algorithm uses a master-slave architecture, which includes a master computing node and a slave computing node.
  • the changed graph data is broadcast to the subordinate computing nodes.
  • each local optimal search algorithm obtains changed graph data and sorted partitions in its nodes.
  • the algorithm calculates the approximate solution of the optimization problem of the partitioned graph locally and in parallel.
  • the main computing node collects local solutions and calculates optimized solutions.
  • the optimized local solution is broadcast to the slave node, so that the slave node obtains the local optimal order with the smallest increment of the objective function.
  • the master node distributes the graph difference data to the slave nodes.
  • the slave node obtains the local optimal solution according to the partition graph preprocessed in the last iteration and the local optimal solution search algorithm, and sends the local optimal solution to the master node.
  • after the master node collects the local optimal solutions, it calculates the optimal solution, then uses the optimal solution to calculate the local optimal solution, and sends the local optimal solution to the slave computing node.
  • the subsequent calculation process is performed after the data is removed.
  • an incremental graph preprocessing algorithm is used for calculation.
  • the expression of the incremental graph preprocessing algorithm is:
  • the incremental graph preprocessing algorithm only processes a part of the entire graph, that is, only scans the adjacent edges of the starting vertex and the ending vertex of the new edge. Then, a new edge order is calculated to minimize the increment of the objective function.
  • Using the incremental graph preprocessing algorithm can reduce the complexity of the calculation when the first graph data is updated, thereby reducing energy consumption.
  • the present application provides a terminal for executing a distributed graph calculation method.
  • the present application provides a distributed graph computing system for executing a distributed graph computing method.
  • the present application provides a computer-readable medium for executing a distributed graph computing method.
  • the device embodiments described above are merely illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • computer storage medium includes volatile and non-volatile media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as carrier waves or other transmission mechanisms, and may include any information delivery media.

Abstract

A distributed-type graph computation method, a terminal, a system, and a storage medium. On the basis of pre-processing and graph partitioning algorithms for graph data as well as an incremental graph edge sorting algorithm, original graph data is converted into computer-readable intermediate graph data, which allows rapid execution of subsequent graph partitioning, and high quality graph partitioning is further provided, with generated high quality partitions significantly reducing communication overhead, quickening distributed graph computation and analysis speeds.

Description

Distributed graph computing method, terminal, system and storage medium
Technical field
The embodiments of the present application relate to, but are not limited to, the field of graph computing technology, and in particular to a distributed graph computing method, terminal, system, and storage medium.
Background
As the demand for data analysis, such as in-depth mining of data relationships, continues to grow, large-scale graph computing has received widespread attention in many fields. A graph is an abstract data structure used to represent the association relationships between objects. It is described using vertices and edges: the vertices represent objects, and the edges represent the relationships between objects. Data that can be abstracted into a graph description is therefore graph data. Graph computing is the process of expressing and solving problems with graphs as the data model.
At present, as the scale of graphs continues to grow, distributed computing is used to analyze large-scale graph data. In distributed graph computing, a large-scale graph is divided into several subgraphs that are processed by multiple slave nodes, which makes effective use of multiple computing resources. However, high-quality partitioning methods consume a lot of computation time, which leads to high energy consumption in the partitioning phase. Conversely, generating partitions at high speed results in low-quality partitions, causing serious performance losses.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The embodiments of the present application provide a distributed graph computing method, terminal, system, and storage medium that use graph data preprocessing. When large-scale graph data analysis is performed, the graph data is transmitted only once, so the graph data can be partitioned with high quality and efficiency, increasing the speed of distributed graph computing and reducing energy consumption.
In a first aspect, an embodiment of the present application provides a distributed graph calculation method, including:
obtaining first graph data;
obtaining a first intermediate preprocessing graph according to a graph preprocessing algorithm and the first graph data;
obtaining a first division graph according to a graph division algorithm, a distributed architecture, and the first intermediate preprocessing graph;
obtaining first distributed graph analysis data according to the first division graph.
Specifically, the distributed graph calculation method further includes:
obtaining second graph data;
if the first graph data is the same as the second graph data, obtaining a second division graph according to the graph division algorithm, the distributed architecture, and the first intermediate preprocessing graph;
obtaining second distributed graph analysis data according to the second division graph.
Specifically, the distributed graph calculation method further includes:
obtaining second graph data;
if the first graph data is not the same as the second graph data, obtaining difference data between the second graph data and the first graph data;
obtaining a second intermediate preprocessing graph according to the first division graph, the difference data, and a difference graph preprocessing algorithm;
obtaining a third division graph according to the graph division algorithm, the distributed architecture, and the second intermediate preprocessing graph;
obtaining third distributed graph analysis data according to the third division graph.
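The comparison between the first and second graph data described above can be sketched as a set difference over edge lists. This is only an illustration under stated assumptions: the name `graph_difference`, the `GraphDifference` type, and the representation of graph data as a set of `(start, end)` vertex-ID pairs are hypothetical and are not taken from the application.

```python
from typing import NamedTuple

Edge = tuple[int, int]  # (starting vertex ID, ending vertex ID)


class GraphDifference(NamedTuple):
    added: set[Edge]    # incremental data: edges present only in the second graph
    removed: set[Edge]  # decrement data: edges present only in the first graph


def graph_difference(first: set[Edge], second: set[Edge]) -> GraphDifference:
    """Return the difference data between the second and the first graph data."""
    return GraphDifference(added=second - first, removed=first - second)


first_graph = {(0, 1), (1, 2), (2, 3)}
second_graph = {(0, 1), (1, 2), (3, 4)}

diff = graph_difference(first_graph, second_graph)
if not diff.added and not diff.removed:
    # Same graph data: the first intermediate preprocessing graph can be reused.
    print("reuse the first intermediate preprocessing graph")
else:
    print("added:", sorted(diff.added), "removed:", sorted(diff.removed))
```

An empty difference corresponds to the branch in which the graph data is unchanged and preprocessing does not need to be repeated; a non-empty one feeds the difference graph preprocessing.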
Specifically, the difference graph preprocessing algorithm includes:
if the difference data between the second graph data and the first graph data is incremental data, obtaining the second intermediate preprocessing graph according to the difference data and an incremental graph preprocessing algorithm.
Specifically, the incremental graph preprocessing algorithm further includes:
obtaining the adjacent edges of the starting vertex and the ending vertex according to the second graph data;
obtaining an incremental graph edge ordering according to an incremental graph edge sorting algorithm and the adjacent edges of the starting vertex and the ending vertex.
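The idea that the incremental algorithm only looks at the edges adjacent to the endpoints of a new edge can be sketched as follows. The function name `affected_edges` and the list-of-pairs representation are assumptions made for this illustration; the linear scan is used only for brevity, and a real implementation would presumably use an adjacency index instead.

```python
Edge = tuple[int, int]  # (starting vertex ID, ending vertex ID)


def affected_edges(sorted_edges: list[Edge], new_edges: list[Edge]) -> list[Edge]:
    """Return the existing edges that touch the start or end vertex of any new edge."""
    touched = {vertex for edge in new_edges for vertex in edge}
    return [e for e in sorted_edges if e[0] in touched or e[1] in touched]


existing = [(0, 1), (1, 2), (2, 3), (4, 5)]
neighbourhood = affected_edges(existing, [(3, 6)])
print(neighbourhood)  # only the edge sharing vertex 3 with the new edge is returned
```

Only this neighbourhood, rather than the whole edge sequence, would then be passed to the incremental edge sorting step.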
Specifically, the incremental graph edge sorting algorithm is applied to a master computing node, and the incremental graph edge sorting algorithm includes:
sending the second graph data to a first slave computing node;
obtaining a local solution sent by the first slave computing node;
obtaining an optimized solution according to the local solution;
obtaining a locally optimized solution according to the optimized solution;
sending the locally optimized solution to the first slave computing node.
Specifically, the difference graph preprocessing algorithm further includes:
if the difference data between the second graph data and the first graph data is decrement data, removing the decrement data;
obtaining the second intermediate preprocessing graph according to the first graph data, the second graph data after removal of the decrement data, and the graph preprocessing algorithm.
Specifically, the distributed graph preprocessing algorithm further includes preprocessing graph edge sorting;
the preprocessing graph edge sorting includes:
obtaining edge data of the first graph data and vertex data of the first graph data according to the first graph data;
obtaining the first intermediate preprocessing graph according to the edge data of the first graph data and the vertex data of the first graph data.
Specifically, obtaining the first intermediate preprocessing graph according to the edge data and the vertex data includes:
obtaining first vertex data of the first graph data;
obtaining a priority queue according to the preprocessing graph edge sorting;
obtaining the first intermediate preprocessing graph according to a breadth-first search and the priority queue.
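A breadth-first traversal driven by a priority queue, as outlined in the steps above, might look like the sketch below. The exact priority expression of the application is not reproduced here; as a labeled assumption, the sketch orders vertices by the pair (D[v], M[v]) in ascending order, where D[v] counts the unvisited neighbours of v and M[v] is the largest order already assigned to an edge adjacent to v (0 if none). The function name `order_edges`, the undirected adjacency, and the choice not to refresh priorities after they are pushed are all simplifications of this illustration.

```python
import heapq
from collections import defaultdict

Edge = tuple[int, int]  # (starting vertex ID, ending vertex ID)


def order_edges(edges: list[Edge], start: int) -> list[Edge]:
    """Emit the edges reachable from `start` in a priority-driven BFS order."""
    adjacency = defaultdict(list)  # vertex -> [(neighbour, edge index)]
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))

    visited: set[int] = set()
    emitted: set[int] = set()                      # indices of edges already ordered
    max_order: dict[int, int] = defaultdict(int)   # M[v], 0 while no adjacent edge is ordered
    ordered: list[Edge] = []

    def priority(v: int) -> tuple[int, int]:
        # Assumed priority: (D[v], M[v]), computed at push time and never refreshed.
        unvisited = sum(1 for n, _ in adjacency[v] if n not in visited)
        return (unvisited, max_order[v])

    heap = [(priority(start), start)]
    while heap:
        _, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale queue entry
        visited.add(v)
        for neighbour, index in adjacency[v]:
            if index not in emitted:
                emitted.add(index)
                ordered.append(edges[index])
                position = len(ordered)  # 1-based order of the edge just placed
                max_order[v] = max(max_order[v], position)
                max_order[neighbour] = max(max_order[neighbour], position)
            if neighbour not in visited:
                heapq.heappush(heap, (priority(neighbour), neighbour))
    return ordered


sample = [(0, 1), (0, 2), (1, 2), (2, 3)]
result = order_edges(sample, start=0)
print(result)
```

The sketch only covers the connected component containing `start`; a full preprocessing pass would have to restart the traversal from every component.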
Specifically, the first intermediate preprocessed graph includes:
the start vertex ID and the end vertex ID of each edge, stored in binary format.
Specifically, the graph partitioning algorithm includes:
acquiring the nodes of the distributed architecture and node configuration information, where the node configuration information includes one or more of the number of nodes, node specifications, and node performance;
acquiring the first intermediate preprocessed graph;
obtaining edge data of the first intermediate preprocessed graph according to the first intermediate preprocessed graph;
obtaining the first partitioned graph according to the node configuration information of the distributed architecture and the edge data of the first intermediate preprocessed graph;
sending the first partitioned graph to the nodes of the distributed architecture.
In a second aspect, an embodiment of the present application provides a terminal, including: a first memory, a first processor, and a computer program stored in the first memory and executable on the first processor, where the first processor, when executing the program, implements:
the distributed graph computing method described in the first aspect.
In a third aspect, an embodiment of the present application provides a distributed graph computing system, including a first distributed computing device and a second distributed computing device;
the first distributed computing device includes: a second memory, a second processor, and a first computer program stored in the second memory and executable on the second processor; the second processor, when executing the first computer program, implements the distributed graph computing method described in the first aspect;
the second distributed computing device includes: a third memory, a third processor, and a second computer program stored in the third memory and executable on the third processor; the third processor, when executing the second computer program, implements the distributed graph computing method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to:
execute the distributed graph computing method described in the first aspect.
Based on graph data preprocessing, a graph partitioning algorithm, and an incremental graph edge sorting algorithm, the embodiments of the present application convert the original graph data into computer-readable intermediate graph data. This allows the subsequent graph partitioning to proceed quickly while still producing high-quality partitions, which greatly reduce communication overhead and speed up distributed graph computation and analysis.
Other features and advantages of the present application are set forth in the description that follows, and in part become apparent from the description or are understood by practicing the present application. The objectives and other advantages of the present application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.
Description of the Drawings
The drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution and do not limit it.
FIG. 1 is a schematic flowchart of a distributed graph computing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a distributed graph computing method provided by another embodiment of the present application;
FIG. 3 is a schematic flowchart of a distributed graph computing method provided by another embodiment of the present application;
FIG. 4 is a schematic structural diagram of a graph partitioning algorithm provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an incremental graph edge sorting algorithm provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
It should be noted that although the device schematic divides functional modules and the flowchart shows a logical sequence, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that of the flowchart. The terms "first", "second", and the like in the specification, the claims, and the above drawings distinguish similar objects and do not necessarily describe a specific order or sequence.
In known distributed graph computing technology, a graph is a basic and ubiquitous abstraction that is widely used to model real-world problems. For example, in an online social network service, the vertices of the graph represent users and the edges represent friendships between users; in an e-commerce service, the vertices represent users and products and the edges represent purchase history. Real-world graph data keeps growing; for example, one of the world's largest online social network services already contains about one trillion friendship relations. For graphs of this scale, analyzing the graph with multiple computing resources (such as high-performance computing platforms or cloud computing) to understand its characteristics, that is, distributed graph computing, is an important approach. Analyzing large-scale graph data through distributed graph computing is usually time-consuming and expensive.
Based on this, the embodiments of the present application provide a distributed graph computing method, terminal, system, and storage medium that convert the original graph data into computer-readable intermediate graph data. This allows the subsequent graph partitioning to proceed quickly while still producing high-quality partitions, which greatly reduce communication overhead and speed up distributed graph computation and analysis.
It should be noted that, in the following embodiments, the terminal may be a mobile terminal device or a non-mobile terminal device. The mobile terminal device may be a mobile phone, tablet computer, laptop, palmtop computer, vehicle-mounted terminal device, wearable device, ultra-mobile personal computer, netbook, digital camera, video camera, personal digital assistant, or the like; the non-mobile terminal device may be a personal computer, workstation, server, television, teller machine, self-service machine, surveillance camera, box camera, or the like.
The embodiments of the present application are further described below with reference to the drawings.
An embodiment of the present application discloses a distributed graph computing method.
FIG. 1 is a flowchart of the distributed graph computing method. The method shown in FIG. 1 includes at least the following steps:
Step S100: obtain the first graph data;
Step S101: apply the graph preprocessing algorithm;
Step S102: obtain the first intermediate preprocessed graph;
Step S103: obtain the distributed architecture information;
Step S104: apply the graph partitioning algorithm;
Step S105: obtain the first partitioned graph;
Step S106: perform distributed graph analysis.
In one embodiment, after the first graph data is obtained, the graph preprocessing algorithm produces the first intermediate preprocessed graph. The graph preprocessing algorithm uses graph edge sorting to convert the first graph data into a computer-readable binary graph format, which is the first intermediate preprocessed graph. The graph partitioning algorithm then works directly on the first intermediate preprocessed graph, skipping the redundant shuffling of graph elements, and combines it with the distributed architecture information to generate the first partitioned graph. Distributed graph analysis performed on the first partitioned graph achieves high-quality and efficient distributed graph computation.
In one embodiment, the first intermediate preprocessed graph is in a computer-readable binary graph format, and the result is stored contiguously in this binary format. Each box (storage unit) represents a 32-bit or 64-bit integer, and every two boxes store the start vertex ID and the end vertex ID of one edge. The graph data can be read from the machine without any communication overhead.
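The contiguous binary layout described above can be sketched as follows. This is a minimal illustration, not the applicant's file format: the little-endian byte order, the `struct` format strings, and the function names `pack_edges` and `unpack_edges` are assumptions made for the example; the source only states that each edge occupies two 32-bit or 64-bit integers.

```python
import struct

# Assumed layout: each edge is two unsigned integers stored back to back.
# "<II" = two little-endian 32-bit values; "<QQ" = two 64-bit values.
def pack_edges(edges, fmt="<II"):
    """Serialize (start_id, end_id) pairs into one contiguous byte string."""
    return b"".join(struct.pack(fmt, start, end) for start, end in edges)

def unpack_edges(blob, fmt="<II"):
    """Decode a contiguous byte string back into (start_id, end_id) pairs."""
    return list(struct.iter_unpack(fmt, blob))

edges = [(0, 1), (1, 2), (2, 0)]
blob32 = pack_edges(edges)           # 3 edges * 2 ints * 4 bytes
blob64 = pack_edges(edges, "<QQ")    # 3 edges * 2 ints * 8 bytes
restored = unpack_edges(blob32)
```

Because every edge has a fixed width, the byte offset of the i-th edge is simply `i * struct.calcsize(fmt)`, which is what lets a reader jump to any position in the edge sequence without parsing the data before it.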
In one embodiment, the graph preprocessing algorithm converts the first graph data into the first intermediate preprocessed graph, which is in a computer-readable binary graph format and is represented as an edge sequence. The conversion is expressed as:
$$\{E_\phi[0],\, E_\phi[1],\, \ldots,\, E_\phi[|E|-1]\},$$
where $E_\phi$ is the edge sequence obtained by sorting the edge set $E$ with the sorting function $\phi: E \to \mathbb{N}$.
In one embodiment, the expression of the graph preprocessing algorithm is:
[Formula image: Figure PCTCN2020090238-appb-000001]
where $V(E)$ is the set of vertices of the edges $E$.
FIG. 2 is another flowchart of the distributed graph computing method. The method shown in FIG. 2 includes at least the following steps:
Step S200: obtain the second graph data;
Step S201: compare the second graph data with the first graph data;
Step S202: determine that the second graph data is the same as the first graph data;
Step S203: obtain the distributed architecture information;
Step S204: use the first intermediate preprocessed graph;
Step S205: apply the graph partitioning algorithm;
Step S206: obtain the second partitioned graph;
Step S207: perform distributed graph analysis.
In one embodiment, after the second graph data is obtained, it is compared with the first graph data. When the second graph data is the same as the first graph data, the first intermediate preprocessed graph is reused for the analysis. The first intermediate preprocessed graph may be in a computer-readable binary graph format. The graph partitioning algorithm works directly on the first intermediate preprocessed graph, skipping the redundant shuffling of graph elements, and combines it with the distributed architecture information to generate the second partitioned graph. Distributed graph analysis performed on the second partitioned graph achieves high-quality and efficient distributed graph computation. Because the data preprocessing does not have to be repeated, the computation is more efficient.
FIG. 3 is another flowchart of the distributed graph computing method. The method shown in FIG. 3 includes at least the following steps:
Step S300: determine that the second graph data differs from the first graph data;
Step S301: obtain the first partitioned graph;
Step S302: apply the graph preprocessing algorithm;
Step S303: obtain the second intermediate preprocessed graph;
Step S304: obtain the distributed architecture information;
Step S305: obtain the change data of the first graph data;
Step S306: apply the graph partitioning algorithm;
Step S307: obtain the third partitioned graph.
In one embodiment, if the first graph data differs from the second graph data, the second intermediate preprocessed graph is obtained from the first partitioned graph, the change data of the first graph data, and the graph preprocessing algorithm. The second intermediate preprocessed graph is in a computer-readable binary graph format. The graph partitioning algorithm works directly on the second intermediate preprocessed graph, skipping the redundant shuffling of graph elements, and combines it with the distributed architecture information to generate the third partitioned graph.
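The choice between the FIG. 2 path (reuse) and the FIG. 3 path (process only the change data) can be sketched as below. This is an illustrative sketch under stated assumptions: graphs are treated as sets of directed `(start, end)` edge pairs, and the function names and the returned step labels are hypothetical, introduced only for this example.

```python
def diff_edges(first_edges, second_edges):
    """Return (added, removed) edges between two versions of a graph.

    Assumes a graph is fully described by its set of directed edges.
    """
    old, new = set(first_edges), set(second_edges)
    return sorted(new - old), sorted(old - new)

def plan_preprocessing(first_edges, second_edges):
    """Decide which preprocessing steps the second graph data needs."""
    added, removed = diff_edges(first_edges, second_edges)
    if not added and not removed:
        # Same data: skip preprocessing and go straight to partitioning.
        return ["reuse first intermediate preprocessed graph"]
    steps = []
    if removed:
        steps.append("remove decrement data")
    if added:
        steps.append("run incremental graph preprocessing")
    return steps

first = [(0, 1), (1, 2)]
unchanged_plan = plan_preprocessing(first, [(1, 2), (0, 1)])
changed_plan = plan_preprocessing(first, [(0, 1), (2, 3)])
```

In practice the comparison would more likely rely on a change log or a checksum than on materializing both edge sets, but the branching logic stays the same.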
In one embodiment, taking an e-commerce recommendation system as an example, the first graph data includes users, products, and purchase history. Users and products are represented by the vertices of the graph, and purchase history is represented by the edges. The graph preprocessing algorithm converts the first graph data into the second intermediate preprocessed graph, so that the graph partitioning algorithm can immediately generate high-quality partitions. Distributed graph analysis is then performed, for example to discover user preferences and predict products a user is likely to buy, so that corresponding recommendations can be made. Because purchase history, new users, and new products are continually updated, the graph data changes periodically, and the analysis has to be repeated.
In one embodiment, vertex data of at least one vertex of the first graph data is acquired, a priority queue is obtained according to the graph edge sorting algorithm, and the first intermediate preprocessed graph is obtained according to a breadth-first search (BFS) and the priority queue. The priority queue must be sorted before the breadth-first search proceeds. The priority used to sort the queue is:
$$p(v) := |E| \cdot D[v] - M[v],$$
where $D[v]$ is the number of unvisited vertices of $v$ during the breadth-first search, and $M[v]$ is the largest sort position among the edges adjacent to $v$ during BFS ($M[v]$ is 0 if none of these edges has been sorted yet). Vertices are sorted in ascending order of $p$.
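A priority-driven traversal using p(v) can be sketched as follows. This is a simplified reading of the description, not the applicant's implementation: it assumes that D[v] counts the unvisited neighbours of v, that M[v] is the largest 1-based position already assigned to an edge touching v, and that only the connected component of the hypothetical `start` vertex is ordered. The function name `order_edges` is introduced for the example.

```python
import heapq
from collections import defaultdict

def order_edges(edges, start):
    """Order edges by a BFS-like traversal that always expands the vertex
    with the smallest p(v) = |E| * D[v] - M[v]."""
    m = len(edges)
    adj = defaultdict(list)              # vertex -> [(neighbour, edge index)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    visited = set()
    rank = {}                            # edge index -> 1-based position
    M = defaultdict(int)                 # largest position among edges at v

    def priority(v):
        D = sum(1 for w, _ in adj[v] if w not in visited)
        return m * D - M[v]

    heap = [(priority(start), start)]
    while heap:
        p, v = heapq.heappop(heap)
        if v in visited:
            continue
        if p != priority(v):             # stale entry: refresh and retry
            heapq.heappush(heap, (priority(v), v))
            continue
        visited.add(v)
        for w, i in adj[v]:              # assign positions to unsorted edges
            if i not in rank:
                rank[i] = len(rank) + 1
                M[v] = max(M[v], rank[i])
                M[w] = max(M[w], rank[i])
        for w, _ in adj[v]:              # enqueue neighbours with fresh p
            if w not in visited:
                heapq.heappush(heap, (priority(w), w))
    return [edges[i] for i in sorted(rank, key=rank.get)]

sample = [(0, 1), (1, 2), (2, 3), (0, 2)]
ordered = order_edges(sample, start=0)
```

The min-heap matches the statement that vertices are taken in ascending order of p: a vertex with few unvisited neighbours and recently sorted adjacent edges is expanded first, which tends to keep edges that share vertices close together in the output sequence.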
In one embodiment, the graph preprocessing algorithm includes an incremental graph preprocessing algorithm, and the incremental graph preprocessing algorithm includes an incremental graph edge sorting algorithm.
FIG. 4 is a structural diagram of the graph partitioning algorithm. As shown in FIG. 4, the computing nodes obtain the broadcast edge count from the distributed file system over the network and obtain the node configuration from the infrastructure. From the edge count and the node configuration, each node computes split pointers that determine the start point and end point used to divide the graph data. The pointers are transmitted to the file system over the network. The distributed file system then divides the edges into multiple partitions and sends these partitions back to the computing nodes; the partitions are forwarded efficiently by splitting the data into chunks. Finally, each node obtains its partition before distributed graph computation starts. In existing methods, the entire, very large graph data is transmitted over the network twice. The method of the present application transmits the graph data only once, because it can compute the partitions from metadata alone (the edge count and the node configuration). This saves communication overhead and improves the efficiency of each node during distributed graph computation.
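Computing split pointers from metadata alone can be sketched as below. The sketch assumes that the node configuration has already been reduced to one integer capacity weight per node; how CPU count, memory, or network performance are combined into that weight is not specified in the source, and the function name `split_pointers` is hypothetical.

```python
def split_pointers(num_edges, weights):
    """Return one (start, end) range of edge positions per computing node.

    Ranges are contiguous, cover [0, num_edges), and are sized in
    proportion to each node's capacity weight.
    """
    total = sum(weights)
    bounds = [0]
    cumulative = 0
    for w in weights:
        cumulative += w
        bounds.append(num_edges * cumulative // total)
    return list(zip(bounds[:-1], bounds[1:]))

# Three nodes; the third is assumed to have twice the capacity of the others.
ranges = split_pointers(10, [1, 1, 2])
```

Each node can evaluate this function independently from the broadcast edge count, so only the resulting pointers, not the edges themselves, travel to the file system before the single bulk transfer of partitions. With a fixed-width binary edge format, a range converts to byte offsets by multiplying by the size of one edge record.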
In one embodiment, the graph partitioning algorithm involves a distributed file system, fast graph partitioning, node configuration information, computing nodes, computation of split pointers, and acquisition of partitions. During computation, the edge count is broadcast over the network, and the pointers and the chunks are forwarded over the network.
In one embodiment, the node configuration information includes, for example, the number of CPUs, CPU specifications, memory size, network performance, and node reliability.
In one embodiment, the edge sequence is divided by computing split vertices, so that the workload of the nodes is balanced during distributed graph computation and analysis.
In one embodiment, the graph partitioning algorithm is executed on a cloud infrastructure. The computing nodes are virtual machines and the network is a virtual network. The distributed file system is usually located in a different cluster or data center, so network latency and bandwidth are usually constrained. The algorithm obtains the node configuration of each virtual host, which may differ from host to host. Each node takes these differences in specifications into account and splits the data so that the workload across virtual hosts is balanced during distributed graph analysis. This makes moving large graph data from the file system to the virtual hosts more efficient.
In one embodiment, when the distributed graph computing method is deployed on a public cloud, where computing power is delivered under a pay-as-you-go model, saving computing power directly lowers the cost paid for graph analysis.
In one embodiment, when the distributed graph computing method is used on a private cluster, the graph data needs to be transmitted only once, so the method can reduce the energy cost of the private cluster and make the analysis of graph data more economical.
In one embodiment, the distributed graph computing method can be used for PageRank computation; because more iterations can be run, a more accurate ranking can be obtained.
In one embodiment, the distributed graph computing method can be used in top-k algorithms (such as top-k similarity analysis or top-k graph pattern matching), where more results can be obtained (k can be increased).
In one embodiment, the distributed graph computing method can be used in graph-based machine learning. Because the method produces computation results faster, more time can be spent on the learning phase, and prediction becomes more accurate.
In one embodiment, the distributed graph computing method enables real-time analysis and data-driven analysis, making graph analysis more interactive.
FIG. 5 is a schematic structural diagram of the incremental graph edge sorting algorithm. As shown in FIG. 5, the incremental graph edge sorting algorithm is implemented in a distributed manner. One embodiment of the algorithm uses a master-slave architecture, that is, a master computing node and subordinate computing nodes. First, the changed graph data is broadcast to the subordinate computing nodes. Second, the local optimum search algorithm on each node takes the changed graph data and that node's sorted partition, and computes, locally and in parallel, an approximate solution to the optimization problem on the partitioned graph. Third, the master computing node collects the local solutions and computes the optimized solution. Finally, the locally optimized solutions are broadcast to the subordinate nodes, so that each subordinate node obtains the locally best order, which minimizes the increase of the objective function.
In one embodiment, after the graph difference data is obtained, the master node distributes it to the subordinate nodes. Each subordinate node obtains a local optimal solution from the partitioned graph preprocessed in the previous iteration and the local optimal solution search algorithm, and sends the local optimal solution to the master node. After collecting the local optimal solutions, the master node computes the optimal solution, derives the locally optimized solutions from it, and sends them to the subordinate computing nodes.
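The broadcast, local search, collect, and redistribute rounds described above can be sketched in a single process as follows. The objective function in the source is given only as a formula image, so this sketch substitutes a stand-in cost: the number of endpoints of the new edge that a partition does not yet contain. That cost, the sequential loop standing in for parallel workers, and the function names are all assumptions made for illustration.

```python
def local_solution(partition, new_edge):
    """Worker side: cost of adopting new_edge in this partition.

    Stand-in objective (assumption): count endpoints of the new edge that
    do not already appear in the partition.
    """
    vertices = {v for edge in partition for v in edge}
    return sum(1 for v in set(new_edge) if v not in vertices)

def master_round(partitions, new_edge):
    """Master side: one round of the master-slave protocol."""
    # 1. Broadcast the changed data; 2. collect one local solution per worker.
    costs = [local_solution(p, new_edge) for p in partitions]
    # 3. Compute the global decision from the local solutions.
    best = min(range(len(partitions)), key=costs.__getitem__)
    # 4. Send each worker its part of the decision.
    return [list(p) + [new_edge] if i == best else list(p)
            for i, p in enumerate(partitions)]

partitions = [[(0, 1), (1, 2)], [(3, 4)]]
updated = master_round(partitions, (4, 5))
```

In a real deployment the list comprehension in step 2 would be replaced by parallel calls to the subordinate nodes; the master only ever handles the small local solutions, not the partitions themselves.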
In one embodiment, if the change data of the first graph data is a removal of data, the data is removed and the subsequent computation then proceeds.
In one embodiment, if the change data of the first graph data is an addition of data, the incremental graph preprocessing algorithm is used. The expression of the incremental graph preprocessing algorithm is:
[Formula image: Figure PCTCN2020090238-appb-000002]
where:
[Formula image: Figure PCTCN2020090238-appb-000003]
Because the graph data is large, computing a new order from scratch takes a long time. The incremental graph preprocessing algorithm processes only part of the graph: it scans only the adjacent edges of the start vertex and end vertex of each new edge, and then computes a new edge order that minimizes the increase of the objective function. Using the incremental graph preprocessing algorithm reduces the computational complexity when the first graph data is updated, and thereby reduces energy consumption.
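Restricting the scan to edges adjacent to the new edge's endpoints can be sketched as below. The real objective function appears only as a formula image in the source, so the placement cost used here (total distance in the sequence between the new edge and its adjacent edges) is a stand-in assumption, as are the names `build_index` and `best_position`.

```python
from collections import defaultdict

def build_index(ordered_edges):
    """Map each vertex to the positions of its incident edges (built once)."""
    index = defaultdict(list)
    for pos, (u, v) in enumerate(ordered_edges):
        index[u].append(pos)
        index[v].append(pos)
    return index

def best_position(index, new_edge, default):
    """Pick an insertion position by looking only at adjacent edges.

    Stand-in cost (assumption): sum of distances between the candidate
    position and the positions of edges sharing a vertex with new_edge.
    """
    u, v = new_edge
    adjacent = sorted(set(index.get(u, []) + index.get(v, [])))
    if not adjacent:
        return default                   # no shared vertex: append at the end
    candidates = [pos + 1 for pos in adjacent]
    return min(candidates, key=lambda c: sum(abs(c - pos) for pos in adjacent))

ordered = [(0, 1), (0, 2), (1, 2), (2, 3)]
index = build_index(ordered)
pos = best_position(index, (3, 4), default=len(ordered))
updated = ordered[:pos] + [(3, 4)] + ordered[pos:]
```

Only the index entries for the two endpoints are read, so the work per new edge depends on the degree of those vertices rather than on the size of the whole graph. A production version would also need to update the index after each insertion, which this sketch omits.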
In one embodiment, the present application provides a terminal for executing the distributed graph computing method.
In one embodiment, the present application provides a distributed graph computing system for executing the distributed graph computing method.
In one embodiment, the present application provides a computer-readable medium for executing the distributed graph computing method.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present application have been described in detail above, but the application is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (14)

  1. 一种分布式图计算方法,包括:A distributed graph calculation method, including:
    获取第一图数据;Obtain the data of the first image;
    根据图预处理算法及所述第一图数据,得到第一中间预处理图;Obtain a first intermediate preprocessing image according to the image preprocessing algorithm and the first image data;
    根据图划分算法,分布式构架及所述第一中间预处理图,得到第一划分图;According to the graph division algorithm, the distributed architecture and the first intermediate preprocessing graph, the first division graph is obtained;
    根据所述第一划分图,得到第一分布式图分析数据。According to the first division graph, first distributed graph analysis data is obtained.
  2. 根据权利要求1所述的分布式图计算方法,其特征在于,还包括:The distributed graph calculation method according to claim 1, further comprising:
    获取第二图数据;Obtain the data of the second image;
    若所述第一图数据与所述第二图数据相同,则根据所述图划分算法,所述分布式构架及所述第一中间预处理图,得到第二划分图;If the first graph data is the same as the second graph data, a second division graph is obtained according to the graph division algorithm, the distributed architecture, and the first intermediate preprocessing graph;
    根据所述第二划分图,得到第二分布式图分析数据。According to the second division graph, a second distributed graph analysis data is obtained.
  3. 根据权利要求1所述的分布式图计算方法,其特征在于,还包括:The distributed graph calculation method according to claim 1, further comprising:
    获取第二图数据;Obtain the data of the second image;
    若所述第一图数据与所述第二图数据不相同,获取所述第二图数据与所述第一图数据的差异数据;If the first graph data is not the same as the second graph data, obtaining difference data between the second graph data and the first graph data;
    根据所述第一划分图,所述差异数据及差异图预处理算法得到第二中间预处理图;Obtaining a second intermediate preprocessing map according to the first division map, the difference data and the difference map preprocessing algorithm;
    根据所述图划分算法,所述分布式构架及所述第二中间预处理图,得到第三划分图;According to the graph division algorithm, the distributed architecture and the second intermediate preprocessing graph, a third division graph is obtained;
    根据所述第三划分图,得到第三分布式图分析数据。According to the third division graph, a third distributed graph analysis data is obtained.
  4. The distributed graph computation method according to claim 3, wherein the difference graph preprocessing algorithm comprises:
    if the difference data between the second graph data and the first graph data is incremental data, obtaining the second intermediate preprocessed graph according to the difference data and an incremental graph preprocessing algorithm.
  5. The distributed graph computation method according to claim 4, wherein the incremental graph preprocessing algorithm further comprises:
    obtaining, according to the second graph data, the adjacent edges between a start vertex and an end vertex;
    obtaining an incremental graph edge sorting according to an incremental graph edge sorting algorithm and the adjacent edges between the start vertex and the end vertex.
  6. The distributed graph computation method according to claim 5, wherein the incremental graph edge sorting algorithm is applied to a master computing node, and the incremental graph edge sorting algorithm comprises:
    sending the second graph data to a first slave computing node;
    receiving a local solution sent by the first slave computing node;
    obtaining an optimized solution according to the local solution;
    obtaining a locally optimized solution according to the optimized solution;
    sending the locally optimized solution to the first slave computing node.
  7. The distributed graph computation method according to claim 3, wherein the difference graph preprocessing algorithm further comprises:
    if the difference data between the second graph data and the first graph data is decremental data, removing the decremental data;
    obtaining the second intermediate preprocessed graph according to the first graph data, the second graph data after removal of the decremental data, and the graph preprocessing algorithm.
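The branching in claims 2 to 4 and 7 depends on how the second graph data differs from the first. The following is a minimal sketch of that classification, assuming each graph is given as a collection of (start vertex ID, end vertex ID) pairs. The function name `classify_difference` and the `"mixed"` case are illustrative assumptions and do not appear in the claims.

```python
def classify_difference(first_edges, second_edges):
    """Compare two edge collections and label the difference.

    Returns (kind, added, removed), where kind is one of
    "same", "incremental", "decremental", or "mixed".
    The "mixed" label is an assumption; the claims only name
    the same, incremental, and decremental cases.
    """
    first, second = set(first_edges), set(second_edges)
    added = second - first      # edges present only in the second graph
    removed = first - second    # edges present only in the first graph
    if not added and not removed:
        kind = "same"           # claim 2: reuse the first intermediate graph
    elif not removed:
        kind = "incremental"    # claim 4: incremental preprocessing
    elif not added:
        kind = "decremental"    # claim 7: remove data, then preprocess
    else:
        kind = "mixed"
    return kind, added, removed
```

For example, `classify_difference([(1, 2)], [(1, 2), (2, 3)])` labels the change as incremental and reports `(2, 3)` as the added edge.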
  8. The distributed graph computation method according to any one of claims 1 to 7, wherein the distributed graph preprocessing algorithm further comprises preprocessing graph edge sorting;
    the preprocessing graph edge sorting comprises:
    obtaining edge data of the first graph data and vertex data of the first graph data according to the first graph data;
    obtaining the first intermediate preprocessed graph according to the edge data of the first graph data and the vertex data of the first graph data.
  9. The distributed graph computation method according to claim 8, wherein obtaining the first intermediate preprocessed graph according to the edge data and the vertex data comprises:
    obtaining first vertex data of the first graph data;
    obtaining a priority queue according to the preprocessing graph edge sorting;
    obtaining the first intermediate preprocessed graph according to a breadth-first search and the priority queue.
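Claim 9 combines a breadth-first search with a priority queue to produce the edge order. The claims do not state the priority function, so the sketch below is only one possible reading: it treats the graph as undirected, uses the vertex degree as the priority (an assumption), and emits each edge once as the traversal reaches it. The name `order_edges` is hypothetical.

```python
import heapq
from collections import defaultdict


def order_edges(edges, start):
    """Emit edges in the order a priority-driven, BFS-like traversal meets them.

    Assumptions (not from the claims): the graph is undirected, parallel
    edges are collapsed, and lower-degree vertices are expanded first.
    Only the connected component containing `start` is covered.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen_vertices = {start}
    seen_edges = set()
    ordered = []
    heap = [(len(adj[start]), start)]   # (priority, vertex)

    while heap:
        _, u = heapq.heappop(heap)
        for v in sorted(adj[u]):
            key = (min(u, v), max(u, v))    # undirected edge identity
            if key not in seen_edges:
                seen_edges.add(key)
                ordered.append((u, v))
            if v not in seen_vertices:
                seen_vertices.add(v)
                heapq.heappush(heap, (len(adj[v]), v))
    return ordered
```

Edges that are adjacent in the traversal end up close together in the output list, which is the property a later contiguous split into partitions can exploit.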
  10. The distributed graph computation method according to any one of claims 1 to 7, wherein the first intermediate preprocessed graph comprises:
    the start vertex ID and the end vertex ID of each edge, stored in binary format.
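Claim 10 only states that the start and end vertex IDs of each edge are stored in binary format. As a minimal sketch of one such layout, the code below packs every edge as two little-endian unsigned 32-bit integers. The field width, byte order, and function names are assumptions made for illustration.

```python
import struct

# Assumed record layout: two little-endian uint32 values per edge.
EDGE = struct.Struct("<II")


def encode_edges(edges):
    """Serialize (start_id, end_id) pairs into a flat bytes object."""
    return b"".join(EDGE.pack(u, v) for u, v in edges)


def decode_edges(buf):
    """Read back the (start_id, end_id) pairs written by encode_edges."""
    return list(EDGE.iter_unpack(buf))
```

With this layout each edge occupies 8 bytes, so the i-th edge can be located by offset without parsing text.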
  11. The distributed graph computation method according to any one of claims 1 to 7, wherein the graph partitioning algorithm comprises:
    obtaining the nodes and node configuration information of the distributed architecture, the node configuration information comprising one or more of node count, node specification, and node performance;
    obtaining the first intermediate preprocessed graph;
    obtaining edge data of the first intermediate preprocessed graph according to the first intermediate preprocessed graph;
    obtaining the first partitioned graph according to the node configuration information of the distributed architecture and the edge data of the first intermediate preprocessed graph;
    sending the first partitioned graph to the nodes of the distributed architecture.
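The partitioning step of claim 11 takes the node configuration into account but does not fix a rule. One simple possibility, sketched here under assumptions, is to cut the ordered edge list into contiguous chunks whose sizes are proportional to a per-node weight (for example a performance score). The name `partition_edges` and the proportional rule are hypothetical.

```python
def partition_edges(ordered_edges, node_weights):
    """Split an ordered edge list into one contiguous chunk per node.

    Chunk sizes are proportional to node_weights (assumed to be positive
    numbers, e.g. relative node performance). The last chunk always ends
    at the final edge so that no edge is dropped by rounding.
    """
    total = sum(node_weights)
    n = len(ordered_edges)
    parts, begin, acc = [], 0, 0
    last = len(node_weights) - 1
    for i, weight in enumerate(node_weights):
        acc += weight
        end = n if i == last else int(n * acc / total)
        parts.append(ordered_edges[begin:end])
        begin = end
    return parts
```

With 10 edges and weights `[1, 1, 2]`, the three nodes receive 2, 3, and 5 edges respectively, and concatenating the chunks restores the original order.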
  12. A terminal, comprising: a first memory, a first processor, and a computer program stored in the first memory and executable on the first processor, wherein the first processor, when executing the program, implements:
    the distributed graph computation method according to any one of claims 1 to 11.
  13. A distributed graph computing system, comprising a first distributed computing device and a second distributed computing device;
    the first distributed computing device comprises: a second memory, a second processor, and a first computer program stored in the second memory and executable on the second processor; the second processor, when executing the first computer program, implements the distributed graph computation method according to any one of claims 1 to 11;
    the second distributed computing device comprises: a third memory, a third processor, and a second computer program stored in the third memory and executable on the third processor; the third processor, when executing the second computer program, implements the distributed graph computation method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to:
    execute the distributed graph computation method according to any one of claims 1 to 11.
PCT/CN2020/090238 2020-04-16 2020-05-14 Distributed-type graph computation method, terminal, system, and storage medium WO2021208174A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010298360.X 2020-04-16
CN202010298360.XA CN111581443B (en) 2020-04-16 2020-04-16 Distributed graph calculation method, terminal, system and storage medium

Publications (1)

Publication Number Publication Date
WO2021208174A1 (en) 2021-10-21

Family

ID=72122402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090238 WO2021208174A1 (en) 2020-04-16 2020-05-14 Distributed-type graph computation method, terminal, system, and storage medium

Country Status (2)

Country Link
CN (1) CN111581443B (en)
WO (1) WO2021208174A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326125B (en) * 2021-05-20 2023-03-24 清华大学 Large-scale distributed graph calculation end-to-end acceleration method and device
CN117251380B (en) * 2023-11-10 2024-03-19 中国人民解放军国防科技大学 Priority asynchronous scheduling method and system for monotone flow chart

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631878A (en) * 2013-11-08 2014-03-12 中国科学院深圳先进技术研究院 Method, device and system for processing massive data of graph structure
CN106250563A (en) * 2016-08-30 2016-12-21 江苏名通信息科技有限公司 K bisimulation computational algorithm based on GPS platform
WO2017076296A1 (en) * 2015-11-03 2017-05-11 华为技术有限公司 Method and device for processing graph data
CN106777351A (en) * 2017-01-17 2017-05-31 中国人民解放军国防科学技术大学 Computing system and its method are stored based on ART tree distributed systems figure
WO2018137346A1 (en) * 2017-01-26 2018-08-02 华为技术有限公司 Graph data processing method and apparatus
CN110619595A (en) * 2019-09-17 2019-12-27 华中科技大学 Graph calculation optimization method based on interconnection of multiple FPGA accelerators

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411679B (en) * 2010-09-26 2014-04-16 中国科学院计算技术研究所 Large-scale distributed parallel acceleration method and system for protein identification
CN102411666B (en) * 2010-09-26 2014-04-16 中国科学院计算技术研究所 Large-scale distributed parallel acceleration method and system for protein identification
CN108683738B (en) * 2018-05-16 2020-08-14 腾讯科技(深圳)有限公司 Graph data processing method and graph data calculation task issuing method
CN108804226B (en) * 2018-05-28 2021-09-03 中国人民解放军国防科技大学 Graph segmentation and division method for distributed graph computation

Also Published As

Publication number Publication date
CN111581443B (en) 2023-05-30
CN111581443A (en) 2020-08-25

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20931149

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20931149

Country of ref document: EP

Kind code of ref document: A1