WO2022082860A1 - Lightweight and efficient graph vertex rearrangement method - Google Patents

Info

Publication number
WO2022082860A1
Authority
WO
WIPO (PCT)
Prior art keywords
vertex
vertices
graph
seed
new
Prior art date
Application number
PCT/CN2020/125962
Other languages
French (fr)
Chinese (zh)
Inventor
刘志丹
黄保福
伍楷舜
Original Assignee
深圳大学
Priority date
Filing date
Publication date
Application filed by 深圳大学
Publication of WO2022082860A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • the invention relates to the field of graph computing, and more particularly, to a lightweight and efficient graph vertex rearrangement method.
  • the above graph calculation preprocessing method specifically refers to a method for reassigning graph vertex IDs.
  • Graph calculation mainly includes two stages: neighbor information acquisition and local calculation.
  • a vertex needs to acquire the information of its neighbors. Specifically, it obtains the ID of its neighbor through the edge table, and then accesses the data corresponding to the ID.
  • arrays are often used to store the data of all vertices in order.
  • the ID of a vertex corresponds to the subscript of the array, so the data of the vertex can be accessed directly through <array+ID>.
  • accesses to array elements are either sequential or irregular, and the two differ considerably in data access efficiency.
  • sequential access makes good use of spatial locality to raise the cache hit rate and thereby speeds up computation. Accesses to the array should therefore be sequential wherever possible.
  • frequently accessed elements of the array should stay resident in the cache, to gain more temporal locality in data access and speed up computation.
  • the time-consuming ID rearrangement method is represented by Gorder (H.Wei, J.X.Yu, C.Lu, and X.Lin.Speedup graph processing by graph ordering.In ACM SIGMOD, 2016.).
  • Gorder's analysis shows that structurally adjacent vertices should have similar IDs; that is, vertices with many common neighbors should have similar IDs.
  • the formal description is Equation 1. The first part represents the number of common neighbors of vertex u and vertex v, and its value is the total number of common neighbors. The second part represents whether u and v are neighbor vertices, and its value is 1 or 0.
  • Gorder seeks to maximize F(Φ) when assigning new IDs to vertices, which yields a more efficient rearrangement result, where F(Φ) is calculated by Equation 2. However, Gorder also points out that maximizing F(Φ) is an NP-hard problem. Applying this method directly to large-scale graph data takes a long time, so the rearrangement operation is inefficient and not suitable for practical applications. Gorder therefore proposes an approximate method for rearrangement. Its idea is consistent with maximizing F(Φ), so its rearrangement result is also efficient, but its rearrangement operation is still inefficient.
  • lightweight ID rearrangement methods include DBG (P. Faldu, J. Diamond, and B. Grot. A closer look at lightweight graph reordering. In IEEE IISWC, 2019).
  • DBG considers vertex degree when assigning new IDs and assigns them through a clustering method, so that frequently accessed vertices have similar IDs and, in theory, higher locality benefits can be obtained.
  • this type of method cannot guarantee its rearrangement effect. Graphs in real applications are large and contain relatively many frequently accessed vertices, while the cache capacity of a computer is limited and cannot hold the information of all frequently accessed vertices at once. If frequently accessed vertices are far apart in the graph structure and are nevertheless assigned adjacent IDs, the limited cache capacity causes many high-frequency vertices that are about to be accessed to be evicted from the cache, which lowers the cache hit rate and hurts computing efficiency.
  • the present invention provides a lightweight and high-efficiency graph vertex rearrangement method in order to overcome the defect of low efficiency of graph vertex rearrangement described in the prior art.
  • the method includes the following steps:
  • S1 includes the following steps:
  • the information of the whole graph is stored in Compressed Sparse Row (CSR) format.
  • S1.2 is specifically as follows: so that the rearrangement result can be reproduced, ID rearrangement is set to start from the vertex of G with the smallest original ID, and the seed point seed is selected, where the vertex with the smallest original ID is the initial seed point.
  • the selection method of seed points is as follows:
  • S2 is specifically:
  • the vertices of the hypergraph structure are the merged super vertices.
  • a super vertex is a logical concept; in practice it is a set of vertices.
  • the generation of a super vertex includes:
  • S2.2 includes the following steps:
  • N_{Hv} denotes the set of neighbor vertices of H
  • check the flag is_trivial of vertex u: if is_trivial is false, skip the vertex (perform no further operation on it); if is_trivial is true, execute S2.2.3;
  • S2.2.4 Repeat the above operations (S2.2.1-S2.2.3) for the other connected outgoing neighbors one by one. After all unassigned neighbor vertices have been merged, increase the merge hop count by one;
  • S2.2.5 Repeat the above steps (S2.2.1-S2.2.4) for the vertices newly added to H until the hop count hop reaches the given merge hop count k. One merge operation is then complete and the super vertex H is output; that is, the low-in-degree vertices within k hops of the seed point are merged into H, which is then output.
  • S3 includes the following steps:
  • S3.2 Select a vertex from the neighbor vertices of the super vertex H as the seed point seed. When all vertices of the connected component containing the seed point have obtained new IDs, select the vertex with the smallest original ID among the remaining connected components as the seed point seed, and perform S2, until all vertices have obtained unique new IDs.
  • the allocation rule for allocating new IDs to the neighbor vertices of the super vertex H is as follows: firstly, the high-in-degree neighbor vertices are allocated consecutive IDs, and then the low-in-degree neighbor vertices are allocated consecutive IDs.
  • the beneficial effect of the technical solution of the present invention is that, by merging low-in-degree vertices before reassigning IDs, the method gives the common neighbors of the merged vertices similar IDs.
  • the method of the present invention can effectively improve graph vertex rearrangement efficiency.
  • FIG. 1 shows the lightweight and efficient graph vertex rearrangement method described in Embodiment 1.
  • FIG. 2 is a schematic diagram of reassignment of graph vertex IDs.
  • FIG. 3 is a flowchart of graph vertex ID reassignment.
  • FIG. 4 is a flowchart of generating a super vertex from a seed point.
  • FIG. 5 is a process diagram of renumbering the graph that needs ID rearrangement.
  • This embodiment provides a lightweight and efficient graph vertex rearrangement method. As shown in FIG. 1 , the method includes the following steps:
  • S1 Load the graph that needs to be rearranged by ID, and perform preprocessing to determine the seed point.
  • this method starts ID rearrangement from the vertex with the smallest original ID and selects the seed point seed, where the vertex with the smallest original ID is selected as the initial seed point.
  • in step S2, the seed is selected and the super vertex H is generated from the seed; new IDs are then allocated to the vertices.
  • the reassignment of the graph vertex ID is shown in Figure 2.
  • the specific allocation process is as follows (as shown in Figure 3):
  • the specific operation of S3.2 is: select a vertex from H's neighbor vertices as a seed, and execute 2.2. If all vertices of the current connected component have obtained new IDs, select the vertex with the smallest original ID from the remaining connected components as seed, and execute 2.2 until all vertices have obtained unique new IDs.
  • Step 2 Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. Within 1 hop of vertex 1, the vertices with in-degree less than λ and no new ID are vertices 3 and 7, so (1, 3, 7) are merged into a super vertex; see Figure 5(a).
  • Step 3 Allocate new IDs to the vertices of the super vertex that have not yet obtained one: the currently assignable new ID is 1, so the vertices with original IDs 1, 3, and 7 receive new IDs 1, 2, and 3.
  • the currently assignable new ID is now 4.
  • Step 4 Assign new IDs to the neighbors of the super vertex that have not yet obtained one: among the neighbors of the super vertex, vertices 5, 9, and 6 are high-in-degree vertices (new IDs are assigned to the high-in-degree neighbors of each member of the super vertex in turn, that is, first to the high-in-degree neighbors of vertex 1, then to those of vertex 3, and finally to those of vertex 7; the same applies below), and vertex 10 is a low-in-degree vertex (the ID allocation order for low-in-degree neighbors is the same as for high-in-degree neighbors above; the same applies below).
  • the currently assignable new ID is 4, so the vertices with original IDs 5, 9, 6, and 10 receive new IDs 4, 5, 6, and 7.
  • the currently assignable new ID is now 8; see Figure 5(b).
  • Step 5 Select seeds from neighbors: Since it is set here that only seeds are selected from outgoing neighbors, outgoing neighbors 5, 9, 6, and 10 all get new IDs.
  • Step 7 Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. For vertex 2, the vertex with in-degree less than λ and no new ID is vertex 8, so (2, 8) are merged into a super vertex; see Figure 5(b).
  • Step 8 Allocate new IDs to the vertices of the super vertex that have not yet obtained one: the currently assignable new ID is 8, so the vertices with original IDs 2 and 8 receive new IDs 8 and 9. The currently assignable new ID is now 10.
  • Step 9 Assign new IDs to the neighbors of the super vertex that have not yet obtained one: among the neighbors of the super vertex, vertex 4 is a high-in-degree vertex.
  • the currently assignable new ID is 10, so the vertex with original ID 4 receives new ID 10.
  • the currently assignable new ID is now 11; see Figure 5(c).
  • Step 11 Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. For vertex 4, the vertex with in-degree less than λ and no new ID is vertex 11, so (4, 11) are merged into a super vertex; see Figure 5(c).
  • Step 12 Allocate new IDs to the vertices of the super vertex that have not yet obtained one: the currently assignable new ID is 11, so the vertex with original ID 11 receives new ID 11. The currently assignable new ID is now 12. (The vertex with original ID 4 already has a new ID and is not assigned another.)
  • Step 13 Assign new IDs to the neighbors of the super vertex that have not yet obtained one: all neighbors of the super vertex already have new IDs and are not assigned again.
  • Step 14 The rearrangement is completed, and the result is shown in Figure 5(d).
  • the method combines a mechanism that merges adjacent low-in-degree vertices before reassignment with an ID allocation order that distinguishes high-in-degree from low-in-degree vertices.
  • the purpose of merging adjacent low-in-degree vertices is to determine the set of common neighbors between vertices quickly.
  • Related studies have shown that vertices with more common neighbors should have adjacent IDs, so that ID proximity is consistent with the proximity of vertices in the graph structure. With this mapping, data accesses in a specific graph computation obtain higher locality benefits and computational efficiency.
  • the purpose of distinguishing high-in-degree from low-in-degree vertices in the ID allocation order is to give frequently accessed vertices similar IDs.
  • Such vertices are accessed frequently, and similar IDs make it more likely that they stay resident in the cache. A vertex generally has both high-in-degree and low-in-degree neighbors. Because cache capacity is limited, if the IDs of high-in-degree vertices are far apart, accesses to low-in-degree and high-in-degree neighbors may cause some high-in-degree vertices that are about to be accessed to be evicted from the cache, lowering the cache hit rate.
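Once every vertex has a unique new ID, the mapping can be applied to the stored graph before any graph algorithm runs. The following is a minimal sketch of that relabeling step. The function name relabel_edges, the dict-based mapping, and the three edges are illustrative assumptions; only the mapping of original IDs 1, 3, 7 to new IDs 1, 2, 3 is taken from the worked example.

```python
def relabel_edges(edges, new_id):
    """Rewrite an edge list under an old-ID -> new-ID mapping.

    edges  : list of (source, target) pairs using original IDs
    new_id : dict mapping every original ID to its new ID
    """
    # A rearrangement must give each vertex a unique new ID.
    if len(set(new_id.values())) != len(new_id):
        raise ValueError("new IDs must be unique")
    # Sorting by the new source ID keeps the result ready for CSR construction.
    return sorted((new_id[u], new_id[v]) for u, v in edges)


# Hypothetical edges among the vertices merged in Step 2 of the example.
example_edges = [(1, 3), (3, 7), (7, 1)]
example_mapping = {1: 1, 3: 2, 7: 3}
relabeled = relabel_edges(example_edges, example_mapping)
```

Because relabeling does not depend on any particular graph algorithm, the relabeled edge list can be reused for any number of later computations.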

Abstract

A lightweight and efficient graph vertex rearrangement method. The method comprises: loading a graph to be subjected to ID rearrangement and performing preprocessing to determine a seed point (S1); selecting the seed point and generating a super vertex by means of the seed point (S2); and assigning new IDs to vertices of said graph according to the super vertex to achieve graph vertex rearrangement (S3). According to the method, vertices having low in-degrees are merged before ID reassignment, such that common neighbors of the merged vertices have similar IDs, and because the ID rearrangement is always performed according to information of the vertices directly connected to a current vertex, which is local information, a rearrangement operation can be completed quickly while achieving local optimization. The method can improve graph vertex rearrangement efficiency.

Description

A Lightweight and Efficient Graph Vertex Rearrangement Method
Technical Field
The invention relates to the field of graph computing and, more particularly, to a lightweight and efficient graph vertex rearrangement method.
Background
Because graphs represent real-world relationships well, and graph algorithms can uncover latent facts in the graph structure, graph computing has many applications. However, computation on large-scale graphs operates on a very large amount of data. To complete graph computation more efficiently, researchers have tried to optimize it from several directions, including improving the graph computing environment, optimizing algorithms, and preprocessing graph data. Graph data preprocessing happens before a graph algorithm runs, does not involve any specific graph algorithm, and requires no change to the original algorithm. Once a given graph has been preprocessed, different graph algorithms can therefore be run on the new graph one or more times.
The graph computing preprocessing method referred to above is specifically a method for reassigning graph vertex IDs. When a graph structure is used to represent real-world relationships, each graph vertex must be assigned a unique ID to support graph storage and graph computation. Graph computation mainly consists of two stages: neighbor information acquisition and local computation. A vertex needs to acquire the information of its neighbors. Specifically, it obtains the IDs of its neighbors from the edge table and then accesses the data corresponding to each ID. In practice, arrays are often used to store the data of all vertices in order. The ID of a vertex corresponds to the subscript of the array, so the data of the vertex can be accessed directly through <array+ID>. Accesses to array elements are either sequential or irregular, and the two differ considerably in data access efficiency. Sequential access makes good use of spatial locality to raise the cache hit rate and thereby speeds up computation, so accesses to the array should be sequential wherever possible. In addition, frequently accessed elements of the array should stay resident in the cache, to gain more temporal locality in data access and speed up computation.
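The difference between sequential and irregular access can be illustrated with a rough proxy: the number of distinct cache lines touched while gathering neighbor data. This is a sketch under stated assumptions, not a cache model. The function names, the value of 8 vertex records per cache line, and the two neighbor lists are all hypothetical.

```python
# Vertex data stored in ID order; the vertex ID is the array subscript.
vertex_data = [float(i) for i in range(64)]


def gather(neighbor_ids):
    """Neighbor information acquisition: read vertex_data[id] for each neighbor ID."""
    return sum(vertex_data[vid] for vid in neighbor_ids)


def cache_lines_touched(neighbor_ids, records_per_line=8):
    """Count distinct cache lines touched, assuming records_per_line records per line."""
    return len({vid // records_per_line for vid in neighbor_ids})


contiguous = [8, 9, 10, 11]   # neighbors with similar IDs
scattered = [3, 19, 40, 62]   # neighbors with distant IDs

lines_contiguous = cache_lines_touched(contiguous)
lines_scattered = cache_lines_touched(scattered)
```

In this toy setting the four neighbors with similar IDs fall into a single line, while the four scattered neighbors each fall into a different line, which is the effect that ID rearrangement tries to exploit.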
Graph vertex ID rearrangement has been studied broadly and in depth. According to the time a method needs to complete the rearrangement, methods can be divided into time-consuming and lightweight ID rearrangement methods. Suppose an ID rearrangement method produces new IDs for the graph vertices. If the same graph computation takes less time on the new graph than on the original graph, the rearrangement result of the method is said to be efficient; otherwise it is inefficient. If the time the method needs to complete the rearrangement is much greater than the computation time of the graph algorithm, the rearrangement operation of the method is said to be inefficient and the method is called a time-consuming ID rearrangement method; otherwise the operation is efficient and the method is called a lightweight ID rearrangement method.
Time-consuming ID rearrangement methods are represented by Gorder (H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In ACM SIGMOD, 2016). Gorder's analysis shows that structurally adjacent vertices should have similar IDs; that is, vertices with many common neighbors should have similar IDs. The formal description is Equation ①. The first part represents the number of common neighbors of vertex u and vertex v, and its value is the total number of common neighbors. The second part represents whether u and v are neighbor vertices, and its value is 1 or 0. Gorder seeks to maximize F(Φ) when assigning new IDs to vertices, which yields a more efficient rearrangement result, where F(Φ) is calculated by Equation ②. However, Gorder also points out that maximizing F(Φ) is an NP-hard problem. Applying this method directly to large-scale graph data takes a long time, so the rearrangement operation is inefficient and not suitable for practical applications. Gorder therefore proposes an approximate method for rearrangement. Its idea is consistent with maximizing F(Φ), so its rearrangement result is also efficient, but its rearrangement operation is still inefficient.
$S(u, v) = S_s(u, v) + S_n(u, v)$ ①
[Equation ②: Figure PCTCN2020125962-appb-000001]
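The pairwise score in Equation ① can be computed directly from neighbor sets. The sketch below follows the description in the text: the first term counts common neighbors and the second term is 1 or 0. Treating "common neighbors" as common in-neighbors and treating adjacency as an edge in either direction are assumptions, as are the function names and the sample edges.

```python
from collections import defaultdict


def build_neighbor_sets(edges):
    """Return (in_neighbors, out_neighbors) as dicts of sets for a directed edge list."""
    in_nbrs, out_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return in_nbrs, out_nbrs


def pair_score(u, v, in_nbrs, out_nbrs):
    """S(u, v) = S_s(u, v) + S_n(u, v) as described for Equation 1."""
    s_s = len(in_nbrs[u] & in_nbrs[v])                         # common in-neighbors
    s_n = 1 if (v in out_nbrs[u] or u in out_nbrs[v]) else 0   # adjacency indicator
    return s_s + s_n


# Hypothetical directed graph: vertices 1 and 2 share in-neighbors 0 and 3.
sample_edges = [(0, 1), (0, 2), (3, 1), (3, 2), (1, 2)]
in_nbrs, out_nbrs = build_neighbor_sets(sample_edges)
score_1_2 = pair_score(1, 2, in_nbrs, out_nbrs)
score_0_3 = pair_score(0, 3, in_nbrs, out_nbrs)
```

Evaluating this score for all vertex pairs is what makes exact optimization expensive on large graphs, which motivates the lightweight approach described next.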
Most lightweight ID rearrangement methods consider only the effect of vertex degree (the number of vertices directly connected to a vertex; the same applies below) on the rearrangement result. Their rearrangement operation is efficient, but their rearrangement result is inefficient. An example is DBG (P. Faldu, J. Diamond, and B. Grot. A closer look at lightweight graph reordering. In IEEE IISWC, 2019). DBG considers vertex degree when assigning new IDs and assigns them through a clustering method, so that frequently accessed vertices have similar IDs and, in theory, higher locality benefits can be obtained. However, because this type of method considers only vertex degree and no further graph structure properties, it cannot guarantee its rearrangement effect. Graphs in real applications are large and contain relatively many frequently accessed vertices, while the cache capacity of a computer is limited and cannot hold the information of all frequently accessed vertices at once. If frequently accessed vertices are far apart in the graph structure and are nevertheless assigned adjacent IDs, the limited cache capacity causes many high-frequency vertices that are about to be accessed to be evicted from the cache, which lowers the cache hit rate and hurts computing efficiency. Norder (E. Lee, J. Kim, K. Lim, S. H. Noh, and J. Seo. Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph engines. In USENIX ATC, 2019) considers vertex degree together with the neighbor information of each vertex. Its rearrangement operation is relatively efficient and its rearrangement result is also relatively good. However, because the neighbor information Norder considers is local, its rearrangement result is considerably less effective than Gorder's.
Summary of the Invention
To overcome the low efficiency of graph vertex rearrangement in the prior art described above, the present invention provides a lightweight and efficient graph vertex rearrangement method.
The method includes the following steps:
S1: Load the graph that needs ID rearrangement, perform preprocessing, and determine the seed point;
S2: Select the seed point and generate a super vertex from the seed point;
S3: Allocate new IDs to the vertices of the graph that needs ID rearrangement according to the super vertex, thereby rearranging the graph vertices.
Preferably, S1 includes the following steps:
S1.1: Load the graph G=(V, E) that needs ID rearrangement, where V is the vertex set of the graph, E is the edge set of the graph, and |V| and |E| are the numbers of vertices and edges, respectively;
Then store the information of the whole graph and set the flag is_trivial of each vertex according to its in-degree: if the in-degree is greater than the given threshold λ, set is_trivial=true; otherwise set is_trivial=false;
S1.2: Set the rearrangement rule and select the seed point.
Preferably, the information of the whole graph is stored in Compressed Sparse Row (CSR) format.
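Step S1.1 can be sketched as building CSR arrays from an edge list and deriving a per-vertex flag from the in-degree. This is a minimal sketch under stated assumptions: the array names row_ptr and col_idx, the function names, and the sample graph are hypothetical, and the flag is named low_in_degree here and set to true when the in-degree is below the threshold λ, which is how the worked example decides which vertices to merge.

```python
def build_csr(num_vertices, edges):
    """Build out-edge CSR arrays and in-degrees from a directed edge list."""
    out_deg = [0] * num_vertices
    in_deg = [0] * num_vertices
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # row_ptr[v] .. row_ptr[v + 1] delimits the out-neighbors of v in col_idx.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + out_deg[v]
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1]          # next free slot per vertex (slice is a copy)
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
    return row_ptr, col_idx, in_deg


def flag_low_in_degree(in_deg, lam):
    """Assumed flag: true when a vertex's in-degree is below the threshold lam."""
    return [d < lam for d in in_deg]


# Hypothetical 4-vertex graph with 0-based IDs.
row_ptr, col_idx, in_deg = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 2)])
low_in_degree = flag_low_in_degree(in_deg, lam=2)
```

CSR keeps each vertex's out-neighbors contiguous in col_idx, so the neighbor scans in the later steps are simple slices.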
Preferably, S1.2 is specifically as follows: so that the rearrangement result can be reproduced, ID rearrangement is set to start from the vertex of G with the smallest original ID, and the seed point seed is selected, where the vertex with the smallest original ID is the initial seed point.
The seed point is selected as follows:
(1) When the current connected component of the graph still has vertices without a new ID, select as the seed point a vertex without a new ID that is adjacent to a vertex already assigned a new ID;
(2) When all vertices of the current connected component have obtained new IDs, select a vertex from the remaining connected components, which have not yet undergone ID allocation, as the seed point;
(3) Several vertices may satisfy (1) or (2), so the vertex with the smallest original ID among them is selected as the seed point.
(4) The initial seed point is selected according to (2) above.
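The four selection rules can be sketched as a single function. This is an illustration under stated assumptions, not the patented procedure itself: the name select_seed, the frontier argument (the neighbors of already assigned vertices), and the fallback of scanning all unassigned vertices for rule (2) are hypothetical choices.

```python
def select_seed(frontier, assigned):
    """Pick the next seed point.

    frontier : iterable of vertex IDs adjacent to already assigned vertices
    assigned : list of booleans, assigned[v] is True once v has a new ID
    Returns the chosen vertex ID, or None when every vertex has a new ID.
    """
    # Rules (1) and (3): smallest unassigned vertex next to the assigned region.
    open_frontier = [v for v in frontier if not assigned[v]]
    if open_frontier:
        return min(open_frontier)
    # Rules (2) and (3): otherwise the smallest unassigned vertex overall.
    remaining = [v for v, done in enumerate(assigned) if not done]
    return min(remaining) if remaining else None


# Rule (4): with nothing assigned yet, the initial seed is the smallest ID.
initial_seed = select_seed([], [False] * 5)
next_seed = select_seed([3, 4], [True, True, False, False, True])
fallback_seed = select_seed([4], [True, True, False, False, True])
```

Always breaking ties by the smallest original ID makes the procedure deterministic, which matches the stated goal of reproducible rearrangement results.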
Preferably, S2 is specifically:
With the initial seed point seed as the center, merge the low-in-degree vertices within k hops of it that have not yet been assigned a new ID into a super vertex H, obtaining the hypergraph structure;
The vertices of the hypergraph structure are the merged super vertices. A super vertex is a logical concept; in practice it is a set of vertices.
Preferably, the generation of a super vertex includes:
S2.1: Initialize the ID allocation flag assigned=false for all vertices of the graph that needs ID rearrangement, set the new ID of every vertex to Φ(*)=-1, set the seed point seed=-1, set the auxiliary variable move_id=1 (the new ID that can currently be allocated), and set the auxiliary variable v=-1 (the original ID of the vertex currently taking part in allocation).
S2.2: Obtain the super vertex H through the super vertex generation function Fusion(seed):
Preferably, S2.2 includes the following steps:
S2.2.1: Input the seed point seed and set the super vertex H={seed};
S2.2.2: [Figure PCTCN2020125962-appb-000002] (N_{Hv} denotes the set of neighbor vertices of H). Check the flag is_trivial of vertex u: if is_trivial is false, skip the vertex (perform no further operation on it); if is_trivial is true, execute S2.2.3;
S2.2.3: Check the allocation flag assigned of vertex u: if assigned is true, skip the vertex (perform no further operation on it); if assigned is false, add the vertex to H.
S2.2.4: Repeat the above operations (S2.2.1-S2.2.3) for the other connected outgoing neighbors one by one. After all unassigned neighbor vertices have been merged, increase the merge hop count by one;
S2.2.5: Repeat the above steps (S2.2.1-S2.2.4) for the vertices newly added to H until the hop count hop reaches the given merge hop count k. One merge operation is then complete and the super vertex H is output; that is, the low-in-degree vertices within k hops of the seed point are merged into H, which is then output.
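Steps S2.2.1 to S2.2.5 amount to a hop-limited breadth-first expansion over outgoing edges. The following is a minimal sketch assuming a CSR graph (row_ptr, col_idx) and a per-vertex flag low_in_degree that is true for vertices eligible for merging. These names, the function name fusion, and the sample graph are hypothetical.

```python
def fusion(seed, k, row_ptr, col_idx, low_in_degree, assigned):
    """Merge unassigned low-in-degree vertices within k hops of seed into H."""
    merged = [seed]            # S2.2.1: H = {seed}, kept in discovery order
    members = {seed}
    frontier = [seed]
    for _ in range(k):         # S2.2.5: stop once the hop count reaches k
        added = []
        for v in frontier:
            # S2.2.4: visit each outgoing neighbor of v in turn.
            for u in col_idx[row_ptr[v]:row_ptr[v + 1]]:
                # S2.2.2 / S2.2.3: skip high-in-degree or already assigned vertices.
                if u in members or not low_in_degree[u] or assigned[u]:
                    continue
                members.add(u)
                merged.append(u)
                added.append(u)
        if not added:          # nothing new was merged, so further hops are empty
            break
        frontier = added
    return merged


# Hypothetical graph: 0->1, 0->2, 0->3, 1->4, 2->3; vertex 3 has high in-degree.
row_ptr = [0, 3, 4, 5, 5, 5]
col_idx = [1, 2, 3, 4, 3]
low_in_degree = [True, True, True, False, True]

one_hop = fusion(0, 1, row_ptr, col_idx, low_in_degree, [False] * 5)
two_hop = fusion(0, 2, row_ptr, col_idx, low_in_degree, [False] * 5)
```

Only the immediate out-neighbors of the current frontier are inspected at each hop, so the cost of one merge stays proportional to the local neighborhood rather than the whole graph.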
Preferably, S3 includes the following steps:
S3.1: First assign consecutive IDs to the super vertex H generated in S2, then assign new IDs to the neighbor vertices of the super vertex H; set the allocation flag assigned=true for every vertex that has obtained a new ID.
S3.2: Select a vertex from the neighbor vertices of the super vertex H as the seed point seed. When all vertices of the connected component containing the seed point have obtained new IDs, select the vertex with the smallest original ID among the remaining connected components as the seed point seed, and perform S2, until all vertices have obtained unique new IDs.
Preferably, the rule for allocating new IDs to the neighbor vertices of the super vertex H is as follows: first allocate consecutive IDs to the high-in-degree neighbor vertices, then allocate consecutive IDs to the low-in-degree neighbor vertices.
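The allocation rule in S3.1 can be sketched as three passes: members of H, then high-in-degree neighbors, then low-in-degree neighbors. This is a sketch under stated assumptions: the function name assign_ids, the CSR arrays, the low_in_degree flag, the use of outgoing neighbors only, and the sample graph are hypothetical. The starting ID of 1 follows move_id=1 in S2.1.

```python
def assign_ids(H, row_ptr, col_idx, low_in_degree, new_id, assigned, next_id):
    """Assign consecutive new IDs to H and then to its out-neighbors.

    Mutates new_id and assigned in place and returns the next free ID.
    """
    # Members of H first; a member that already has a new ID is skipped.
    for v in H:
        if not assigned[v]:
            new_id[v] = next_id
            assigned[v] = True
            next_id += 1
    # Collect unassigned out-neighbors in the order of the members of H.
    neighbors, seen = [], set()
    for v in H:
        for u in col_idx[row_ptr[v]:row_ptr[v + 1]]:
            if not assigned[u] and u not in seen:
                seen.add(u)
                neighbors.append(u)
    # High-in-degree neighbors get consecutive IDs before low-in-degree ones.
    ordered = ([u for u in neighbors if not low_in_degree[u]]
               + [u for u in neighbors if low_in_degree[u]])
    for u in ordered:
        new_id[u] = next_id
        assigned[u] = True
        next_id += 1
    return next_id


# Hypothetical graph: 0->1, 0->2, 0->3, 1->4, 2->3; vertex 3 has high in-degree.
row_ptr = [0, 3, 4, 5, 5, 5]
col_idx = [1, 2, 3, 4, 3]
low_in_degree = [True, True, True, False, True]
new_id = [-1] * 5
assigned = [False] * 5
next_free = assign_ids([0, 1, 2], row_ptr, col_idx, low_in_degree,
                       new_id, assigned, next_id=1)
```

Grouping the high-in-degree neighbors into one consecutive ID range is what keeps the most frequently accessed vertices close together in the vertex data array.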
与现有技术相比,本发明技术方案的有益效果是:本发明通过合并低入度顶点再分配ID的方法,使得合并顶点的共同邻居拥有相近的ID。本发明所述方法可有效图提高图顶点重排效率。Compared with the prior art, the beneficial effect of the technical solution of the present invention is that the method of reassigning IDs by merging low-in-degree vertices enables the common neighbors of the merging vertices to have similar IDs. The method of the present invention can effectively improve the graph vertex rearrangement efficiency.
Description of drawings
FIG. 1 shows the lightweight and efficient graph vertex rearrangement method described in Embodiment 1.
FIG. 2 is a schematic diagram of graph vertex ID reassignment.
FIG. 3 is a flowchart of graph vertex ID reassignment.
FIG. 4 is a flowchart of generating a super vertex from a seed point.
FIG. 5 shows the renumbering process for a graph whose IDs need to be rearranged.
Detailed description
The accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent.
To better illustrate this embodiment, some parts in the drawings may be omitted, enlarged, or reduced; they do not represent the dimensions of an actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Embodiment 1
This embodiment provides a lightweight and efficient graph vertex rearrangement method. As shown in FIG. 1, the method includes the following steps:
S1: Load the graph whose IDs need to be rearranged, preprocess it, and determine the seed point.
S1.1: Load the graph G=(V,E) whose IDs need to be rearranged, where V is the vertex set of the graph, E is the edge set, and |V| and |E| are the numbers of vertices and edges, respectively. The whole graph is stored in Compressed Sparse Row (CSR) format. The flag is_trivial of each vertex is set according to its in-degree: if the in-degree is greater than the given threshold λ, set is_trivial=true; otherwise set is_trivial=false.
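The preprocessing in S1.1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names build_csr, trivial_flags, offsets, and targets are hypothetical, vertices are assumed to be numbered from 0, and only out-edges are stored. One further assumption concerns the polarity of the flag: the sketch sets is_trivial=True for vertices whose in-degree does not exceed λ, because S2.2.2 and S3.1 treat the is_trivial=true vertices as the low-in-degree ones; the exact threshold comparison is a reading of the text rather than a confirmed detail.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Return (offsets, targets, in_degree) for a directed graph.

    targets[offsets[v]:offsets[v + 1]] holds the out-neighbors of vertex v.
    """
    in_degree = [0] * num_vertices
    offsets = [0] * (num_vertices + 1)
    for src, dst in edges:
        offsets[src + 1] += 1          # count the out-degree of src
        in_degree[dst] += 1
    for v in range(num_vertices):      # prefix sum turns counts into row offsets
        offsets[v + 1] += offsets[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1]              # slice copy: next free slot of each row
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets, in_degree


def trivial_flags(in_degree: List[int], lam: int) -> List[bool]:
    # Assumption: True marks a low-in-degree vertex (in-degree <= lam).
    return [d <= lam for d in in_degree]
```

With the CSR arrays in place, the out-neighbors of a vertex are one contiguous slice, which is what the merge and assignment steps iterate over.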
S1.2: So that the rearrangement result is reproducible, the method starts ID rearrangement from the vertex with the smallest original ID; that is, the vertex with the smallest original ID is selected as the initial seed point seed.
S2: Taking the seed point seed as the center, merge the low-in-degree vertices within k hops of it that have not yet been assigned a new ID into a super vertex H. This yields a super-graph structure whose vertices are the merged super vertices.
In step S2 above, a seed is selected, the super vertex H is generated from it, and new IDs are then assigned to the vertices. Graph vertex ID reassignment is illustrated in FIG. 2; the specific assignment process (shown in FIG. 3) is as follows:
S2.1: Initialize the ID assignment flag of every vertex of the graph to assigned=false, set the new ID of every vertex to Φ(*)=-1, set the seed point seed=-1, set the auxiliary variable move_id=1 (the new ID currently available for assignment), and set the auxiliary variable v=-1 (the original ID of the vertex currently eligible for assignment).
S2.2: Obtain H through the super vertex generation function Fusion(seed) (see FIG. 4), which includes the following steps:
S2.2.1: Set the super vertex H={seed}.
S2.2.2: [Figure PCTCN2020125962-appb-000003] (N_Hv denotes the set of neighbor vertices of H). Check the flag is_trivial of vertex u: if it is false, skip further operations on this vertex; if it is true, continue to the next step.
S2.2.3: Check the assignment flag assigned of vertex u: if it is true, skip further operations on this vertex; if it is false, add the vertex to H.
S2.2.4: Repeat the above steps for the other connected out-edge neighbors one by one. After all unassigned neighbor vertices have been merged, increase the merge hop count by one.
S2.2.5: Repeat the above steps for the vertices newly added to H until the hop count hop reaches the given merge hop count k. One merge operation is then complete and the super vertex H is output.
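Steps S2.2.1-S2.2.5 amount to a breadth-first expansion that is limited to k hops and only grows through vertices that were just merged. The following is a hedged sketch of that idea, not the reference Fusion implementation: the function name fusion and its parameters are hypothetical, the graph is assumed to be given as out-edge CSR arrays (offsets, targets), and is_trivial and assigned are assumed to be precomputed per-vertex boolean lists.

```python
from typing import List


def fusion(seed: int, offsets: List[int], targets: List[int],
           is_trivial: List[bool], assigned: List[bool], k: int) -> List[int]:
    """Return the members of super vertex H in the order they were merged."""
    merged = [seed]          # S2.2.1: H = {seed}
    in_h = {seed}            # fast membership test for H
    frontier = [seed]        # vertices added to H during the previous hop
    hop = 0
    while hop < k and frontier:
        next_frontier = []
        for v in frontier:
            for u in targets[offsets[v]:offsets[v + 1]]:
                if not is_trivial[u]:            # S2.2.2: skip non-trivial u
                    continue
                if assigned[u] or u in in_h:     # S2.2.3: skip assigned u
                    continue
                merged.append(u)
                in_h.add(u)
                next_frontier.append(u)
        frontier = next_frontier                 # S2.2.5: expand new members
        hop += 1                                 # S2.2.4: one more hop merged
    return merged
```

Returning H as an ordered list rather than a set keeps the later ID assignment deterministic, which matches the reproducibility goal stated in S1.2.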
S3: Assign new IDs to the vertices of the graph according to the super vertex, thereby rearranging the graph vertices.
S3.1: First assign consecutive IDs to the H generated above, then assign new IDs to the neighbor vertices of H: consecutive IDs go first to the high-in-degree neighbor vertices and then to the low-in-degree neighbor vertices. Set the assignment flag assigned=true for every vertex that has received a new ID.
In other words, S3.1 operates as follows: new IDs are assigned to the H obtained in S2 and then to the vertices directly connected to H. Specifically, new IDs are assigned first to the neighbor vertices whose flag is_trivial=false and then to those whose flag is_trivial=true. The flag assigned of every vertex that has completed assignment is set to true.
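The assignment order of S3.1 can be illustrated with two passes over the neighbors of H. This is only a sketch under stated assumptions: the name assign_ids and its parameters are hypothetical, "directly connected" is taken to mean out-edge neighbors stored in CSR arrays (offsets, targets), and the members of H are visited in the order in which they were merged.

```python
from typing import List


def assign_ids(merged: List[int], offsets: List[int], targets: List[int],
               is_trivial: List[bool], assigned: List[bool],
               new_id: List[int], move_id: int) -> int:
    """Assign IDs to H and its neighbors; return the next assignable ID."""
    # Members of the super vertex H receive consecutive IDs first.
    for v in merged:
        if not assigned[v]:
            new_id[v] = move_id
            assigned[v] = True
            move_id += 1
    # Then the neighbors of H: is_trivial=False (high in-degree) neighbors
    # in the first pass, is_trivial=True (low in-degree) ones in the second.
    for want_trivial in (False, True):
        for v in merged:
            for u in targets[offsets[v]:offsets[v + 1]]:
                if not assigned[u] and is_trivial[u] == want_trivial:
                    new_id[u] = move_id
                    assigned[u] = True
                    move_id += 1
    return move_id
```

Checking assigned before every write is what prevents a vertex from receiving a second ID when it is reachable from several members of H.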
S3.2: Select one of the neighbor vertices of H above as the seed. Once all vertices of the connected component containing seed have received new IDs, select the vertex with the smallest original ID in the remaining connected components as seed. Execute step S2 until every vertex has received a unique new ID.
In other words, S3.2 operates as follows: select one of the neighbor vertices of H as the seed and execute S2.2. If all vertices of the current connected component have received new IDs, select the vertex with the smallest original ID from the remaining connected components as the seed and execute S2.2, until every vertex has received a unique new ID.
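A compact end-to-end sketch may help show how S2.2, S3.1, and S3.2 fit together in one loop. Everything here is illustrative and rests on assumptions: the name reorder is hypothetical, the graph is passed as plain out-neighbor lists (out_nbrs) instead of CSR for brevity, vertices are numbered from 0, a vertex is treated as low in-degree when its in-degree is at most lam, and new_id[v] == -1 doubles as the "not yet assigned" flag. The seed rule is one possible reading of S3.2: the next seed is the neighbor of H with the smallest original ID that still has an unassigned out-neighbor, and otherwise the smallest unassigned original ID overall.

```python
from typing import List, Optional


def reorder(out_nbrs: List[List[int]], lam: int, k: int) -> List[int]:
    """Return new_id, where new_id[v] is the new (1-based) ID of vertex v."""
    n = len(out_nbrs)
    in_degree = [0] * n
    for nbrs in out_nbrs:
        for u in nbrs:
            in_degree[u] += 1
    # Assumption: is_trivial marks low-in-degree vertices (in-degree <= lam).
    is_trivial = [d <= lam for d in in_degree]
    new_id = [-1] * n      # Phi(*) = -1, as initialized in S2.1
    move_id = 1            # next assignable new ID, as in S2.1
    lowest = 0             # scan pointer for the smallest unassigned original ID
    seed: Optional[int] = None

    def give(v: int) -> None:
        nonlocal move_id
        if new_id[v] == -1:            # never assign a vertex twice
            new_id[v] = move_id
            move_id += 1

    while True:
        if seed is None:               # fallback rule of S3.2
            while lowest < n and new_id[lowest] != -1:
                lowest += 1
            if lowest == n:
                return new_id          # every vertex has a new ID
            seed = lowest
        # S2.2: merge unassigned trivial vertices within k out-hops of seed.
        merged, frontier = [seed], [seed]
        for _ in range(k):
            nxt = []
            for v in frontier:
                for u in out_nbrs[v]:
                    if is_trivial[u] and new_id[u] == -1 and u not in merged:
                        merged.append(u)
                        nxt.append(u)
            frontier = nxt
        # S3.1: IDs for H, then high-in-degree neighbors, then low-in-degree.
        for v in merged:
            give(v)
        neighbors = [u for v in merged for u in out_nbrs[v]]
        for want_trivial in (False, True):
            for u in neighbors:
                if is_trivial[u] == want_trivial:
                    give(u)
        # S3.2: prefer a neighbor of H that still has unassigned out-neighbors.
        candidates = [u for u in neighbors
                      if any(new_id[w] == -1 for w in out_nbrs[u])]
        seed = min(candidates) if candidates else None
```

For example, reorder([[2, 4], [5, 3], [5], [5], [5], []], lam=1, k=1) returns [1, 5, 2, 6, 3, 4]: vertices 0, 2, and 4 are merged and numbered first, their shared high-in-degree neighbor 5 follows, and the second seed 1 then pulls in vertex 3.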
As a specific example, this embodiment is described below with a concrete instance:
Taking FIG. 5(a) as an example, the renumbering process of this embodiment is as follows (see FIG. 5). The in-degree threshold is set to λ=1, the merge hop count to k=1, and the currently assignable new ID is 1. Because the example graph is small, seeds are selected only from out-edge neighbors in this example:
Step 1: From the vertices that have not received a new ID, select the one with the smallest original ID as the seed: vertex 1 qualifies, so seed=1.
Step 2: Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. For vertex 1, the vertices within 1 hop that satisfy these conditions are vertices 3 and 7, so (1, 3, 7) are merged into a super vertex; see FIG. 5(a).
Step 3: Assign new IDs to the vertices of the super vertex that have not received one: the currently assignable new ID is 1, so the vertices with original IDs 1, 3, 7 receive new IDs 1, 2, 3. The currently assignable new ID becomes 4.
Step 4: Assign new IDs to the neighbors of the super vertex that have not received one: among the neighbors of the super vertex, vertices 5, 9, 6 are high-in-degree vertices and vertex 10 is a low-in-degree vertex. New IDs are given to the high-in-degree neighbors of each member of the super vertex in order, that is, first the high-in-degree neighbors of vertex 1, then those of vertex 3, and finally those of vertex 7 (the same applies below); the low-in-degree neighbors are processed in the same order (the same applies below). The currently assignable new ID is 4, so the vertices with original IDs 5, 9, 6, 10 receive new IDs 4, 5, 6, 7. The currently assignable new ID becomes 8; see FIG. 5(b).
Step 5: Select a seed from the neighbors: seeds are selected only from out-edge neighbors here, and the out-edge neighbors of vertices 5, 9, 6, 10 have all received new IDs, so no seed is available among them.
Step 6: From the vertices that have not received a new ID, select the one with the smallest original ID as the seed: vertex 2 qualifies, so seed=2.
Step 7: Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. For vertex 2, the only such vertex within 1 hop is vertex 8, so (2, 8) are merged into a super vertex; see FIG. 5(b).
Step 8: Assign new IDs to the vertices of the super vertex that have not received one: the currently assignable new ID is 8, so the vertices with original IDs 2, 8 receive new IDs 8, 9. The currently assignable new ID becomes 10.
Step 9: Assign new IDs to the neighbors of the super vertex that have not received one: among the neighbors of the super vertex, vertex 4 is a high-in-degree vertex. The currently assignable new ID is 10, so the vertex with original ID 4 receives new ID 10. The currently assignable new ID becomes 11; see FIG. 5(c).
Step 10: Select a seed from the neighbors: seed=4.
Step 11: Generate the super vertex: merge the vertices within k hops of the seed whose in-degree is less than λ and that have not yet been assigned a new ID. For vertex 4, the only such vertex within 1 hop is vertex 11, so (4, 11) are merged into a super vertex; see FIG. 5(c).
Step 12: Assign new IDs to the vertices of the super vertex that have not received one: the currently assignable new ID is 11, so the vertex with original ID 11 receives new ID 11. The currently assignable new ID becomes 12. (The vertex with original ID 4 already has a new ID and is not assigned again.)
Step 13: Assign new IDs to the neighbors of the super vertex that have not received one: all neighbors of the super vertex already have new IDs and are not assigned again.
Step 14: The rearrangement is complete; the result is shown in FIG. 5(d).
This embodiment combines a mechanism that merges adjacent low-in-degree vertices before reassignment with an ID assignment method that distinguishes the assignment order of high- and low-in-degree vertices. Adjacent low-in-degree vertices are merged in order to determine the set of common neighbors among vertices quickly. Related studies indicate that vertices sharing more common neighbors should have adjacent IDs, and that the proximity of vertex IDs should be consistent with the proximity of the vertices in the graph structure. With this mapping, data accesses during graph computation gain more locality, which improves computational efficiency. The assignment order distinguishes high- and low-in-degree vertices so that frequently accessed vertices have similar IDs; such vertices are accessed often, and similar IDs raise the likelihood that they stay resident in the cache. A given vertex generally has both high-in-degree and low-in-degree neighbors. Because cache capacity is limited, if the IDs of the high-in-degree vertices are far apart, accesses to the low- and high-in-degree neighbors may evict high-in-degree vertices that are about to be accessed, lowering the cache hit rate.
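The locality argument above can be made concrete with a toy proxy. The sketch below is not part of the described method; it is a hypothetical measurement, assuming per-vertex data is stored in an array indexed by vertex ID and that ids_per_line consecutive IDs share one cache line. It counts how many distinct cache lines each vertex touches when reading the data of its out-neighbors; the names cache_lines_touched, out_nbrs, and ids_per_line are illustrative.

```python
from typing import List


def cache_lines_touched(out_nbrs: List[List[int]], vertex_id: List[int],
                        ids_per_line: int) -> int:
    """Sum, over all vertices, of the distinct cache lines hit by neighbors.

    vertex_id[u] is the ID used to index per-vertex data for original vertex u.
    """
    total = 0
    for nbrs in out_nbrs:
        # IDs that fall into the same block of ids_per_line share a line.
        lines = {vertex_id[u] // ids_per_line for u in nbrs}
        total += len(lines)
    return total
```

On the small graph [[2, 4], [5, 3], [5], [5], [5], []] with two IDs per line, the original numbering [0, 1, 2, 3, 4, 5] touches 7 lines in total, while the ordering [1, 5, 2, 6, 3, 4] touches 6, because the two neighbors of vertex 0 end up on the same line. A count this small says nothing about real workloads; it only illustrates why neighboring IDs for co-accessed vertices can reduce the number of lines fetched.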
The terms describing positional relationships in the accompanying drawings are for illustrative purposes only and should not be construed as limiting this patent.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

  1. A lightweight and efficient graph vertex rearrangement method, characterized in that the method comprises the following steps:
    S1: loading the graph whose IDs need to be rearranged, preprocessing it, and determining the seed point;
    S2: selecting a seed point and generating a super vertex from the seed point;
    S3: assigning new IDs to the vertices of the graph according to the super vertex, thereby rearranging the graph vertices.
  2. The lightweight and efficient graph vertex rearrangement method according to claim 1, characterized in that S1 comprises the following steps:
    S1.1: loading the graph G=(V,E) whose IDs need to be rearranged, where V is the vertex set of the graph, E is the edge set, and |V| and |E| are the numbers of vertices and edges, respectively;
    then storing the whole-graph information and setting the flag is_trivial of each vertex according to its in-degree: if the in-degree is greater than the given threshold λ, setting is_trivial=true; otherwise setting is_trivial=false;
    S1.2: setting the rearrangement rule and selecting the seed point.
  3. The lightweight and efficient graph vertex rearrangement method according to claim 2, characterized in that the whole-graph information is stored in compressed sparse row format.
  4. The lightweight and efficient graph vertex rearrangement method according to claim 3, characterized in that S1.2 is specifically: ID rearrangement is set to start from the vertex with the smallest original ID in the graph G, and the seed point seed is selected, the vertex with the smallest original ID being the initial seed point.
  5. The lightweight and efficient graph vertex rearrangement method according to any one of claims 1-4, characterized in that S2 is specifically:
    taking the seed point seed as the center, merging the low-in-degree vertices within k hops of it that have not yet been assigned a new ID into a super vertex H.
  6. The lightweight and efficient graph vertex rearrangement method according to claim 5, characterized in that generating the super vertex comprises:
    S2.1: initializing the ID assignment flag of every vertex of the graph to assigned=false, setting the new ID of every vertex to Φ(*)=-1, setting the seed point seed=-1, and setting the auxiliary variable move_id=1, which denotes the new ID currently available for assignment, and the auxiliary variable v=-1, which denotes the original ID of the vertex currently eligible for assignment;
    S2.2: obtaining the super vertex H through the super vertex generation function Fusion(seed).
  7. The lightweight and efficient graph vertex rearrangement method according to claim 6, characterized in that S2.2 comprises the following steps:
    S2.2.1: inputting the seed point seed and setting the super vertex H={seed};
    S2.2.2: [Figure PCTCN2020125962-appb-100001] where N_Hv denotes the set of neighbor vertices of H; checking the flag is_trivial of vertex u: if is_trivial is false, skipping further operations on this vertex; if is_trivial is true, executing S2.2.3;
    S2.2.3: checking the assignment flag assigned of vertex u: if assigned is true, skipping further operations on this vertex; if assigned is false, adding the vertex to H;
    S2.2.4: repeating S2.2.1-S2.2.3 for the other connected out-edge neighbors one by one and, after all unassigned neighbor vertices have been merged, increasing the merge hop count by one;
    S2.2.5: repeating S2.2.1-S2.2.4 for the vertices newly added to H until the hop count hop reaches the given merge hop count k, so that the low-in-degree vertices within k hops of the seed point are merged into H, which is then output.
  8. The lightweight and efficient graph vertex rearrangement method according to claim 7, characterized in that S3 comprises the following steps:
    S3.1: first assigning consecutive IDs to the super vertex H generated in S2, then assigning new IDs to the neighbor vertices of H; setting the assignment flag assigned=true for every vertex that has received a new ID;
    S3.2: selecting one of the neighbor vertices of the super vertex H as the seed point seed; once all vertices of the connected component containing seed have received new IDs, selecting the vertex with the smallest original ID in the remaining connected components as seed; executing S2 until every vertex has received a unique new ID.
  9. The lightweight and efficient graph vertex rearrangement method according to claim 8, characterized in that new IDs are assigned to the neighbor vertices of the super vertex H according to the following rule: consecutive IDs are assigned first to the high-in-degree neighbor vertices and then to the low-in-degree neighbor vertices.
  10. The lightweight and efficient graph vertex rearrangement method according to claim 1 or 9, characterized in that the seed point is selected as follows:
    (1) when a connected component of the graph still contains vertices without a new ID, an unassigned neighbor of a vertex that has already been assigned a new ID is selected as the seed point;
    (2) when all vertices of a connected component of the graph have received new IDs, a vertex is selected as the seed point from the remaining connected components that have not yet undergone ID assignment;
    (3) if several vertices satisfy (1) or (2) above, the one with the smallest original ID is selected as the seed point;
    (4) the initial seed point is selected according to (2) above.
PCT/CN2020/125962 2020-10-21 2020-11-02 Lightweight and efficient graph vertex rearrangement method WO2022082860A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011134445.0 2020-10-21
CN202011134445.0A CN112380397B (en) 2020-10-21 2020-10-21 Lightweight efficient graph vertex rearrangement method

Publications (1)

Publication Number Publication Date
WO2022082860A1 true WO2022082860A1 (en) 2022-04-28

Family

ID=74580467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125962 WO2022082860A1 (en) 2020-10-21 2020-11-02 Lightweight and efficient graph vertex rearrangement method

Country Status (2)

Country Link
CN (1) CN112380397B (en)
WO (1) WO2022082860A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844702A (en) * 2017-01-18 2017-06-13 中山大学 A kind of non-directed graph based on MCMC processes resets sequence algorithm
CN109656798A (en) * 2018-12-26 2019-04-19 中国人民解放军国防科技大学 Vertex reordering-based big data processing capability test method for supercomputer
US20200226124A1 (en) * 2020-03-27 2020-07-16 Intel Corporation Edge batch reordering for streaming graph analytics

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101556060B1 (en) * 2014-04-28 2015-10-01 포항공과대학교 산학협력단 Method and system for searching subgraph isomorphism using candidate region exploration
CN109033159A (en) * 2018-06-15 2018-12-18 华中科技大学 A kind of diagram data layout method based on vertex influence power

Also Published As

Publication number Publication date
CN112380397B (en) 2023-12-12
CN112380397A (en) 2021-02-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20958440

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180723)

122 Ep: pct application non-entry in european phase

Ref document number: 20958440

Country of ref document: EP

Kind code of ref document: A1