CN111539534B - General distributed graph processing method and system based on reinforcement learning - Google Patents

General distributed graph processing method and system based on reinforcement learning

Info

Publication number
CN111539534B
Authority
CN
China
Prior art keywords
vertex
data processing
processing center
probability
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010462112.4A
Other languages
Chinese (zh)
Other versions
CN111539534A (en)
Inventor
周池
罗鹃云
毛睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202010462112.4A priority Critical patent/CN111539534B/en
Publication of CN111539534A publication Critical patent/CN111539534A/en
Priority to PCT/CN2021/076484 priority patent/WO2021238305A1/en
Application granted granted Critical
Publication of CN111539534B publication Critical patent/CN111539534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a general distributed graph processing method and system based on reinforcement learning. Distributed data processing centers are defined on the basis of graph theory to form a distributed graph, and the distributed graph is cut by reinforcement learning under preset constraint conditions using a preset graph cutting model and a preset graph processing model. A learning automaton is allocated to each vertex and trained to find the most suitable data processing center for that vertex, so that the placement of each vertex across all data processing centers follows a probability distribution. Each iteration of the system comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal calculation and probability updating; the iteration is judged to be finished when the maximum number of iterations is reached or the constraint conditions converge. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a general distributed graph model; for different optimization targets, only different score calculation schemes and different weight vectors need to be designed.

Description

General distributed graph processing method and system based on reinforcement learning
Technical Field
The invention relates to the field of large-scale graph segmentation processing, in particular to a general distributed graph processing method and system based on reinforcement learning.
Background
In order to efficiently perform large-scale graph processing, a graph generally needs to be divided so that the divided subgraphs can be processed in parallel. There are several classical models for large-scale graph segmentation:
Heuristic models. Pregel and PowerGraph, the traditional mainstream large-scale graph processing systems, adopt heuristic partitioning algorithms. Pregel's default partitioning method applies a modulo operation to the hash value of the vertex id, which improves the locality of the partitions and reduces network traffic among the computing nodes. PowerGraph defaults to greedy vertex-cut: for a newly added edge, if one of its endpoints already resides on some machine, the edge is assigned to that machine, which minimizes the number of edges spanning machines and reduces communication traffic. Such heuristic graph partitioning algorithms easily fall into locally optimal solutions and do not explore better regions of the solution space.
Machine learning models. Pham et al. propose a graph partitioning method that assigns the operations (nodes) of a TensorFlow computation graph to available devices so as to minimize computation time. They employ a reinforcement learning model that assigns operations with a seq2seq policy. This approach is only applicable when the number of graph nodes is small, so that the policy space does not become too large. Nazi et al. propose GAP, an algorithm that solves the graph partitioning problem with deep learning. GAP is an unsupervised learning method that treats balanced graph partitioning as a vertex classification problem. However, if the optimization goal involves heterogeneous network prices and bandwidths, computing the node embeddings becomes complicated. Existing machine learning models for graph partitioning target a single application scenario, and when the graph is large and the optimization target is more complex, they cannot solve the graph partitioning problem well.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defects of the prior art, in which graph cutting models easily fall into locally optimal solutions and perform poorly because they target a single usage scenario, and to provide a general distributed graph processing method and system based on reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a generalized distributed graph processing method based on reinforcement learning, including the following steps: defining a distributed data processing center to form a distributed graph based on graph theory, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model;
distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for each vertex according to a preset action selection method by the learning automaton based on the initialized probability;
the learning automaton selects the data processing center with the maximum probability for the vertex, compares the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, the vertex is transferred to the data processing center corresponding to the action, and otherwise, no operation is performed;
each learning automaton calculates the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
each learning automaton transmits the data processing center number corresponding to the maximum score to the learning automaton to which the neighbor of the vertex belongs to generate a corresponding weight vector, and the learning automaton calculates strengthening signals corresponding to all the data processing centers for the vertex according to the weight vector;
updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal, and guiding the next action selection to iterate;
and generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
In an embodiment, the preset graph cutting model is the hybrid-cut model and the preset graph processing model is the GAS graph processing model; vertex computation is performed iteratively with the GAS graph processing model, and the constraint is to minimize the data transmission time subject to the capital budget cost.
In one embodiment, the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase, and the data transmission time T(i) of the ith iteration is calculated by formula (1) from the following quantities:
an indicator that equals 1 when the copy of vertex v in data processing center DC_r is the master vertex, and 0 when the copy of v in DC_r is not the master;
an indicator that equals 1 when vertex v in DC_r is a high-degree vertex, and 0 when it is a low-degree vertex;
the amount of data transferred from the replica in DC_r to the master vertex v during the gather phase of the ith iteration;
a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the ith iteration;
U_r/D_r, the upload/download bandwidth of DC_r;
R_v, the set of data processing centers DC containing a replica of v.
The communication cost between the data processing centers DC is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r denoting the unit cost of uploading data from DC_r to the network, the capital budget cost C_comm(i) is expressed by formula (2) as the sum, over all data processing centers, of P_r multiplied by the amount of data uploaded from DC_r in these two phases.
The constraint conditions are:
min T(i) (3)
C_comm(i) ≤ B (4)
where B is the capital budget for using network resources.
In an embodiment, the step of initializing the probability of each vertex in each data processing center and of the learning automaton selecting a data processing center for its vertex according to a preset action selection method includes:
initializing the probability P(v_i) of vertex v at data processing center DC_i to P(v_i) = 1/M, where M is the number of distributed DCs;
obtaining, from the probability distribution of the vertex, the cumulative probability of the vertex for each data processing center DC, where Q(v_i) denotes the cumulative probability of vertex v at DC_i and Q(v_i) = P(v_0) + P(v_1) + ... + P(v_i);
randomly generating a floating-point number r ∈ [0,1]; if r ≤ Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), data processing center DC_k is selected.
In an embodiment, the step of initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for the vertex by the learning automaton according to a preset action selection method includes:
presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0,1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex; if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest P(v_i) value.
In one embodiment, each learning automaton computes a score for its vertex at each data processing center according to formulas (5) and (6). The score of vertex v at DC_i is determined by: B, the capital budget for using network resources; T_b, the data transfer time of the whole system before the score is calculated; C_b, the data transfer cost of the whole system before the score is calculated; the data transfer time of the whole system when the vertex is placed at DC_i; the data transfer cost of the whole system when the vertex is placed at DC_i; and tw and cw, which denote the time weight and the capital cost weight respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1.
Each learning automaton transmits the data processing center number corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong, so that they generate the corresponding weight vectors, and the learning automaton then calculates, from the weight vectors, the reinforcement signals of its vertex for all data processing centers. The steps include:
calculating a reference standard for the weight vector: when vertex u receives the label ρ_v propagated by its neighbor v, it calculates the reference standard, where ρ_v denotes the data processing center corresponding to the maximum score of vertex v and Nbr(v) denotes the set of neighbor vertices of vertex v, from the following quantities: the data transmission time of the whole system after vertex v is moved to ρ_v; the data transmission time of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v; the capital cost of the whole system after vertex v is moved to ρ_v; and the capital cost of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v;
after vertex u has calculated the reference standard, updating the weight vector of vertex u for DC ρ_v, which is initialized to 0;
after the weight vectors of the vertex for all data processing centers have been calculated, the learning automaton calculates the corresponding reinforcement signals from the weight vectors; the reinforcement signal of vertex u for data processing center DC_i takes the value 0 or 1, representing a reward signal and a penalty signal respectively, and the weight vector of vertex u for data processing center DC_i is initialized to 0.
In one embodiment, before the probability value of the vertex in each data processing center is updated, regularization weights are obtained and divided into reward regularization weights and penalty regularization weights:
the reward regularization weight of vertex v for DC_i is computed from Neg(), an inverting function, applied to the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k;
the penalty regularization weight of vertex v for DC_i is computed from the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k.
In one embodiment, the probability of vertex v is updated based on the regularization weights. For the data processing centers whose reinforcement signal is a reward, the update order proceeds from the smallest reward regularization weight to the largest. Given vertex v and a DC_i whose reward regularization weight is the smallest among all reward regularization weights, that weight is used first to update the probabilities of all DCs according to formula (11), in which the probability of vertex v for DC_i in the nth iteration appears, α denotes the reward weight, n is the iteration index, and i and j index data processing centers; formula (11) increases the probability of DC_i and decreases the probabilities of the other DCs. The learning automaton then finds the successively larger reward regularization weights and uses the same update rule on all DCs.
The learning automaton then updates the probabilities for the data processing centers whose reinforcement signal is a penalty. The update order proceeds from the smallest penalty regularization weight to the largest. Given vertex v and data processing centers DC_i and DC_k, where the penalty regularization weight for DC_k is the largest among all penalty regularization weights and the penalty regularization weight for DC_i is the smallest, the smallest weight is used first to update the probabilities of all DCs according to formula (12), in which β denotes the penalty weight and the probability of vertex v for DC_j in the nth iteration appears; formula (12) decreases the probability of DC_k and increases the probabilities of the other DCs. The learning automaton then finds the successively larger penalty regularization weights and the corresponding DC_k and applies the same update to all DCs. If the preset number of iterations is reached or the constraint conditions converge, the iteration ends; otherwise the (n+1)th iteration begins, and the action selection in the (n+1)th iteration takes the probabilities updated in the nth iteration as its reference.
In a second aspect, an embodiment of the present invention provides a reinforcement learning-based general distributed graph processing system, including:
the distributed graph definition and constraint condition setting module is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by utilizing a preset graph cutting model and a preset graph processing model;
the action selection module is used for distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the maximum probability for each vertex by the learning automaton according to a preset action selection method based on the initialized probability;
the vertex migration module is used for selecting the data processing center with the maximum probability for the vertex, comparing the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, migrating the vertex to the data processing center corresponding to the action, and otherwise, not performing any operation;
the score calculation module is used for calculating the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
the strengthening signal calculation module is used for transmitting the number of the data processing center corresponding to the maximum score to the learning automata to which the neighbors of the vertex belong so as to generate corresponding weight vectors, and for calculating, according to the weight vectors, the strengthening signals of the vertex for all the data processing centers;
the probability updating module is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to iterate;
and the segmentation result acquisition module is used for generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the reinforcement learning-based general distributed graph processing method according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer device, including: the apparatus comprises a memory and a processor, wherein the memory and the processor are communicatively connected with each other, the memory stores computer instructions, and the processor executes the computer instructions to execute the reinforcement learning-based general distributed graph processing method according to the first aspect of the embodiments of the present invention.
The technical scheme of the invention has the following advantages:
the invention provides a general distributed graph processing method and system based on reinforcement learning, which define distributed data processing centers to form a distributed graph based on graph theory, utilize a preset graph cutting model and a preset graph processing model, cut the distributed graph by a reinforcement learning mode based on preset constraint conditions, allocate a learning automaton to each vertex, find the most suitable data processing center for the vertex through training, the possibility of each vertex in all the data processing centers obeys certain probability distribution, the whole system comprises five steps of action selection, vertex migration, score calculation, reinforcement signal calculation and probability updating in each iteration process, the maximum iteration times or constraint condition convergence is reached, and the iteration is judged to be finished. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a distributed graph model with better adaptivity, and different fraction calculation schemes and different weight vectors only need to be designed for different optimization targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart illustrating a generalized distributed graph processing method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an iteration process based on a reinforcement learning graph segmentation process according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a specific example of a generalized distributed graph processing system based on reinforcement learning in an embodiment of the present invention;
fig. 4 is a block diagram of a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
The embodiment of the invention provides a general distributed graph processing method based on reinforcement learning, which can be applied to different optimization targets, such as performance and cost optimization, load balancing, performance optimization and the like of a geographic distributed graph processing system, and as shown in fig. 1, the method comprises the following steps:
step S10: the distributed data processing center is defined based on graph theory to form a distributed graph, and the distributed graph is cut by utilizing a preset graph cutting model and a preset graph processing model and based on preset constraint conditions and preset constraint conditions.
The embodiment of the invention takes the geographically distributed graph partitioning process as an example and assumes that vertex data are not backed up across data processing centers (hereinafter DC) and that one machine can only execute the graph processing task of one vertex at a time; the computational resources of each DC are not limited, while the data communication between the DCs is the performance bottleneck of geographically distributed graph processing; the connections between the DCs are assumed to be free of network congestion, so the bottleneck of the network comes only from the uplink and downlink bandwidth between each DC and the WAN; a fee is charged only for uploading data from a DC to the WAN. Cost and performance may conflict: when the uplink bandwidth is large, more data can be transmitted on the link to reduce the transmission time, but the price of that link may be relatively high and drive up the cost. Graph partitioning therefore needs to optimize performance and cost at the same time.
First, a graph G(V, E) is defined, where V is the set of vertices and E is the set of edges, and M geographically distributed data processing centers (hereinafter DC) are considered. Each vertex v has an initial position L_v (L_v ∈ {0, 1, ..., M-1}); an indicator records whether the copy of v in a DC is the master vertex (value 1) or not (value 0); R_v is the set of DCs that contain a replica of vertex v; U_r is the bandwidth of the uplink of DC_r, and D_r is the bandwidth of the downlink.
The embodiment of the invention uses the hybrid-cut graph cutting model, which follows the following rule: given a threshold θ, vertex v is called a high-degree vertex if its in-degree is greater than or equal to θ, and a low-degree vertex otherwise. If vertex v is low-degree, all of its incoming edges are assigned to the DC where v resides; if vertex v is high-degree, each of its incoming edges is assigned to the DC where the source vertex of that edge resides.
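As an illustration, the following Python sketch shows how the hybrid-cut rule above might assign incoming edges to DCs; the function and variable names are assumptions made for the example, not part of the patent.

```python
def hybrid_cut_assign(edges, vertex_dc, in_degree, theta):
    """Assign each directed edge (u, v) to a DC under the hybrid-cut rule.

    vertex_dc[v] is the DC currently holding vertex v, in_degree[v] its in-degree,
    and theta the high-degree threshold. Returns a mapping {edge: dc}.
    """
    placement = {}
    for (u, v) in edges:
        if in_degree[v] >= theta:          # v is high-degree: follow the edge's source vertex
            placement[(u, v)] = vertex_dc[u]
        else:                              # v is low-degree: keep all in-edges with v itself
            placement[(u, v)] = vertex_dc[v]
    return placement
```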
Embodiments of the present invention use the GAS graph processing model, which iteratively performs user-defined vertex computations. Each GAS iteration has three stages: Gather, Apply and Scatter. In the gather stage, each active vertex collects the data of its neighbors, and a summation function (Sum) is defined to aggregate the received data into a gathered sum. In the apply stage, each active vertex uses the aggregated sum to update its own data. In the scatter stage, each active vertex activates the neighbors that will execute in the next iteration. A global barrier is defined to ensure that all vertices complete their computation before the next stage starts.
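For orientation, here is a minimal Python sketch of the GAS abstraction described above, using PageRank-style callbacks as an assumed example; it is not the PowerGraph API, only an illustration of the gather/apply/scatter structure.

```python
class PageRankVertexProgram:
    """Illustrative GAS vertex program (PageRank-like); the attribute names are assumptions."""

    def gather(self, vertex, in_neighbor):
        # Gather: each active vertex collects a value from one in-neighbor.
        return in_neighbor.rank / in_neighbor.out_degree

    def gather_sum(self, a, b):
        # User-defined Sum: aggregate the gathered values into one gathered sum.
        return a + b

    def apply(self, vertex, gathered_sum, damping=0.85):
        # Apply: use the aggregate to update the vertex's own data.
        vertex.prev_rank = vertex.rank
        vertex.rank = (1 - damping) + damping * gathered_sum

    def scatter(self, vertex, out_neighbor, tolerance=1e-4):
        # Scatter: decide whether a neighbor should be active in the next iteration.
        return abs(vertex.rank - vertex.prev_rank) > tolerance
```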
The transfer time in the ith iteration may be expressed as the sum of the data transfer times of the gather and apply phases and is denoted T(i) (formula (1)). The quantities entering formula (1) are:
an indicator that equals 1 when the copy of vertex v in data processing center DC_r is the master vertex, and 0 when the copy of v in DC_r is not the master;
an indicator that equals 1 when vertex v in DC_r is a high-degree vertex, and 0 when it is a low-degree vertex;
the amount of data transferred from the replica in DC_r to the master vertex v during the gather phase of the ith iteration;
a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the ith iteration;
U_r/D_r, the upload/download bandwidth of DC_r;
R_v, the set of data processing centers DC containing a replica of v.
The cost of communication between DCs is the sum of the cost of uploading data during the gather phase and the apply phase. With P_r denoting the unit cost of uploading data from DC_r to the Internet, the total communication cost C_comm(i) of the ith iteration is the sum, over all DCs, of P_r multiplied by the amount of data uploaded from DC_r in these two phases (formula (2)).
The geographically distributed graph partitioning problem is expressed as a constrained optimization problem, i.e. the constraint conditions are:
min T(i) (3)
C_comm(i) ≤ B (4)
where B is the capital budget for using network resources. The geographically distributed graph partitioning problem to be solved is the optimization problem under the constraint conditions described by formulas (3) and (4).
After the meaning of each element of the geographically distributed graph has been defined, a learning automaton (hereinafter LA) is assigned to each vertex and trained to find the DC that is most suitable for that vertex; the probability of each vertex for every DC obeys a probability distribution. Each iteration mainly comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal generation and probability updating. The overall workflow for optimizing the performance and cost of the geographically distributed graph processing system is shown in figure 2; the main function of each step and the connections between the steps are described below.
Step S11: and allocating a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the maximum probability for the vertex by the learning automaton according to a preset action selection method based on the initialized probability.
In the embodiment of the present invention, the following are defined: P(v_i) denotes the probability of vertex v at DC_i and is initialized to P(v_i) = 1/M, where M is the number of distributed DCs; Q(v_i) denotes the cumulative probability of vertex v at DC_i and is calculated as Q(v_i) = P(v_0) + P(v_1) + ... + P(v_i).
In one embodiment, the LA selects an appropriate action (DC) for its vertex using a roulette-wheel algorithm. The LA first obtains the cumulative probability of the vertex for each DC from the vertex's probability distribution and then randomly generates a floating-point number r ∈ [0,1]. If r ≤ Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), DC_k is selected. In this way, actions with higher probability have a greater chance of being selected, but actions with lower probability can also be selected. When the LA performs an action with high probability, the graph partitioning result is more likely to move towards the optimization target; when the LA chooses a seemingly bad action (one with low probability), this is a trial-and-error process, and a choice that looks bad at the moment may explore a better state space.
In another embodiment, the action selection may be done in another way: a trial-and-error parameter τ = 0.1 is defined and a floating-point number r ∈ [0,1] is generated at random. If r ≤ τ, the LA randomly chooses a DC for its vertex; if r > τ, the LA selects for its vertex the DC with the highest P(v_i) value.
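The trial-and-error variant can be sketched in the same style (again with illustrative names):

```python
import random

def trial_and_error_select(dc_probs, tau=0.1):
    """With probability tau explore a random DC, otherwise exploit the most probable DC."""
    if random.random() <= tau:
        return random.randrange(len(dc_probs))                          # exploration
    return max(range(len(dc_probs)), key=lambda dc: dc_probs[dc])       # exploitation
```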
Step S12: the learning automaton selects the data processing center with the maximum probability for the vertex, compares the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, the vertex is transferred to the data processing center corresponding to the action, and otherwise, no operation is performed.
In the embodiment of the present invention, the LA compares the action obtained in step S11 with the DC where the vertex currently resides; if they are inconsistent, the vertex is migrated to the DC corresponding to the action, otherwise no operation is performed.
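This compare-and-migrate step amounts to a few lines; the sketch below assumes a vertex_dc mapping and a migrate() helper that are not named in the patent.

```python
def maybe_migrate(vertex, chosen_dc, vertex_dc, migrate):
    """Move the vertex only if the selected action differs from its current DC."""
    if vertex_dc[vertex] != chosen_dc:
        migrate(vertex, chosen_dc)      # transfer the vertex to the chosen DC
        vertex_dc[vertex] = chosen_dc   # otherwise: no operation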
Step S13: each learning automaton calculates the score of its vertex in each data processing center, and the score is determined according to the preset constraint condition.
For each LA, the embodiment of the present invention calculates the score of its vertex at each DC. First, L_v denotes the DC where vertex v currently resides; T_b is the data transmission time of the whole system before the score calculation, obtained from formula (1); C_b is the data transmission cost of the whole system before the score calculation, obtained from formula (2). The data transmission time and the data transmission cost of the whole system when the vertex is placed at DC_i are calculated as follows: vertex v is moved to DC_i, formula (1) and formula (2) are evaluated respectively, and vertex v is finally moved back to L_v. The score of vertex v at DC_i is then computed by formulas (5) and (6), in which B represents the capital budget and tw and cw represent the time weight and the cost weight respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1, so that the overall communication cost of the graph processing system is optimized preferentially and more partition states that can reduce the system cost are explored; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1, the aim being to optimize the data transmission time of the whole graph processing system preferentially and then slow down the optimization of the transmission time so as to achieve a better overall result.
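The linear scheduling of the two weights and the move-evaluate-move-back pattern can be sketched as follows; system_time() and system_cost() stand in for formulas (1) and (2) and, like the other names, are assumptions of this example.

```python
def time_cost_weights(iteration, max_iterations, C_b, budget_B):
    """Linearly trade off the time weight tw and the cost weight cw over the iterations."""
    frac = iteration / max_iterations
    if C_b >= budget_B:
        cw, tw = 1.0 - frac, frac       # over budget: start by prioritising cost
    else:
        tw, cw = 1.0 - frac, frac       # within budget: start by prioritising time
    return tw, cw

def evaluate_placement(vertex, dc, vertex_dc, system_time, system_cost):
    """Temporarily move the vertex to dc, evaluate formulas (1) and (2), then move it back."""
    original_dc = vertex_dc[vertex]
    vertex_dc[vertex] = dc
    t, c = system_time(), system_cost()
    vertex_dc[vertex] = original_dc
    return t, c
```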
Step 14: and each learning automaton transmits the data processing center number corresponding to the maximum score to the learning automaton to which the neighbor of the vertex belongs to generate a corresponding weight vector, and the learning automaton calculates the strengthening signals corresponding to all the data processing centers for the vertex according to the weight vector.
In practice, each LA communicates with the other LAs to generate, for its vertex, a reinforcement signal for every DC; before the reinforcement signals are calculated, the weight vectors of the vertex for all DCs need to be calculated. After each LA has calculated the scores for all DCs, it transmits the DC number corresponding to the maximum score to the LAs to which the neighbors of its vertex belong, and those LAs immediately generate the corresponding weight vectors.
In the present embodiment, ρ_v denotes the DC corresponding to the maximum score of vertex v and Nbr(v) denotes the set of neighbor vertices of vertex v. When vertex u receives the label ρ_v propagated by its neighbor v, it calculates the reference standard for the weight vector from the following quantities: the data transmission time of the whole system after vertex v is moved to ρ_v; the data transmission time of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v; the capital cost of the whole system after vertex v is moved to ρ_v; and the capital cost of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v, weighted by tw, cw and sign(B - C_b). Note that tw, cw and sign(B - C_b) take the same values as in equation (5) of step S13, since they belong to the same iteration. After vertex u has calculated the reference standard, the weight vector of vertex u for DC ρ_v, which is initialized to 0, is updated accordingly.
After the weight vectors of the vertex for all DCs have been calculated, the LA calculates the corresponding reinforcement signals from the weight vectors. The reinforcement signal of vertex u for data processing center DC_i takes the value 0 or 1, representing a reward signal and a penalty signal respectively, and the weight vector of vertex u for data processing center DC_i is initialized to 0.
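Structurally, this step is a round of label propagation between learning automata: each LA broadcasts its best DC to the LAs of its neighbors, the neighbors accumulate per-DC weights, and each DC's weight is then turned into a binary reward/penalty signal. The sketch below shows only that communication pattern; the weight increment and the reward rule are placeholders, since the exact formulas of the patent are not reproduced here.

```python
from collections import defaultdict

def propagate_and_signal(vertices, neighbors, best_dc, weight_increment, is_reward, num_dcs):
    """Propagate each vertex's best DC to its neighbors and derive 0/1 reinforcement signals.

    best_dc[v]: DC with the maximum score of vertex v (the label rho_v it propagates).
    weight_increment(u, rho_v): placeholder for the reference-standard-based weight update.
    is_reward(weights_of_u, dc): placeholder rule mapping a weight vector to reward/penalty.
    """
    weights = defaultdict(lambda: [0.0] * num_dcs)      # weight vectors, initialised to 0
    for v in vertices:
        for u in neighbors[v]:
            weights[u][best_dc[v]] += weight_increment(u, best_dc[v])
    signals = {u: [0 if is_reward(weights[u], dc) else 1 for dc in range(num_dcs)]
               for u in vertices}                        # 0 = reward, 1 = penalty
    return weights, signals
```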
Step 15: and updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal, and guiding the next action selection to iterate.
In this embodiment, the LA updates the probability value of its vertex for each DC using the weight vectors obtained in step 14 and the reinforcement signals, so as to guide the next action selection. Before this, the regularization weights need to be calculated; they are divided into two parts, the reward regularization weights and the penalty regularization weights.
This embodiment defines, for vertex v and DC_i, a reward regularization weight and a penalty regularization weight. The reward regularization weight of vertex v for DC_i is computed from Neg(), an inverting function, applied to the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k. The penalty regularization weight of vertex v for DC_i is computed analogously from the reinforcement signal of vertex v for DC_i, the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k.
After the regularization weights have been obtained, this embodiment can begin to update the probabilities of vertex v; the probability of vertex v for DC_i in the nth iteration is denoted accordingly. The LA first updates the probabilities for the DCs whose reinforcement signal is a reward; the update order proceeds from the smallest reward regularization weight to the largest. Given vertex v and a DC_i whose reward regularization weight is the smallest among all reward regularization weights, that weight is used first to update the probabilities of all DCs according to formula (11), where α represents the reward weight; formula (11) increases the probability of DC_i and decreases the probabilities of the other DCs. The LA then finds the successively larger reward regularization weights and uses them to update the probabilities of all DCs in the same way. The advantage of this embodiment is that, in the end, the DC with the largest reward regularization weight receives the largest probability.
The LA then updates the probabilities for the DCs whose reinforcement signal is a penalty; the update order proceeds from the smallest penalty regularization weight to the largest. Given vertex v and DC_i and DC_k, where the penalty regularization weight for DC_k is the largest among all penalty regularization weights and the penalty regularization weight for DC_i is the smallest, the smallest weight is used first to update the probabilities of all DCs according to formula (12), where β represents the penalty weight and the probability of vertex v for DC_j in the nth iteration appears; formula (12) decreases the probability of DC_k and increases the probabilities of the other DCs. The LA then finds the successively larger penalty regularization weights and the corresponding DC_k and uses them to update the probabilities of all DCs. The advantage of this embodiment is that, in the end, the DC with the largest penalty regularization weight receives the smallest probability.
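As a point of reference, the classical linear reward/penalty update for a learning automaton behaves the way formulas (11) and (12) are described (raise one DC and lower the others, and vice versa). The sketch below uses that classical scheme, scaled by a regularization weight w; it is an assumption standing in for the patent's exact formulas.

```python
def reward_update(probs, dc_i, alpha, w):
    """Raise the probability of dc_i and shrink the others (in the spirit of formula (11))."""
    a = alpha * w
    for j in range(len(probs)):
        probs[j] = probs[j] + a * (1.0 - probs[j]) if j == dc_i else (1.0 - a) * probs[j]

def penalty_update(probs, dc_k, beta, w):
    """Lower the probability of dc_k and spread it over the others (in the spirit of formula (12))."""
    b = beta * w
    m = len(probs)
    for j in range(m):
        probs[j] = (1.0 - b) * probs[j] if j == dc_k else (1.0 - b) * probs[j] + b / (m - 1)
```

Both updates keep the probabilities of one vertex summing to 1, which is why this family of rules is a natural stand-in here.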
Step 16: and generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
If the maximum number of iterations is reached or the constraint conditions converge, the embodiment of the invention judges that the iteration is finished. Otherwise, the (n+1)th iteration begins: action selection in the (n+1)th iteration takes the probabilities updated in the nth iteration as its reference, and vertex migration, score calculation, reinforcement-signal calculation and probability updating are carried out again, until the iterations finish and a geographically distributed graph partitioning result is produced that satisfies the capital budget and has an extremely small data transmission time.
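Putting the five steps together, one iteration of the training loop described above can be outlined as follows; every helper name is an assumption tied to the sketches given earlier, not an interface defined by the patent.

```python
def train_partitioning(graph, automata, max_iterations, converged):
    """Outline of the reinforcement-learning partitioning loop: action selection,
    vertex migration, score calculation, reinforcement signals, probability update."""
    for n in range(max_iterations):
        for la in automata:                       # one learning automaton per vertex
            dc = la.select_action()               # roulette-wheel or trial-and-error selection
            la.migrate_if_needed(dc)              # vertex migration
        for la in automata:
            la.compute_scores()                   # score of the vertex at every DC
        for la in automata:
            la.exchange_labels_and_signals()      # weight vectors + reinforcement signals
        for la in automata:
            la.update_probabilities()             # reward/penalty probability updates
        if converged(graph):                      # or the preset iteration count is reached
            break
    return graph.partitioning()
```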
In order to verify the effectiveness and efficiency of the distributed graph processing method provided by the embodiment of the invention, real graph data sets were evaluated on a real cloud and on a cloud simulator, specifically five real graphs: Gnutella (GN), WikiVote (WV), GoogleWeb (GW), LiveJournal (LJ) and Twitter (TW). Real-cloud experiments were carried out on the Amazon EC2 and Windows Azure cloud platforms, and the GAS-based PowerGraph system was used to execute graph processing algorithms, including classic graph algorithms such as pagerank, sssp and subgraph. The distributed graph processing method provided by the embodiment of the invention is integrated into PowerGraph, and the graph is partitioned during loading. Evaluation on real graphs in real geographically distributed DCs and in simulation shows that, compared with Geo-Cut, the state-of-the-art performance and cost optimization algorithm for geographically distributed graph processing, the method provided by the embodiment of the invention reduces the inter-DC data transmission time by up to 72% and the capital cost by up to 63%, with relatively balanced loads.
The embodiments provided by the present invention can be applied to a number of scenarios, for example: Facebook receives text, image and video data at the TB level from users around the world on a daily basis, and builds four geographically distributed DCs to maintain and manage these data. If the load capacity and the system response time of the DCs are considered, the method provided by the embodiment of the invention can be used to partition and optimize the graph, so that the DCs work stably and give users a good experience. If network heterogeneity, cost budget and system performance in a geographically distributed environment are considered, the method provided by the embodiment of the invention can be used to partition and optimize the graph, and good improvements can be achieved in transmission time and cost budget.
It should be noted that the embodiment of the present invention only uses performance and cost optimization of the geographically distributed graph processing system as an example to describe the operation principle of the distributed graph processing method. In fact, the processing model formed by the distributed graph processing method provided by this embodiment is a general model, which can solve not only the performance and cost optimization problems of the above-mentioned geographically distributed graph processing system but also load balancing and performance optimization problems; for different optimization targets, only different score calculation schemes and different weight vector calculation schemes need to be designed.
Example 2
An embodiment of the present invention provides a general distributed graph processing system based on reinforcement learning, as shown in fig. 3, including:
the distributed graph definition and constraint condition setting module 10 is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model. This module executes the method described in step S10 in embodiment 1, and is not described herein again.
And the action selection module 11 is configured to allocate a learning automaton to each vertex of the distributed graph, initialize the probability of each vertex in each data processing center, and select the data processing center with the highest probability for the vertex according to a preset action selection method based on the initialized probability. This module executes the method described in step S11 in embodiment 1, which is not described herein again.
The vertex migration module 12, the learning automaton, is used to select the data processing center with the highest probability for the vertex, compare with the data processing center where the vertex is currently located, if not, migrate the vertex to the data processing center corresponding to the action, otherwise, do nothing. This module executes the method described in step S12 in embodiment 1, and is not described herein again.
And the score calculating module 13 is used for calculating the score of each learning automaton when the vertex of each learning automaton is positioned in each data processing center, and the score is determined according to the preset constraint condition. This module executes the method described in step S13 in embodiment 1, and details are not repeated here.
The reinforcement signal calculation module 14 is used for transmitting the data processing center number corresponding to the maximum score to the learning automata to which the neighbors of the vertex belong, generating corresponding weight vectors, and calculating, according to the weight vectors, the reinforcement signals of the vertex of the learning automaton for all the data processing centers; this module executes the method described in step S14 in embodiment 1, and is not described herein again.
The probability updating module 15 is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to carry out iteration; this module executes the method described in step S15 in embodiment 1, and is not described herein again.
And the segmentation result acquisition module 16 is configured to generate a segmentation result of the distributed graph that meets the preset constraint condition until a preset iteration number is reached or the constraint condition is converged. This module executes the method described in step S16 in embodiment 1, and is not described herein again.
The general distributed graph processing system based on reinforcement learning provided by the embodiment of the invention defines distributed data processing centers based on graph theory to form a distributed graph, and cuts the distributed graph by reinforcement learning under preset constraint conditions using a preset graph cutting model and a preset graph processing model. A learning automaton is allocated to each vertex and trained to find the most suitable data processing center for that vertex, so that the placement of each vertex across all data processing centers follows a probability distribution. Each iteration comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal calculation and probability updating; the iteration is judged to be finished when the maximum number of iterations is reached or the constraint conditions converge. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a general distributed graph model; for different optimization targets, only different score calculation schemes and different weight vectors need to be designed.
Example 3
An embodiment of the present invention provides a computer device, as shown in fig. 4, the device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in another manner, and fig. 4 takes the connection by the bus as an example.
The processor 51 may be a Central Processing Unit (CPU). The Processor 51 may also be other general purpose processors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the corresponding program instructions/modules in the embodiments of the present invention. The processor 51 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 52, that is, implements the reinforcement learning-based general distributed graph processing method in the above method embodiments.
The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 51, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 52, and when executed by the processor 51, perform the reinforcement learning-based general distributed graph processing method in embodiment 1.
The details of the computer device can be understood by referring to the corresponding related descriptions and effects in embodiment 1, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program that instructs the relevant hardware to perform the processes, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. This need not be, nor should it be exhaustive of all embodiments. And obvious variations or modifications of the invention may be made without departing from the spirit or scope of the invention.

Claims (12)

1. A general distributed graph processing method based on reinforcement learning is characterized by comprising the following steps:
defining a distributed data processing center to form a distributed graph based on graph theory, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model;
distributing a learning automaton to each vertex of the distributed graph, initializing the probability of each vertex at each data processing center, and the learning automaton selecting, for its vertex and based on the initialized probabilities, the data processing center with the highest probability according to a preset action selection method;
comparing the selected data processing center with the data processing center where the vertex currently resides; if the two are different, migrating the vertex to the data processing center corresponding to the selected action, and otherwise performing no operation;
each learning automaton calculating the score of its vertex at each data processing center, the score being determined according to the preset constraint conditions;
each learning automaton propagating the number of the data processing center corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong, so as to generate corresponding weight vectors, and calculating for its vertex, according to the weight vectors, the reinforcement signals corresponding to all the data processing centers;
updating the probability values of the vertex at each data processing center according to the weight vectors and the reinforcement signals, the updated probabilities guiding the action selection of the next iteration;
and when the preset number of iterations is reached or the constraint conditions converge, generating a segmentation result of the distributed graph that satisfies the preset constraint conditions.
2. The reinforcement learning-based general distributed graph processing method according to claim 1, wherein the preset graph cut model is a hybrid-cut graph cut model, the preset graph processing model is a GAS graph processing model, the GAS graph processing model is used for performing iterative vertex calculation, and the constraint conditions are that the capital budget cost and the data transmission time are minimized.
3. The reinforcement learning-based general distributed graph processing method according to claim 2, wherein the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase, and the data transmission time T(i) of the i-th iteration is calculated by the formula:
T(i) = Tg(i) + Ta(i)    (1)

wherein Tg(i) and Ta(i) denote the data transmission times of the gather phase and the apply phase of the i-th iteration, respectively; their detailed expressions are given in the source only as formula images and are stated in terms of the following quantities:

an indicator that equals 1 when the vertex v in data processing center DCr is a master, and 0 when it is a mirror (replica);

an indicator that equals 1 when the vertex v in DCr is a high-degree vertex, and 0 when it is a low-degree vertex;

the amount of data transferred from the replica of vertex v in DCr to the master vertex v during the gather phase of the i-th iteration;

a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the i-th iteration;

Ur/Dr, the uploading/downloading bandwidth of DCr;

Rv, the set of data processing centers DC containing a copy of v;

the communication cost between the data processing centers DC is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r denoting the cost of uploading data from DCr to the network, the capital budget cost C_comm(i) is expressed by formula (2), which is likewise given in the source only as an image;

the constraint conditions are:

min T(i)    (3)

C_comm(i) ≤ B    (4)

where B is the capital budget for using network resources.
4. The general distributed graph processing method based on reinforcement learning according to claim 3, wherein the steps of initializing the probability of each vertex in each data processing center and of the learning automaton selecting the data processing center with the highest probability for its vertex according to a preset action selection method comprise:
initializing the probability P(v_i) of a vertex v at a data processing center DCi as

P(v_i) = 1/P

where P is the number of distributed DCs;

obtaining the cumulative probability of the vertex for each data processing center DC from the probability distribution of the vertex, Q(v_i) denoting the cumulative probability of vertex v at data processing center DCi, where

Q(v_i) = P(v_0) + P(v_1) + … + P(v_i);

randomly generating a floating-point number r ∈ [0,1]; if r ≤ Q(v_0), DC0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), data processing center DCk is selected.
5. The general distributed graph processing method based on reinforcement learning according to claim 3, wherein the steps of initializing the probability of each vertex in each data processing center and of the learning automaton selecting the data processing center with the highest probability for its vertex according to a preset action selection method comprise:
presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0,1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex; if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest probability value P(v_i).
6. The generalized distributed graph processing method based on reinforcement learning according to claim 4 or 5, wherein each learning automaton calculates the score of its vertex at each data processing center by the following formula:
(the score calculation formulas are given in the source only as images)

wherein the score of vertex v at DCi is computed in terms of: B, the capital budget for using network resources; T_b, the data transmission time of the whole system before the score calculation; C_b, the data transmission cost of the whole system before the score calculation; the data transmission time of the whole system when the vertex is placed at DCi; the data transmission cost of the whole system when the vertex is placed at DCi; and tw and cw, which denote the time weight and the capital-cost weight, respectively; when C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1.
7. The reinforcement learning-based general distributed graph processing method according to claim 6, wherein each learning automaton propagates the number of the data processing center corresponding to the maximum score to the learning automaton to which the neighbor of its vertex belongs, generates a corresponding weight vector, and the learning automaton calculates the reinforcement signals corresponding to all the data processing centers for its vertex according to the weight vector, including:
a reference standard for calculating the weight vector is computed by a formula that is given in the source only as an image, wherein:

ρ_v denotes the label propagated by a neighbor v; when vertex u receives ρ_v from its neighbor v, it calculates the reference standard of the weight vector; ρ_v represents the DC corresponding to the maximum score of vertex v, and Nbr(v) represents the set of neighbor vertices of vertex v;

the formula further involves the overall data transmission time of the system after vertex v is moved to ρ_v, the overall data transmission time of the system after vertex v is moved to ρ_v and vertex u is then also moved to ρ_v, the overall capital cost of the system after vertex v is moved to ρ_v, and the overall capital cost of the system after vertex v is moved to ρ_v and vertex u is then also moved to ρ_v;
after vertex u has computed the reference standard, the weight vector is updated by a formula that is given in the source only as an image, wherein the weight vector of vertex u for DC ρ_v is initialized to 0;

after the weight vectors of the vertex for all data processing centers have been computed, the learning automaton calculates the corresponding reinforcement signals from the weight vectors; the calculation formulas are likewise given in the source only as images, wherein the reinforcement signal of vertex u for data processing center DCi takes the value 0 or 1, representing the reward signal and the penalty signal respectively, the weight vector of vertex u for DCi is initialized to 0, and M represents the number of vertices.
8. The reinforcement learning-based general distributed graph processing method according to claim 7, wherein before updating the probability values of the vertices at each data processing center, a regularization weight is acquired, and the regularization weight is divided into a reward regularization weight and a penalty regularization weight, wherein:
the reward regularization weight of vertex v for DCi is calculated by a formula that is given in the source only as an image, wherein Neg() is a negation (inverting) function and the formula involves the reinforcement signal of vertex v for data processing center DCi, the weight vector of vertex v for DCi, and the weight vector of vertex v for DCk;

the penalty regularization weight of vertex v for DCi is calculated by a formula that is likewise given in the source only as an image, and involves the reinforcement signal of vertex v for data processing center DCi, the weight vector of vertex v for DCi, and the weight vector of vertex v for DCk.
9. The reinforcement learning-based general distributed graph processing method according to claim 8, wherein the probability of the vertex v is updated according to the regularization weights; for the reward signals, the update order over the data processing centers DC proceeds from the smallest reward regularization weight to the largest: given vertex v and DC i, if the reward regularization weight of v for DC i is the smallest among all reward regularization weights, it is used first to update the probabilities for all DCs; the update formula is given in the source only as an image, wherein α represents the reward weight, n is the iteration number, the formula updates the probability of vertex v for DC i in the n-th iteration, and i and j index data processing centers;

the learning automaton then takes the larger reward regularization weights in turn and uses the same method to update the probabilities for all DCs; the learning automaton likewise updates the probabilities for the reinforcement signals that are penalty signals, with the update order proceeding from the smallest penalty regularization weight to the largest; assuming a given vertex v and DC i, DC k, where the penalty regularization weight of v for DC i is the largest among all penalty regularization weights and the penalty regularization weight of v for DC k is the smallest, the latter is used first to update the probabilities for all DCs; the update formula is given in the source only as an image, wherein β represents the penalty weight, n is the iteration number, the formula updates the probability of vertex v for DC j in the n-th iteration, and i and j index data processing centers;

the learning automaton then takes the larger penalty regularization weights and the corresponding DC k in turn and performs the probability update for all DCs; if the preset number of iterations is reached or the constraint conditions converge, the iteration ends; otherwise the (n+1)-th iteration is entered, and the action selection of the (n+1)-th iteration is performed with reference to the probabilities updated in the n-th iteration.
10. A generalized distributed graph processing system based on reinforcement learning, comprising:
the distributed graph definition and constraint condition setting module is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by utilizing a preset graph cutting model and a preset graph processing model;
the action selection module is used for distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for the vertex by the learning automaton according to a preset action selection method based on the initialized probability;
the vertex migration module is used for selecting the data processing center with the maximum probability for the vertex, comparing the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, migrating the vertex to the data processing center corresponding to the action, and otherwise, not performing any operation;
the score calculation module is used for calculating the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
the reinforcement signal calculation module is used for enabling each learning automaton to propagate the number of the data processing center corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong so as to generate corresponding weight vectors, and for the learning automaton to calculate, according to the weight vectors, the reinforcement signals corresponding to all the data processing centers for its vertex;
the probability updating module is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to carry out iteration;
and the segmentation result acquisition module is used for generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
11. A computer-readable storage medium storing computer instructions for causing a computer to perform the reinforcement learning-based general distributed graph processing method according to any one of claims 1 to 9.
12. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the reinforcement learning based general distributed graph processing method according to any one of claims 1 to 9.
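The following illustrative sketches are explanatory examples only and are not part of the claims. The cumulative-probability (roulette-wheel) action selection described in claim 4 can be sketched in Python as follows; the function name and the use of Python's random module are assumptions made for illustration:

import random

def roulette_select(probabilities, rng=random):
    """Pick a data processing center index by cumulative probability (cf. claim 4).

    `probabilities` is one vertex's probability vector over the data processing
    centers and is assumed to sum to 1.
    """
    r = rng.random()               # floating-point number r in [0, 1)
    cumulative = 0.0               # running cumulative probability Q(v_i)
    for dc, p in enumerate(probabilities):
        cumulative += p
        if r <= cumulative:
            return dc
    return len(probabilities) - 1  # guard against floating-point round-off

# A vertex whose automaton currently favours DC2 is most often routed there.
print(roulette_select([0.2, 0.3, 0.5]))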
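The trial-and-error selection of claim 5 is essentially an explore/exploit switch. A minimal sketch, assuming a fixed trial-and-error parameter τ and Python's random module:

import random

def tau_select(probabilities, tau=0.1, rng=random):
    """Trial-and-error selection sketched from claim 5 (parameter values are assumed).

    With probability tau the automaton explores a random data processing center;
    otherwise it exploits the DC whose probability value is currently the largest.
    """
    if rng.random() <= tau:
        return rng.randrange(len(probabilities))   # explore: pick a random DC
    return max(range(len(probabilities)),          # exploit: argmax of the probabilities
               key=lambda dc: probabilities[dc])

print(tau_select([0.2, 0.3, 0.5], tau=0.1))        # usually prints 2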
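Claim 6 describes how the time weight tw and the capital-cost weight cw trade off as the iterations proceed. The linear schedule below is one reading of that wording; the function name, the iteration-fraction formula, and treating the schedule as a per-iteration function of the current C_b are assumptions:

def score_weights(iteration, max_iterations, cost_before, budget):
    """Time weight tw and capital-cost weight cw following the schedule worded in claim 6.

    A uniform (linear) change over the iterations is assumed; `cost_before` plays the
    role of C_b and `budget` the role of B.
    """
    frac = iteration / max(max_iterations - 1, 1)   # rises uniformly from 0 to 1
    if cost_before >= budget:
        cw, tw = 1.0 - frac, frac                   # cw: 1 -> 0, tw: 0 -> 1
    else:
        tw, cw = 1.0 - frac, frac                   # tw: 1 -> 0, cw: 0 -> 1
    return tw, cw

# Halfway through the run, with the cost still over budget, the two weights meet at 0.5.
print(score_weights(iteration=5, max_iterations=11, cost_before=120.0, budget=100.0))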
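The probability-update formulas of claim 9 are given in the source only as images. As a stand-in, the sketch below uses the classical linear reward-penalty update for a learning automaton; it conveys the direction of the update (reinforce the chosen DC on a reward signal, back off on a penalty signal) but deliberately omits the regularization-weight ordering of claims 8 and 9:

def reward_update(prob, chosen_dc, alpha=0.1):
    """Classical linear reward step: shift probability mass toward the chosen DC."""
    return [p + alpha * (1.0 - p) if dc == chosen_dc else (1.0 - alpha) * p
            for dc, p in enumerate(prob)]

def penalty_update(prob, chosen_dc, beta=0.1):
    """Classical linear penalty step: shift probability mass away from the chosen DC."""
    k = len(prob)
    return [(1.0 - beta) * p if dc == chosen_dc else beta / (k - 1) + (1.0 - beta) * p
            for dc, p in enumerate(prob)]

p = [0.2, 0.3, 0.5]
print(reward_update(p, chosen_dc=1))   # probability mass moves toward DC1
print(penalty_update(p, chosen_dc=1))  # probability mass moves away from DC1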
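Finally, the steps of claim 1 can be tied together in a self-contained toy run. This is a didactic example only: the objective used here is a plain edge cut rather than the time/cost objective of claims 2 and 3, the reward test is a simple "did the objective improve" check, and all names are assumptions made so the example runs on its own:

import random

def toy_partition(edges, num_vertices, num_dcs, iters=50, alpha=0.1, beta=0.05, seed=0):
    """Toy walk-through of the loop in claim 1; not the claimed time/cost-aware method."""
    rng = random.Random(seed)
    # One learning automaton per vertex: a probability vector over the DCs.
    prob = [[1.0 / num_dcs] * num_dcs for _ in range(num_vertices)]
    place = [rng.randrange(num_dcs) for _ in range(num_vertices)]  # current DC of each vertex

    def edge_cut(placement):
        # Stand-in objective: number of edges whose endpoints sit in different DCs.
        return sum(1 for u, v in edges if placement[u] != placement[v])

    best = edge_cut(place)
    for _ in range(iters):
        for v in range(num_vertices):
            # Action selection: roulette wheel over the automaton's probability vector.
            r, acc, dc = rng.random(), 0.0, num_dcs - 1
            for i, p in enumerate(prob[v]):
                acc += p
                if r <= acc:
                    dc = i
                    break
            old = place[v]
            place[v] = dc                      # vertex migration (a no-op if dc == old)
            cut = edge_cut(place)
            if cut <= best:                    # objective improved or held: reward signal
                best = cut
                prob[v] = [p + alpha * (1 - p) if i == dc else (1 - alpha) * p
                           for i, p in enumerate(prob[v])]
            else:                              # objective worsened: penalty signal, undo move
                place[v] = old
                prob[v] = [(1 - beta) * p if i == dc
                           else beta / (num_dcs - 1) + (1 - beta) * p
                           for i, p in enumerate(prob[v])]
    return place, best

# A 4-cycle and a triangle joined by one bridge edge, split across two data processing centers.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 4), (0, 4)]
print(toy_partition(edges, num_vertices=7, num_dcs=2))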
CN202010462112.4A 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning Active CN111539534B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010462112.4A CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning
PCT/CN2021/076484 WO2021238305A1 (en) 2020-05-27 2021-02-10 Universal distributed graph processing method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010462112.4A CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111539534A CN111539534A (en) 2020-08-14
CN111539534B true CN111539534B (en) 2023-03-21

Family

ID=71980779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010462112.4A Active CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning

Country Status (2)

Country Link
CN (1) CN111539534B (en)
WO (1) WO2021238305A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539534B (en) * 2020-05-27 2023-03-21 深圳大学 General distributed graph processing method and system based on reinforcement learning
CN113726342B (en) * 2021-09-08 2023-11-07 中国海洋大学 Segmented difference compression and inert decompression method for large-scale graph iterative computation
CN113835899B (en) * 2021-11-25 2022-02-22 支付宝(杭州)信息技术有限公司 Data fusion method and device for distributed graph learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953801A (en) * 2017-01-24 2017-07-14 上海交通大学 Stochastic shortest route implementation method based on hierarchical structure learning automaton
CN109889393A (en) * 2019-03-11 2019-06-14 深圳大学 A kind of geographically distributed figure processing method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9208257B2 (en) * 2013-03-15 2015-12-08 Oracle International Corporation Partitioning a graph by iteratively excluding edges
EP2884453A1 (en) * 2013-12-12 2015-06-17 Telefonica Digital España, S.L.U. A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network
CN105590321B (en) * 2015-12-24 2018-12-28 华中科技大学 A kind of block-based subgraph building and distributed figure processing method
CN106970779B (en) * 2017-03-30 2020-01-03 重庆大学 Memory computing-oriented stream balance graph partitioning method
CN107222565B (en) * 2017-07-06 2019-07-12 太原理工大学 A kind of network dividing method and system
CN109033191A (en) * 2018-06-28 2018-12-18 山东科技大学 A kind of dividing method towards extensive power-law distribution figure
CN111539534B (en) * 2020-05-27 2023-03-21 深圳大学 General distributed graph processing method and system based on reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953801A (en) * 2017-01-24 2017-07-14 上海交通大学 Stochastic shortest route implementation method based on hierarchical structure learning automaton
CN109889393A (en) * 2019-03-11 2019-06-14 深圳大学 A kind of geographically distributed figure processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cost-Aware Partitioning for Efficient Large Graph Processing in Geo-Distributed Datacenters; Amelie Chi Zhou et al.; IEEE Transactions on Parallel and Distributed Systems; 2019-11-25; Vol. 31, No. 7; pp. 1-2 *

Also Published As

Publication number Publication date
WO2021238305A1 (en) 2021-12-02
CN111539534A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN111539534B (en) General distributed graph processing method and system based on reinforcement learning
US11410046B2 (en) Learning-based service migration in mobile edge computing
US11018979B2 (en) System and method for network slicing for service-oriented networks
KR102621640B1 (en) Method and apparatus for automated decision making
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN111406264A (en) Neural architecture search
CN112291335B (en) Optimized task scheduling method in mobile edge calculation
CN113835899B (en) Data fusion method and device for distributed graph learning
CN111447005B (en) Link planning method and device for software defined satellite network
CN112512013B (en) Learning pruning-based vehicle networking mobile edge computing task unloading method and system
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
WO2019168692A1 (en) Capacity engineering in distributed computing systems
CN115066694A (en) Computation graph optimization
CN114595049A (en) Cloud-edge cooperative task scheduling method and device
CN111488528A (en) Content cache management method and device and electronic equipment
CN113489787B (en) Method and device for collaborative migration of mobile edge computing service and data
CN117041330B (en) Edge micro-service fine granularity deployment method and system based on reinforcement learning
CN109889393B (en) Method and system for processing geographic distributed graph
CN111510334B (en) Particle swarm algorithm-based VNF online scheduling method
CN113515378A (en) Method and device for migration and calculation resource allocation of 5G edge calculation task
Garg et al. Heuristic and reinforcement learning algorithms for dynamic service placement on mobile edge cloud
CN116996941A (en) Calculation force unloading method, device and system based on cooperation of cloud edge ends of distribution network
CN116489668A (en) Edge computing task unloading method based on high-altitude communication platform assistance
CN116405493A (en) Edge cloud collaborative task unloading method based on MOGWO strategy
CN113708982B (en) Service function chain deployment method and system based on group learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant