CN111539534B - General distributed graph processing method and system based on reinforcement learning - Google Patents

General distributed graph processing method and system based on reinforcement learning

Info

Publication number
CN111539534B
Authority
CN
China
Prior art keywords
vertex
data processing
processing center
probability
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010462112.4A
Other languages
Chinese (zh)
Other versions
CN111539534A (en)
Inventor
周池
罗鹃云
毛睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202010462112.4A priority Critical patent/CN111539534B/en
Publication of CN111539534A publication Critical patent/CN111539534A/en
Priority to PCT/CN2021/076484 priority patent/WO2021238305A1/en
Application granted granted Critical
Publication of CN111539534B publication Critical patent/CN111539534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a general distributed graph processing method and system based on reinforcement learning. Distributed data processing centers are defined on the basis of graph theory to form a distributed graph, and the distributed graph is cut by reinforcement learning under preset constraint conditions using a preset graph cutting model and a preset graph processing model. A learning automaton is allocated to each vertex and trained to find the most suitable data processing center for that vertex, so that the placement of each vertex across all data processing centers follows a probability distribution. Each iteration of the system comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal calculation and probability updating; the iteration is judged to be finished when the maximum number of iterations is reached or the constraint conditions converge. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a general distributed graph model; for different optimization targets, only different score calculation schemes and different weight vectors need to be designed.

Description

General distributed graph processing method and system based on reinforcement learning
Technical Field
The invention relates to the field of large-scale graph segmentation processing, in particular to a general distributed graph processing method and system based on reinforcement learning.
Background
In order to efficiently perform large-scale graph processing, a graph generally needs to be divided so that the divided subgraphs can be processed in parallel. There are several classical models for large-scale graph segmentation:
Heuristic models. Pregel and PowerGraph, the traditional mainstream large-scale graph processing systems, adopt heuristic partitioning algorithms. Pregel's default partitioning method applies a modulo operation to the hash value of the vertex id, which improves the locality of the partitions and reduces network traffic among the computing nodes. PowerGraph defaults to greedy vertex-cut: for a newly added edge, if one of its endpoints already resides on some machine, the edge is assigned to that machine, which minimizes the number of edges spanning machines and reduces communication traffic. Such heuristic graph partitioning algorithms easily fall into locally optimal solutions and do not explore better regions of the solution space.
Machine learning models. Pham et al. propose a graph partitioning method that assigns the operations (nodes) of a TensorFlow computation graph to available devices so as to minimize computation time. They employ a reinforcement learning model that assigns operations with a seq2seq policy. This approach is only applicable when the number of graph nodes is small, so that the policy space does not become too large. Nazi et al. propose GAP, an algorithm that solves the graph partitioning problem with deep learning. GAP is an unsupervised learning method that treats balanced graph partitioning as a vertex classification problem. However, if the optimization goal involves heterogeneous network prices and bandwidths, computing the node embeddings becomes complicated. Existing machine learning models for graph partitioning target a single application scenario, and when the graph is large and the optimization target is more complex, they cannot solve the graph partitioning problem well.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the defects of the prior art, in which graph cutting models easily fall into locally optimal solutions and perform poorly because they target a single usage scenario, and to provide a general distributed graph processing method and system based on reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a generalized distributed graph processing method based on reinforcement learning, including the following steps: defining a distributed data processing center to form a distributed graph based on graph theory, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model;
distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for each vertex according to a preset action selection method by the learning automaton based on the initialized probability;
the learning automaton selects the data processing center with the maximum probability for the vertex, compares the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, the vertex is transferred to the data processing center corresponding to the action, and otherwise, no operation is performed;
each learning automaton calculates the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
each learning automaton transmits the data processing center number corresponding to the maximum score to the learning automaton to which the neighbor of the vertex belongs to generate a corresponding weight vector, and the learning automaton calculates strengthening signals corresponding to all the data processing centers for the vertex according to the weight vector;
updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal, and guiding the next action selection to iterate;
and generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
In an embodiment, the preset graph cutting model is the hybrid-cut model and the preset graph processing model is the GAS graph processing model; vertex computation is performed iteratively with the GAS graph processing model, and the constraint is to minimize the data transmission time subject to the capital budget cost.
In one embodiment, the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase, and the data transmission time T(i) of the ith iteration is calculated by formula (1) from the following quantities:
an indicator that equals 1 when the copy of vertex v in data processing center DC_r is the master vertex, and 0 when the copy of v in DC_r is not the master;
an indicator that equals 1 when vertex v in DC_r is a high-degree vertex, and 0 when it is a low-degree vertex;
the amount of data transferred from the replica in DC_r to the master vertex v during the gather phase of the ith iteration;
a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the ith iteration;
U_r/D_r, the upload/download bandwidth of DC_r;
R_v, the set of data processing centers DC containing a replica of v.
The communication cost between the data processing centers DC is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r denoting the unit cost of uploading data from DC_r to the network, the capital budget cost C_comm(i) is expressed by formula (2) as the sum, over all data processing centers, of P_r multiplied by the amount of data uploaded from DC_r in these two phases.
The constraint conditions are:
min T(i) (3)
C_comm(i) ≤ B (4)
where B is the capital budget for using network resources.
In an embodiment, the step of initializing the probability of each vertex in each data processing center and of the learning automaton selecting a data processing center for its vertex according to a preset action selection method includes:
initializing the probability P(v_i) of vertex v at data processing center DC_i to P(v_i) = 1/M, where M is the number of distributed DCs;
obtaining, from the probability distribution of the vertex, the cumulative probability of the vertex for each data processing center DC, where Q(v_i) denotes the cumulative probability of vertex v at DC_i and Q(v_i) = P(v_0) + P(v_1) + ... + P(v_i);
randomly generating a floating-point number r ∈ [0,1]; if r ≤ Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), data processing center DC_k is selected.
In an embodiment, the step of initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for the vertex by the learning automaton according to a preset action selection method includes:
presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0,1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex; if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest P(v_i) value.
In one embodiment, each learning automaton computes a score for its vertex at each data processing center according to formulas (5) and (6). The score of vertex v at DC_i is determined by: B, the capital budget for using network resources; T_b, the data transfer time of the whole system before the score is calculated; C_b, the data transfer cost of the whole system before the score is calculated; the data transfer time of the whole system when the vertex is placed at DC_i; the data transfer cost of the whole system when the vertex is placed at DC_i; and tw and cw, which denote the time weight and the capital cost weight respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1.
Each learning automaton transmits the data processing center number corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong, so that they generate the corresponding weight vectors, and the learning automaton then calculates, from the weight vectors, the reinforcement signals of its vertex for all data processing centers. The steps include:
calculating a reference standard for the weight vector: when vertex u receives the label ρ_v propagated by its neighbor v, it calculates the reference standard, where ρ_v denotes the data processing center corresponding to the maximum score of vertex v and Nbr(v) denotes the set of neighbor vertices of vertex v, from the following quantities: the data transmission time of the whole system after vertex v is moved to ρ_v; the data transmission time of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v; the capital cost of the whole system after vertex v is moved to ρ_v; and the capital cost of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v;
after vertex u has calculated the reference standard, updating the weight vector of vertex u for DC ρ_v, which is initialized to 0;
after the weight vectors of the vertex for all data processing centers have been calculated, the learning automaton calculates the corresponding reinforcement signals from the weight vectors; the reinforcement signal of vertex u for data processing center DC_i takes the value 0 or 1, representing a reward signal and a penalty signal respectively, and the weight vector of vertex u for data processing center DC_i is initialized to 0.
In one embodiment, before the probability value of the vertex in each data processing center is updated, regularization weights are obtained and divided into reward regularization weights and penalty regularization weights:
the reward regularization weight of vertex v for DC_i is computed from Neg(), an inverting function, applied to the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k;
the penalty regularization weight of vertex v for DC_i is computed from the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k.
In one embodiment, the probability of vertex v is updated based on the regularization weights. For the data processing centers whose reinforcement signal is a reward, the update order proceeds from the smallest reward regularization weight to the largest. Given vertex v and a DC_i whose reward regularization weight is the smallest among all reward regularization weights, that weight is used first to update the probabilities of all DCs according to formula (11), in which the probability of vertex v for DC_i in the nth iteration appears, α denotes the reward weight, n is the iteration index, and i and j index data processing centers; formula (11) increases the probability of DC_i and decreases the probabilities of the other DCs. The learning automaton then finds the successively larger reward regularization weights and uses the same update rule on all DCs.
The learning automaton then updates the probabilities for the data processing centers whose reinforcement signal is a penalty. The update order proceeds from the smallest penalty regularization weight to the largest. Given vertex v and data processing centers DC_i and DC_k, where the penalty regularization weight for DC_k is the largest among all penalty regularization weights and the penalty regularization weight for DC_i is the smallest, the smallest weight is used first to update the probabilities of all DCs according to formula (12), in which β denotes the penalty weight and the probability of vertex v for DC_j in the nth iteration appears; formula (12) decreases the probability of DC_k and increases the probabilities of the other DCs. The learning automaton then finds the successively larger penalty regularization weights and the corresponding DC_k and applies the same update to all DCs. If the preset number of iterations is reached or the constraint conditions converge, the iteration ends; otherwise the (n+1)th iteration begins, and the action selection in the (n+1)th iteration takes the probabilities updated in the nth iteration as its reference.
In a second aspect, an embodiment of the present invention provides a reinforcement learning-based general distributed graph processing system, including:
the distributed graph definition and constraint condition setting module is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by utilizing a preset graph cutting model and a preset graph processing model;
the action selection module is used for distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the maximum probability for each vertex by the learning automaton according to a preset action selection method based on the initialized probability;
the vertex migration module is used for selecting the data processing center with the maximum probability for the vertex, comparing the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, migrating the vertex to the data processing center corresponding to the action, and otherwise, not performing any operation;
the score calculation module is used for calculating the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
the strengthening signal calculation module is used for transmitting the number of the data processing center corresponding to the maximum score to the learning automata to which the neighbors of the vertex belong so as to generate corresponding weight vectors, and for calculating, according to the weight vectors, the strengthening signals of the vertex for all the data processing centers;
the probability updating module is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to iterate;
and the segmentation result acquisition module is used for generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the reinforcement learning-based general distributed graph processing method according to the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer device, including: the apparatus comprises a memory and a processor, wherein the memory and the processor are communicatively connected with each other, the memory stores computer instructions, and the processor executes the computer instructions to execute the reinforcement learning-based general distributed graph processing method according to the first aspect of the embodiments of the present invention.
The technical scheme of the invention has the following advantages:
the invention provides a general distributed graph processing method and system based on reinforcement learning, which define distributed data processing centers to form a distributed graph based on graph theory, utilize a preset graph cutting model and a preset graph processing model, cut the distributed graph by a reinforcement learning mode based on preset constraint conditions, allocate a learning automaton to each vertex, find the most suitable data processing center for the vertex through training, the possibility of each vertex in all the data processing centers obeys certain probability distribution, the whole system comprises five steps of action selection, vertex migration, score calculation, reinforcement signal calculation and probability updating in each iteration process, the maximum iteration times or constraint condition convergence is reached, and the iteration is judged to be finished. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a distributed graph model with better adaptivity, and different fraction calculation schemes and different weight vectors only need to be designed for different optimization targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart illustrating a generalized distributed graph processing method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an iteration process based on a reinforcement learning graph segmentation process according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a specific example of a generalized distributed graph processing system based on reinforcement learning in an embodiment of the present invention;
fig. 4 is a block diagram of a specific example of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
The embodiment of the invention provides a general distributed graph processing method based on reinforcement learning, which can be applied to different optimization targets, such as performance and cost optimization, load balancing, performance optimization and the like of a geographic distributed graph processing system, and as shown in fig. 1, the method comprises the following steps:
step S10: the distributed data processing center is defined based on graph theory to form a distributed graph, and the distributed graph is cut by utilizing a preset graph cutting model and a preset graph processing model and based on preset constraint conditions and preset constraint conditions.
The embodiment of the invention takes the geographically distributed graph partitioning process as an example and assumes that vertex data are not backed up across data processing centers (hereinafter DC) and that one machine can only execute the graph processing task of one vertex at a time; the computational resources of each DC are not limited, while the data communication between the DCs is the performance bottleneck of geographically distributed graph processing; the connections between the DCs are assumed to be free of network congestion, so the bottleneck of the network comes only from the uplink and downlink bandwidth between each DC and the WAN; a fee is charged only for uploading data from a DC to the WAN. Cost and performance may conflict: when the uplink bandwidth is large, more data can be transmitted on the link to reduce the transmission time, but the price of that link may be relatively high and drive up the cost. Graph partitioning therefore needs to optimize performance and cost at the same time.
First, a graph G(V, E) is defined, where V is the set of vertices and E is the set of edges, and M geographically distributed data processing centers (hereinafter DC) are considered. Each vertex v has an initial position L_v (L_v ∈ {0, 1, ..., M-1}); an indicator records whether the copy of v in a DC is the master vertex (value 1) or not (value 0); R_v is the set of DCs that contain a replica of vertex v; U_r is the bandwidth of the uplink of DC_r, and D_r is the bandwidth of the downlink.
The embodiment of the invention uses the hybrid-cut graph cutting model, which follows the following rule: given a threshold θ, vertex v is called a high-degree vertex if its in-degree is greater than or equal to θ, and a low-degree vertex otherwise. If vertex v is low-degree, all of its incoming edges are assigned to the DC where v resides; if vertex v is high-degree, each of its incoming edges is assigned to the DC where the source vertex of that edge resides.
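As an illustration, the following Python sketch shows how the hybrid-cut rule above might assign incoming edges to DCs; the function and variable names are assumptions made for the example, not part of the patent.

```python
def hybrid_cut_assign(edges, vertex_dc, in_degree, theta):
    """Assign each directed edge (u, v) to a DC under the hybrid-cut rule.

    vertex_dc[v] is the DC currently holding vertex v, in_degree[v] its in-degree,
    and theta the high-degree threshold. Returns a mapping {edge: dc}.
    """
    placement = {}
    for (u, v) in edges:
        if in_degree[v] >= theta:          # v is high-degree: follow the edge's source vertex
            placement[(u, v)] = vertex_dc[u]
        else:                              # v is low-degree: keep all in-edges with v itself
            placement[(u, v)] = vertex_dc[v]
    return placement
```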
Embodiments of the present invention use the GAS graph processing model, which iteratively performs user-defined vertex computations. Each GAS iteration has three stages: Gather, Apply and Scatter. In the gather stage, each active vertex collects the data of its neighbors, and a summation function (Sum) is defined to aggregate the received data into a gathered sum. In the apply stage, each active vertex uses the aggregated sum to update its own data. In the scatter stage, each active vertex activates the neighbors that will execute in the next iteration. A global barrier is defined to ensure that all vertices complete their computation before the next stage starts.
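For orientation, here is a minimal Python sketch of the GAS abstraction described above, using PageRank-style callbacks as an assumed example; it is not the PowerGraph API, only an illustration of the gather/apply/scatter structure.

```python
class PageRankVertexProgram:
    """Illustrative GAS vertex program (PageRank-like); the attribute names are assumptions."""

    def gather(self, vertex, in_neighbor):
        # Gather: each active vertex collects a value from one in-neighbor.
        return in_neighbor.rank / in_neighbor.out_degree

    def gather_sum(self, a, b):
        # User-defined Sum: aggregate the gathered values into one gathered sum.
        return a + b

    def apply(self, vertex, gathered_sum, damping=0.85):
        # Apply: use the aggregate to update the vertex's own data.
        vertex.prev_rank = vertex.rank
        vertex.rank = (1 - damping) + damping * gathered_sum

    def scatter(self, vertex, out_neighbor, tolerance=1e-4):
        # Scatter: decide whether a neighbor should be active in the next iteration.
        return abs(vertex.rank - vertex.prev_rank) > tolerance
```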
The transfer time in the ith iteration may be expressed as the sum of the data transfer times of the gather and apply phases and is denoted T(i) (formula (1)). The quantities entering formula (1) are:
an indicator that equals 1 when the copy of vertex v in data processing center DC_r is the master vertex, and 0 when the copy of v in DC_r is not the master;
an indicator that equals 1 when vertex v in DC_r is a high-degree vertex, and 0 when it is a low-degree vertex;
the amount of data transferred from the replica in DC_r to the master vertex v during the gather phase of the ith iteration;
a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the ith iteration;
U_r/D_r, the upload/download bandwidth of DC_r;
R_v, the set of data processing centers DC containing a replica of v.
The cost of communication between DCs is the sum of the cost of uploading data during the gather phase and the apply phase. With P_r denoting the unit cost of uploading data from DC_r to the Internet, the total communication cost C_comm(i) of the ith iteration is the sum, over all DCs, of P_r multiplied by the amount of data uploaded from DC_r in these two phases (formula (2)).
The geographically distributed graph partitioning problem is expressed as a constrained optimization problem, i.e. the constraint conditions are:
min T(i) (3)
C_comm(i) ≤ B (4)
where B is the capital budget for using network resources. The geographically distributed graph partitioning problem to be solved is the optimization problem under the constraint conditions described by formulas (3) and (4).
After the meaning of each element of the geographically distributed graph has been defined, a learning automaton (hereinafter LA) is assigned to each vertex and trained to find the DC that is most suitable for that vertex; the probability of each vertex for every DC obeys a probability distribution. Each iteration mainly comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal generation and probability updating. The overall workflow for optimizing the performance and cost of the geographically distributed graph processing system is shown in figure 2; the main function of each step and the connections between the steps are described below.
Step S11: and allocating a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the maximum probability for the vertex by the learning automaton according to a preset action selection method based on the initialized probability.
In the embodiment of the present invention, the following are defined: P(v_i) denotes the probability of vertex v at DC_i and is initialized to P(v_i) = 1/M, where M is the number of distributed DCs; Q(v_i) denotes the cumulative probability of vertex v at DC_i and is calculated as Q(v_i) = P(v_0) + P(v_1) + ... + P(v_i).
In one embodiment, the LA selects an appropriate action (DC) for its vertex using a roulette-wheel algorithm. The LA first obtains the cumulative probability of the vertex for each DC from the vertex's probability distribution and then randomly generates a floating-point number r ∈ [0,1]. If r ≤ Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), DC_k is selected. In this way, actions with higher probability have a greater chance of being selected, but actions with lower probability can also be selected. When the LA performs an action with high probability, the graph partitioning result is more likely to move towards the optimization target; when the LA chooses a seemingly bad action (one with low probability), this is a trial-and-error process, and a choice that looks bad at the moment may explore a better state space.
In another embodiment, the action selection may be done in another way: a trial-and-error parameter τ = 0.1 is defined and a floating-point number r ∈ [0,1] is generated at random. If r ≤ τ, the LA randomly chooses a DC for its vertex; if r > τ, the LA selects for its vertex the DC with the highest P(v_i) value.
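The trial-and-error variant can be sketched in the same style (again with illustrative names):

```python
import random

def trial_and_error_select(dc_probs, tau=0.1):
    """With probability tau explore a random DC, otherwise exploit the most probable DC."""
    if random.random() <= tau:
        return random.randrange(len(dc_probs))                          # exploration
    return max(range(len(dc_probs)), key=lambda dc: dc_probs[dc])       # exploitation
```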
Step S12: the learning automaton selects the data processing center with the maximum probability for the vertex, compares the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, the vertex is transferred to the data processing center corresponding to the action, and otherwise, no operation is performed.
In the embodiment of the present invention, the LA compares the action obtained in step S11 with the DC where the vertex currently resides; if they are inconsistent, the vertex is migrated to the DC corresponding to the action, otherwise no operation is performed.
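This compare-and-migrate step amounts to a few lines; the sketch below assumes a vertex_dc mapping and a migrate() helper that are not named in the patent.

```python
def maybe_migrate(vertex, chosen_dc, vertex_dc, migrate):
    """Move the vertex only if the selected action differs from its current DC."""
    if vertex_dc[vertex] != chosen_dc:
        migrate(vertex, chosen_dc)      # transfer the vertex to the chosen DC
        vertex_dc[vertex] = chosen_dc   # otherwise: no operation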
Step S13: each learning automaton calculates the score of its vertex in each data processing center, and the score is determined according to the preset constraint condition.
For each LA, the embodiment of the present invention calculates the score of its vertex at each DC. First, L_v denotes the DC where vertex v currently resides; T_b is the data transmission time of the whole system before the score calculation, obtained from formula (1); C_b is the data transmission cost of the whole system before the score calculation, obtained from formula (2). The data transmission time and the data transmission cost of the whole system when the vertex is placed at DC_i are calculated as follows: vertex v is moved to DC_i, formula (1) and formula (2) are evaluated respectively, and vertex v is finally moved back to L_v. The score of vertex v at DC_i is then computed by formulas (5) and (6), in which B represents the capital budget and tw and cw represent the time weight and the cost weight respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1, so that the overall communication cost of the graph processing system is optimized preferentially and more partition states that can reduce the system cost are explored; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1, the aim being to optimize the data transmission time of the whole graph processing system preferentially and then slow down the optimization of the transmission time so as to achieve a better overall result.
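The linear scheduling of the two weights and the move-evaluate-move-back pattern can be sketched as follows; system_time() and system_cost() stand in for formulas (1) and (2) and, like the other names, are assumptions of this example.

```python
def time_cost_weights(iteration, max_iterations, C_b, budget_B):
    """Linearly trade off the time weight tw and the cost weight cw over the iterations."""
    frac = iteration / max_iterations
    if C_b >= budget_B:
        cw, tw = 1.0 - frac, frac       # over budget: start by prioritising cost
    else:
        tw, cw = 1.0 - frac, frac       # within budget: start by prioritising time
    return tw, cw

def evaluate_placement(vertex, dc, vertex_dc, system_time, system_cost):
    """Temporarily move the vertex to dc, evaluate formulas (1) and (2), then move it back."""
    original_dc = vertex_dc[vertex]
    vertex_dc[vertex] = dc
    t, c = system_time(), system_cost()
    vertex_dc[vertex] = original_dc
    return t, c
```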
Step 14: and each learning automaton transmits the data processing center number corresponding to the maximum score to the learning automaton to which the neighbor of the vertex belongs to generate a corresponding weight vector, and the learning automaton calculates the strengthening signals corresponding to all the data processing centers for the vertex according to the weight vector.
In practice, each LA communicates with the other LAs to generate, for its vertex, a reinforcement signal for every DC; before the reinforcement signals are calculated, the weight vectors of the vertex for all DCs need to be calculated. After each LA has calculated the scores for all DCs, it transmits the DC number corresponding to the maximum score to the LAs to which the neighbors of its vertex belong, and those LAs immediately generate the corresponding weight vectors.
In the present embodiment, ρ_v denotes the DC corresponding to the maximum score of vertex v and Nbr(v) denotes the set of neighbor vertices of vertex v. When vertex u receives the label ρ_v propagated by its neighbor v, it calculates the reference standard for the weight vector from the following quantities: the data transmission time of the whole system after vertex v is moved to ρ_v; the data transmission time of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v; the capital cost of the whole system after vertex v is moved to ρ_v; and the capital cost of the whole system after vertex v is moved to ρ_v and vertex u is then moved to ρ_v, weighted by tw, cw and sign(B - C_b). Note that tw, cw and sign(B - C_b) take the same values as in equation (5) of step S13, since they belong to the same iteration. After vertex u has calculated the reference standard, the weight vector of vertex u for DC ρ_v, which is initialized to 0, is updated accordingly.
After the weight vectors of the vertex for all DCs have been calculated, the LA calculates the corresponding reinforcement signals from the weight vectors. The reinforcement signal of vertex u for data processing center DC_i takes the value 0 or 1, representing a reward signal and a penalty signal respectively, and the weight vector of vertex u for data processing center DC_i is initialized to 0.
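Structurally, this step is a round of label propagation between learning automata: each LA broadcasts its best DC to the LAs of its neighbors, the neighbors accumulate per-DC weights, and each DC's weight is then turned into a binary reward/penalty signal. The sketch below shows only that communication pattern; the weight increment and the reward rule are placeholders, since the exact formulas of the patent are not reproduced here.

```python
from collections import defaultdict

def propagate_and_signal(vertices, neighbors, best_dc, weight_increment, is_reward, num_dcs):
    """Propagate each vertex's best DC to its neighbors and derive 0/1 reinforcement signals.

    best_dc[v]: DC with the maximum score of vertex v (the label rho_v it propagates).
    weight_increment(u, rho_v): placeholder for the reference-standard-based weight update.
    is_reward(weights_of_u, dc): placeholder rule mapping a weight vector to reward/penalty.
    """
    weights = defaultdict(lambda: [0.0] * num_dcs)      # weight vectors, initialised to 0
    for v in vertices:
        for u in neighbors[v]:
            weights[u][best_dc[v]] += weight_increment(u, best_dc[v])
    signals = {u: [0 if is_reward(weights[u], dc) else 1 for dc in range(num_dcs)]
               for u in vertices}                        # 0 = reward, 1 = penalty
    return weights, signals
```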
Step 15: and updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal, and guiding the next action selection to iterate.
In this embodiment, the LA updates the probability value of its vertex for each DC using the weight vectors obtained in step 14 and the reinforcement signals, so as to guide the next action selection. Before this, the regularization weights need to be calculated; they are divided into two parts, the reward regularization weights and the penalty regularization weights.
This embodiment defines, for vertex v and DC_i, a reward regularization weight and a penalty regularization weight. The reward regularization weight of vertex v for DC_i is computed from Neg(), an inverting function, applied to the reinforcement signal of vertex v for data processing center DC_i, together with the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k. The penalty regularization weight of vertex v for DC_i is computed analogously from the reinforcement signal of vertex v for DC_i, the weight vector of vertex v for DC_i and the weight vectors of vertex v for all data processing centers DC_k.
After the regularization weights have been obtained, this embodiment can begin to update the probabilities of vertex v; the probability of vertex v for DC_i in the nth iteration is denoted accordingly. The LA first updates the probabilities for the DCs whose reinforcement signal is a reward; the update order proceeds from the smallest reward regularization weight to the largest. Given vertex v and a DC_i whose reward regularization weight is the smallest among all reward regularization weights, that weight is used first to update the probabilities of all DCs according to formula (11), where α represents the reward weight; formula (11) increases the probability of DC_i and decreases the probabilities of the other DCs. The LA then finds the successively larger reward regularization weights and uses them to update the probabilities of all DCs in the same way. The advantage of this embodiment is that, in the end, the DC with the largest reward regularization weight receives the largest probability.
The LA then updates the probabilities for the DCs whose reinforcement signal is a penalty; the update order proceeds from the smallest penalty regularization weight to the largest. Given vertex v and DC_i and DC_k, where the penalty regularization weight for DC_k is the largest among all penalty regularization weights and the penalty regularization weight for DC_i is the smallest, the smallest weight is used first to update the probabilities of all DCs according to formula (12), where β represents the penalty weight and the probability of vertex v for DC_j in the nth iteration appears; formula (12) decreases the probability of DC_k and increases the probabilities of the other DCs. The LA then finds the successively larger penalty regularization weights and the corresponding DC_k and uses them to update the probabilities of all DCs. The advantage of this embodiment is that, in the end, the DC with the largest penalty regularization weight receives the smallest probability.
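As a point of reference, the classical linear reward/penalty update for a learning automaton behaves the way formulas (11) and (12) are described (raise one DC and lower the others, and vice versa). The sketch below uses that classical scheme, scaled by a regularization weight w; it is an assumption standing in for the patent's exact formulas.

```python
def reward_update(probs, dc_i, alpha, w):
    """Raise the probability of dc_i and shrink the others (in the spirit of formula (11))."""
    a = alpha * w
    for j in range(len(probs)):
        probs[j] = probs[j] + a * (1.0 - probs[j]) if j == dc_i else (1.0 - a) * probs[j]

def penalty_update(probs, dc_k, beta, w):
    """Lower the probability of dc_k and spread it over the others (in the spirit of formula (12))."""
    b = beta * w
    m = len(probs)
    for j in range(m):
        probs[j] = (1.0 - b) * probs[j] if j == dc_k else (1.0 - b) * probs[j] + b / (m - 1)
```

Both updates keep the probabilities of one vertex summing to 1, which is why this family of rules is a natural stand-in here.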
Step 16: and generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
If the maximum number of iterations is reached or the constraint conditions converge, the embodiment of the invention judges that the iteration is finished. Otherwise, the (n+1)th iteration begins: action selection in the (n+1)th iteration takes the probabilities updated in the nth iteration as its reference, and vertex migration, score calculation, reinforcement-signal calculation and probability updating are carried out again, until the iterations finish and a geographically distributed graph partitioning result is produced that satisfies the capital budget and has an extremely small data transmission time.
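Putting the five steps together, one iteration of the training loop described above can be outlined as follows; every helper name is an assumption tied to the sketches given earlier, not an interface defined by the patent.

```python
def train_partitioning(graph, automata, max_iterations, converged):
    """Outline of the reinforcement-learning partitioning loop: action selection,
    vertex migration, score calculation, reinforcement signals, probability update."""
    for n in range(max_iterations):
        for la in automata:                       # one learning automaton per vertex
            dc = la.select_action()               # roulette-wheel or trial-and-error selection
            la.migrate_if_needed(dc)              # vertex migration
        for la in automata:
            la.compute_scores()                   # score of the vertex at every DC
        for la in automata:
            la.exchange_labels_and_signals()      # weight vectors + reinforcement signals
        for la in automata:
            la.update_probabilities()             # reward/penalty probability updates
        if converged(graph):                      # or the preset iteration count is reached
            break
    return graph.partitioning()
```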
In order to verify the effectiveness and efficiency of the distributed graph processing method provided by the embodiment of the invention, real graph data sets were evaluated on a real cloud and on a cloud simulator, specifically five real graphs: Gnutella (GN), WikiVote (WV), GoogleWeb (GW), LiveJournal (LJ) and Twitter (TW). Real-cloud experiments were carried out on the Amazon EC2 and Windows Azure cloud platforms, and the GAS-based PowerGraph system was used to execute graph processing algorithms, including classic graph algorithms such as pagerank, sssp and subgraph. The distributed graph processing method provided by the embodiment of the invention is integrated into PowerGraph, and the graph is partitioned during loading. Evaluation on real graphs in real geographically distributed DCs and in simulation shows that, compared with Geo-Cut, the state-of-the-art performance and cost optimization algorithm for geographically distributed graph processing, the method provided by the embodiment of the invention reduces the inter-DC data transmission time by up to 72% and the capital cost by up to 63%, with relatively balanced loads.
The embodiments provided by the present invention can be applied to a number of scenarios, for example: Facebook receives text, image and video data at the TB level from users around the world on a daily basis, and builds four geographically distributed DCs to maintain and manage these data. If the load capacity and the system response time of the DCs are considered, the method provided by the embodiment of the invention can be used to partition and optimize the graph, so that the DCs work stably and give users a good experience. If network heterogeneity, cost budget and system performance in a geographically distributed environment are considered, the method provided by the embodiment of the invention can be used to partition and optimize the graph, and good improvements can be achieved in transmission time and cost budget.
It should be noted that the embodiment of the present invention only uses performance and cost optimization of the geographically distributed graph processing system as an example to describe the operation principle of the distributed graph processing method. In fact, the processing model formed by the distributed graph processing method provided by this embodiment is a general model, which can solve not only the performance and cost optimization problems of the above-mentioned geographically distributed graph processing system but also load balancing and performance optimization problems; for different optimization targets, only different score calculation schemes and different weight vector calculation schemes need to be designed.
Example 2
An embodiment of the present invention provides a general distributed graph processing system based on reinforcement learning, as shown in fig. 3, including:
the distributed graph definition and constraint condition setting module 10 is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model. This module executes the method described in step S10 in embodiment 1, and is not described herein again.
And the action selection module 11 is configured to allocate a learning automaton to each vertex of the distributed graph, initialize the probability of each vertex in each data processing center, and select the data processing center with the highest probability for the vertex according to a preset action selection method based on the initialized probability. This module executes the method described in step S11 in embodiment 1, which is not described herein again.
The vertex migration module 12, the learning automaton, is used to select the data processing center with the highest probability for the vertex, compare with the data processing center where the vertex is currently located, if not, migrate the vertex to the data processing center corresponding to the action, otherwise, do nothing. This module executes the method described in step S12 in embodiment 1, and is not described herein again.
And the score calculating module 13 is used for calculating the score of each learning automaton when the vertex of each learning automaton is positioned in each data processing center, and the score is determined according to the preset constraint condition. This module executes the method described in step S13 in embodiment 1, and details are not repeated here.
The reinforcement signal calculation module 14 is used for transmitting the data processing center number corresponding to the maximum score to the learning automata to which the neighbors of the vertex belong, generating corresponding weight vectors, and calculating, according to the weight vectors, the reinforcement signals of the vertex of the learning automaton for all the data processing centers; this module executes the method described in step S14 in embodiment 1, and is not described herein again.
The probability updating module 15 is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to carry out iteration; this module executes the method described in step S15 in embodiment 1, and is not described herein again.
And the segmentation result acquisition module 16 is configured to generate a segmentation result of the distributed graph that meets the preset constraint condition until a preset iteration number is reached or the constraint condition is converged. This module executes the method described in step S16 in embodiment 1, and is not described herein again.
The general distributed graph processing system based on reinforcement learning provided by the embodiment of the invention defines distributed data processing centers based on graph theory to form a distributed graph, and cuts the distributed graph by reinforcement learning under preset constraint conditions using a preset graph cutting model and a preset graph processing model. A learning automaton is allocated to each vertex and trained to find the most suitable data processing center for that vertex, so that the placement of each vertex across all data processing centers follows a probability distribution. Each iteration comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal calculation and probability updating; the iteration is judged to be finished when the maximum number of iterations is reached or the constraint conditions converge. The distributed graph processing model formed by the general distributed graph processing method provided by the invention is a general distributed graph model; for different optimization targets, only different score calculation schemes and different weight vectors need to be designed.
Example 3
An embodiment of the present invention provides a computer device, as shown in fig. 4, the device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in another manner, and fig. 4 takes the connection by the bus as an example.
The processor 51 may be a Central Processing Unit (CPU). The Processor 51 may also be other general purpose processors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the corresponding program instructions/modules in the embodiments of the present invention. The processor 51 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 52, that is, implements the reinforcement learning-based general distributed graph processing method in the above method embodiments.
The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 51, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 52, and when executed by the processor 51, perform the reinforcement learning-based general distributed graph processing method in embodiment 1.
The details of the computer device can be understood by referring to the corresponding related descriptions and effects in embodiment 1, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program that instructs the relevant hardware to perform the processes, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. This need not be, nor should it be exhaustive of all embodiments. And obvious variations or modifications of the invention may be made without departing from the spirit or scope of the invention.

Claims (12)

1. A general distributed graph processing method based on reinforcement learning is characterized by comprising the following steps:
defining a distributed data processing center to form a distributed graph based on graph theory, and cutting the distributed graph based on preset constraint conditions by using a preset graph cutting model and a preset graph processing model;
distributing a learning automaton to each vertex of the distributed graph, initializing the probability of each vertex at each data processing center, and the learning automaton selecting, for its vertex and based on the initialized probabilities, the data processing center with the highest probability according to a preset action selection method;
comparing the selected data processing center with the data processing center where the vertex currently resides; if the two are different, migrating the vertex to the data processing center corresponding to the selected action, and otherwise performing no operation;
each learning automaton calculating the score of its vertex at each data processing center, the score being determined according to the preset constraint conditions;
each learning automaton propagating the number of the data processing center corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong, so as to generate corresponding weight vectors, and calculating for its vertex, according to the weight vectors, the reinforcement signals corresponding to all the data processing centers;
updating the probability values of the vertex at each data processing center according to the weight vectors and the reinforcement signals, the updated probabilities guiding the action selection of the next iteration;
and when the preset number of iterations is reached or the constraint conditions converge, generating a segmentation result of the distributed graph that satisfies the preset constraint conditions.
2. The reinforcement learning-based general distributed graph processing method according to claim 1, wherein the preset graph cut model is a hybrid-cut graph cut model, the preset graph processing model is a GAS graph processing model, the GAS graph processing model is used for performing iterative vertex calculation, and the constraint conditions are that the capital budget cost and the data transmission time are minimized.
3. The reinforcement learning-based general distributed graph processing method according to claim 2, wherein the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase, and the data transmission time T(i) of the i-th iteration is calculated by the formula:
T(i) = Tg(i) + Ta(i)    (1)

wherein Tg(i) and Ta(i) denote the data transmission times of the gather phase and the apply phase of the i-th iteration, respectively; their detailed expressions are given in the source only as formula images and are stated in terms of the following quantities:

an indicator that equals 1 when the vertex v in data processing center DCr is a master, and 0 when it is a mirror (replica);

an indicator that equals 1 when the vertex v in DCr is a high-degree vertex, and 0 when it is a low-degree vertex;

the amount of data transferred from the replica of vertex v in DCr to the master vertex v during the gather phase of the i-th iteration;

a_v(i), the amount of data sent from the master vertex v to each replica during the apply phase of the i-th iteration;

Ur/Dr, the uploading/downloading bandwidth of DCr;

Rv, the set of data processing centers DC containing a copy of v;

the communication cost between the data processing centers DC is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r denoting the cost of uploading data from DCr to the network, the capital budget cost C_comm(i) is expressed by formula (2), which is likewise given in the source only as an image;

the constraint conditions are:

min T(i)    (3)

C_comm(i) ≤ B    (4)

where B is the capital budget for using network resources.
4. The general distributed graph processing method based on reinforcement learning according to claim 3, wherein the steps of initializing the probability of each vertex in each data processing center and of the learning automaton selecting the data processing center with the highest probability for its vertex according to a preset action selection method comprise:
initializing the probability P(v_i) of a vertex v at a data processing center DCi as

P(v_i) = 1/P

where P is the number of distributed DCs;

obtaining the cumulative probability of the vertex for each data processing center DC from the probability distribution of the vertex, Q(v_i) denoting the cumulative probability of vertex v at data processing center DCi, where

Q(v_i) = P(v_0) + P(v_1) + … + P(v_i);

randomly generating a floating-point number r ∈ [0,1]; if r ≤ Q(v_0), DC0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), data processing center DCk is selected.
5. The general distributed graph processing method based on reinforcement learning according to claim 3, wherein the steps of initializing the probability of each vertex in each data processing center and of the learning automaton selecting the data processing center with the highest probability for its vertex according to a preset action selection method comprise:
presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0,1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex; if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest probability value P(v_i).
6. The generalized distributed graph processing method based on reinforcement learning according to claim 4 or 5, wherein each learning automaton calculates the score of its vertex at each data processing center by the following formula:
(the score calculation formulas are given in the source only as images)

wherein the score of vertex v at DCi is computed in terms of: B, the capital budget for using network resources; T_b, the data transmission time of the whole system before the score calculation; C_b, the data transmission cost of the whole system before the score calculation; the data transmission time of the whole system when the vertex is placed at DCi; the data transmission cost of the whole system when the vertex is placed at DCi; and tw and cw, which denote the time weight and the capital-cost weight, respectively; when C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1.
7. The reinforcement learning-based general distributed graph processing method according to claim 6, wherein each learning automaton propagates the number of the data processing center corresponding to the maximum score to the learning automaton to which the neighbor of its vertex belongs, generates a corresponding weight vector, and the learning automaton calculates the reinforcement signals corresponding to all the data processing centers for its vertex according to the weight vector, including:
a reference standard for calculating the weight vector is computed by a formula that is given in the source only as an image, wherein:

ρ_v denotes the label propagated by a neighbor v; when vertex u receives ρ_v from its neighbor v, it calculates the reference standard of the weight vector; ρ_v represents the DC corresponding to the maximum score of vertex v, and Nbr(v) represents the set of neighbor vertices of vertex v;

the formula further involves the overall data transmission time of the system after vertex v is moved to ρ_v, the overall data transmission time of the system after vertex v is moved to ρ_v and vertex u is then also moved to ρ_v, the overall capital cost of the system after vertex v is moved to ρ_v, and the overall capital cost of the system after vertex v is moved to ρ_v and vertex u is then also moved to ρ_v;
after vertex u has computed the reference standard, the weight vector is updated by a formula that is given in the source only as an image, wherein the weight vector of vertex u for DC ρ_v is initialized to 0;

after the weight vectors of the vertex for all data processing centers have been computed, the learning automaton calculates the corresponding reinforcement signals from the weight vectors; the calculation formulas are likewise given in the source only as images, wherein the reinforcement signal of vertex u for data processing center DCi takes the value 0 or 1, representing the reward signal and the penalty signal respectively, the weight vector of vertex u for DCi is initialized to 0, and M represents the number of vertices.
8. The reinforcement learning-based general distributed graph processing method according to claim 7, wherein before updating the probability values of the vertices at each data processing center, a regularization weight is acquired, and the regularization weight is divided into a reward regularization weight and a penalty regularization weight, wherein:
the reward regularization weight of vertex v for DCi is calculated by a formula that is given in the source only as an image, wherein Neg() is a negation (inverting) function and the formula involves the reinforcement signal of vertex v for data processing center DCi, the weight vector of vertex v for DCi, and the weight vector of vertex v for DCk;

the penalty regularization weight of vertex v for DCi is calculated by a formula that is likewise given in the source only as an image, and involves the reinforcement signal of vertex v for data processing center DCi, the weight vector of vertex v for DCi, and the weight vector of vertex v for DCk.
9. The reinforcement learning-based general distributed graph processing method according to claim 8, wherein the probability of the vertex v is updated according to the regularization weights; for the reward signals, the update order over the data processing centers DC proceeds from the smallest reward regularization weight to the largest: given vertex v and DC i, if the reward regularization weight of v for DC i is the smallest among all reward regularization weights, it is used first to update the probabilities for all DCs; the update formula is given in the source only as an image, wherein α represents the reward weight, n is the iteration number, the formula updates the probability of vertex v for DC i in the n-th iteration, and i and j index data processing centers;

the learning automaton then takes the larger reward regularization weights in turn and uses the same method to update the probabilities for all DCs; the learning automaton likewise updates the probabilities for the reinforcement signals that are penalty signals, with the update order proceeding from the smallest penalty regularization weight to the largest; assuming a given vertex v and DC i, DC k, where the penalty regularization weight of v for DC i is the largest among all penalty regularization weights and the penalty regularization weight of v for DC k is the smallest, the latter is used first to update the probabilities for all DCs; the update formula is given in the source only as an image, wherein β represents the penalty weight, n is the iteration number, the formula updates the probability of vertex v for DC j in the n-th iteration, and i and j index data processing centers;

the learning automaton then takes the larger penalty regularization weights and the corresponding DC k in turn and performs the probability update for all DCs; if the preset number of iterations is reached or the constraint conditions converge, the iteration ends; otherwise the (n+1)-th iteration is entered, and the action selection of the (n+1)-th iteration is performed with reference to the probabilities updated in the n-th iteration.
10. A generalized distributed graph processing system based on reinforcement learning, comprising:
the distributed graph definition and constraint condition setting module is used for defining a distributed data processing center based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraint conditions by utilizing a preset graph cutting model and a preset graph processing model;
the action selection module is used for distributing a learning automaton for each vertex of the distributed graph, initializing the probability of each vertex in each data processing center, and selecting the data processing center with the highest probability for the vertex by the learning automaton according to a preset action selection method based on the initialized probability;
the vertex migration module is used for selecting the data processing center with the maximum probability for the vertex, comparing the data processing center with the data processing center where the vertex is located currently, if the data processing centers are not consistent, migrating the vertex to the data processing center corresponding to the action, and otherwise, not performing any operation;
the score calculation module is used for calculating the score of the vertex of each learning automaton in each data processing center, and the score is determined according to the preset constraint condition;
the reinforcement signal calculation module is used for enabling each learning automaton to propagate the number of the data processing center corresponding to its maximum score to the learning automata to which the neighbors of its vertex belong so as to generate corresponding weight vectors, and for the learning automaton to calculate, according to the weight vectors, the reinforcement signals corresponding to all the data processing centers for its vertex;
the probability updating module is used for updating the probability value of the vertex of the learning automaton in each data processing center according to the weight vector and the strengthening signal and guiding the next action selection to carry out iteration;
and the segmentation result acquisition module is used for generating a segmentation result of the distributed graph meeting the preset constraint condition until the preset iteration times are reached or the constraint condition is converged.
11. A computer-readable storage medium storing computer instructions for causing a computer to perform the reinforcement learning-based general distributed graph processing method according to any one of claims 1 to 9.
12. A computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the reinforcement learning based general distributed graph processing method according to any one of claims 1 to 9.
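The following illustrative sketches are explanatory examples only and are not part of the claims. The cumulative-probability (roulette-wheel) action selection described in claim 4 can be sketched in Python as follows; the function name and the use of Python's random module are assumptions made for illustration:

import random

def roulette_select(probabilities, rng=random):
    """Pick a data processing center index by cumulative probability (cf. claim 4).

    `probabilities` is one vertex's probability vector over the data processing
    centers and is assumed to sum to 1.
    """
    r = rng.random()               # floating-point number r in [0, 1)
    cumulative = 0.0               # running cumulative probability Q(v_i)
    for dc, p in enumerate(probabilities):
        cumulative += p
        if r <= cumulative:
            return dc
    return len(probabilities) - 1  # guard against floating-point round-off

# A vertex whose automaton currently favours DC2 is most often routed there.
print(roulette_select([0.2, 0.3, 0.5]))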
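The trial-and-error selection of claim 5 is essentially an explore/exploit switch. A minimal sketch, assuming a fixed trial-and-error parameter τ and Python's random module:

import random

def tau_select(probabilities, tau=0.1, rng=random):
    """Trial-and-error selection sketched from claim 5 (parameter values are assumed).

    With probability tau the automaton explores a random data processing center;
    otherwise it exploits the DC whose probability value is currently the largest.
    """
    if rng.random() <= tau:
        return rng.randrange(len(probabilities))   # explore: pick a random DC
    return max(range(len(probabilities)),          # exploit: argmax of the probabilities
               key=lambda dc: probabilities[dc])

print(tau_select([0.2, 0.3, 0.5], tau=0.1))        # usually prints 2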
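Claim 6 describes how the time weight tw and the capital-cost weight cw trade off as the iterations proceed. The linear schedule below is one reading of that wording; the function name, the iteration-fraction formula, and treating the schedule as a per-iteration function of the current C_b are assumptions:

def score_weights(iteration, max_iterations, cost_before, budget):
    """Time weight tw and capital-cost weight cw following the schedule worded in claim 6.

    A uniform (linear) change over the iterations is assumed; `cost_before` plays the
    role of C_b and `budget` the role of B.
    """
    frac = iteration / max(max_iterations - 1, 1)   # rises uniformly from 0 to 1
    if cost_before >= budget:
        cw, tw = 1.0 - frac, frac                   # cw: 1 -> 0, tw: 0 -> 1
    else:
        tw, cw = 1.0 - frac, frac                   # tw: 1 -> 0, cw: 0 -> 1
    return tw, cw

# Halfway through the run, with the cost still over budget, the two weights meet at 0.5.
print(score_weights(iteration=5, max_iterations=11, cost_before=120.0, budget=100.0))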
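The probability-update formulas of claim 9 are given in the source only as images. As a stand-in, the sketch below uses the classical linear reward-penalty update for a learning automaton; it conveys the direction of the update (reinforce the chosen DC on a reward signal, back off on a penalty signal) but deliberately omits the regularization-weight ordering of claims 8 and 9:

def reward_update(prob, chosen_dc, alpha=0.1):
    """Classical linear reward step: shift probability mass toward the chosen DC."""
    return [p + alpha * (1.0 - p) if dc == chosen_dc else (1.0 - alpha) * p
            for dc, p in enumerate(prob)]

def penalty_update(prob, chosen_dc, beta=0.1):
    """Classical linear penalty step: shift probability mass away from the chosen DC."""
    k = len(prob)
    return [(1.0 - beta) * p if dc == chosen_dc else beta / (k - 1) + (1.0 - beta) * p
            for dc, p in enumerate(prob)]

p = [0.2, 0.3, 0.5]
print(reward_update(p, chosen_dc=1))   # probability mass moves toward DC1
print(penalty_update(p, chosen_dc=1))  # probability mass moves away from DC1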
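Finally, the steps of claim 1 can be tied together in a self-contained toy run. This is a didactic example only: the objective used here is a plain edge cut rather than the time/cost objective of claims 2 and 3, the reward test is a simple "did the objective improve" check, and all names are assumptions made so the example runs on its own:

import random

def toy_partition(edges, num_vertices, num_dcs, iters=50, alpha=0.1, beta=0.05, seed=0):
    """Toy walk-through of the loop in claim 1; not the claimed time/cost-aware method."""
    rng = random.Random(seed)
    # One learning automaton per vertex: a probability vector over the DCs.
    prob = [[1.0 / num_dcs] * num_dcs for _ in range(num_vertices)]
    place = [rng.randrange(num_dcs) for _ in range(num_vertices)]  # current DC of each vertex

    def edge_cut(placement):
        # Stand-in objective: number of edges whose endpoints sit in different DCs.
        return sum(1 for u, v in edges if placement[u] != placement[v])

    best = edge_cut(place)
    for _ in range(iters):
        for v in range(num_vertices):
            # Action selection: roulette wheel over the automaton's probability vector.
            r, acc, dc = rng.random(), 0.0, num_dcs - 1
            for i, p in enumerate(prob[v]):
                acc += p
                if r <= acc:
                    dc = i
                    break
            old = place[v]
            place[v] = dc                      # vertex migration (a no-op if dc == old)
            cut = edge_cut(place)
            if cut <= best:                    # objective improved or held: reward signal
                best = cut
                prob[v] = [p + alpha * (1 - p) if i == dc else (1 - alpha) * p
                           for i, p in enumerate(prob[v])]
            else:                              # objective worsened: penalty signal, undo move
                place[v] = old
                prob[v] = [(1 - beta) * p if i == dc
                           else beta / (num_dcs - 1) + (1 - beta) * p
                           for i, p in enumerate(prob[v])]
    return place, best

# A 4-cycle and a triangle joined by one bridge edge, split across two data processing centers.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 4), (0, 4)]
print(toy_partition(edges, num_vertices=7, num_dcs=2))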
CN202010462112.4A 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning Active CN111539534B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010462112.4A CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning
PCT/CN2021/076484 WO2021238305A1 (en) 2020-05-27 2021-02-10 Universal distributed graph processing method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010462112.4A CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111539534A CN111539534A (en) 2020-08-14
CN111539534B true CN111539534B (en) 2023-03-21

Family

ID=71980779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010462112.4A Active CN111539534B (en) 2020-05-27 2020-05-27 General distributed graph processing method and system based on reinforcement learning

Country Status (2)

Country Link
CN (1) CN111539534B (en)
WO (1) WO2021238305A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539534B (en) * 2020-05-27 2023-03-21 深圳大学 General distributed graph processing method and system based on reinforcement learning
CN113726342B (en) * 2021-09-08 2023-11-07 中国海洋大学 Segmented difference compression and inert decompression method for large-scale graph iterative computation
CN113835899B (en) * 2021-11-25 2022-02-22 支付宝(杭州)信息技术有限公司 Data fusion method and device for distributed graph learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953801A (en) * 2017-01-24 2017-07-14 上海交通大学 Stochastic shortest route implementation method based on hierarchical structure learning automaton
CN109889393A (en) * 2019-03-11 2019-06-14 深圳大学 A kind of geographically distributed figure processing method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9208257B2 (en) * 2013-03-15 2015-12-08 Oracle International Corporation Partitioning a graph by iteratively excluding edges
EP2884453A1 (en) * 2013-12-12 2015-06-17 Telefonica Digital España, S.L.U. A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network
CN105590321B (en) * 2015-12-24 2018-12-28 华中科技大学 A kind of block-based subgraph building and distributed figure processing method
CN106970779B (en) * 2017-03-30 2020-01-03 重庆大学 Memory computing-oriented stream balance graph partitioning method
CN107222565B (en) * 2017-07-06 2019-07-12 太原理工大学 A kind of network dividing method and system
CN109033191A (en) * 2018-06-28 2018-12-18 山东科技大学 A kind of dividing method towards extensive power-law distribution figure
CN111539534B (en) * 2020-05-27 2023-03-21 深圳大学 General distributed graph processing method and system based on reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953801A (en) * 2017-01-24 2017-07-14 上海交通大学 Stochastic shortest route implementation method based on hierarchical structure learning automaton
CN109889393A (en) * 2019-03-11 2019-06-14 深圳大学 A kind of geographically distributed figure processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cost-Aware Partitioning for Efficient Large Graph Processing in Geo-Distributed Datacenters; Amelie Chi Zhou et al.; IEEE Transactions on Parallel and Distributed Systems; 2019-11-25; Vol. 31, No. 7; pp. 1-2 *

Also Published As

Publication number Publication date
WO2021238305A1 (en) 2021-12-02
CN111539534A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN111539534B (en) General distributed graph processing method and system based on reinforcement learning
US11410046B2 (en) Learning-based service migration in mobile edge computing
US11018979B2 (en) System and method for network slicing for service-oriented networks
KR102621640B1 (en) Method and apparatus for automated decision making
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN111406264A (en) Neural architecture search
CN112291335B (en) Optimized task scheduling method in mobile edge calculation
CN113835899B (en) Data fusion method and device for distributed graph learning
CN111447005B (en) Link planning method and device for software defined satellite network
CN112512013B (en) Learning pruning-based vehicle networking mobile edge computing task unloading method and system
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
WO2019168692A1 (en) Capacity engineering in distributed computing systems
CN115066694A (en) Computation graph optimization
CN114595049A (en) Cloud-edge cooperative task scheduling method and device
CN111488528A (en) Content cache management method and device and electronic equipment
CN113489787B (en) Method and device for collaborative migration of mobile edge computing service and data
CN117041330B (en) Edge micro-service fine granularity deployment method and system based on reinforcement learning
CN109889393B (en) Method and system for processing geographic distributed graph
CN111510334B (en) Particle swarm algorithm-based VNF online scheduling method
CN113515378A (en) Method and device for migration and calculation resource allocation of 5G edge calculation task
Garg et al. Heuristic and reinforcement learning algorithms for dynamic service placement on mobile edge cloud
CN116996941A (en) Calculation force unloading method, device and system based on cooperation of cloud edge ends of distribution network
CN116489668A (en) Edge computing task unloading method based on high-altitude communication platform assistance
CN116405493A (en) Edge cloud collaborative task unloading method based on MOGWO strategy
CN113708982B (en) Service function chain deployment method and system based on group learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant