WO2021238305A1 - General distributed graph processing method and system based on reinforcement learning - Google Patents
General distributed graph processing method and system based on reinforcement learning
- Publication number
- WO2021238305A1 (PCT/CN2021/076484)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vertex
- data processing
- processing center
- probability
- graph
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to the field of large-scale graph segmentation processing, and in particular to a general distributed graph processing method and system based on reinforcement learning.
- In order to process large-scale graphs efficiently, the graph usually needs to be partitioned so that the resulting subgraphs can be processed in parallel. Large-scale graph partitioning currently has several classic models.
- Heuristic models: traditional mainstream large-scale graph processing systems such as Pregel and PowerGraph all use heuristic partitioning algorithms.
- Pregel's default partitioning method takes the hash value of the vertex id modulo the number of partitions, with the optimization goal of enhancing partition locality and reducing network traffic between computing nodes.
- PowerGraph uses a greedy vertex-cut method by default: for a newly added edge, if one of its vertices already exists on a certain machine, the edge is allocated to that machine, thereby minimizing the number of cross-machine edges and reducing communication. Such heuristic graph partitioning algorithms easily fall into local optima, and some better regions of the solution space are never searched.
- Machine learning models: Pham et al. proposed a graph partitioning method that allocates the operations (nodes) of a TensorFlow computation graph to the available devices so as to minimize computation time. They use a reinforcement learning model with a seq2seq policy to allocate the operations. This approach is only suitable when the number of graph nodes is small, so that the policy space does not become too large.
- Nazi et al. proposed GAP, an algorithm that uses deep learning to solve the graph partitioning problem. GAP is an unsupervised learning method that treats balanced graph partitioning as a vertex classification problem. However, when the optimization goal involves heterogeneous network prices and bandwidths, computing the node embeddings becomes very complicated.
- The application scenarios of these existing machine learning models for graph partitioning are rather narrow; when the graph becomes larger and the optimization goal becomes more complex, these methods can no longer solve the graph partitioning problem well.
- The technical problem to be solved by this application is to overcome the defects of prior-art graph cutting models, namely that they easily fall into local optima, apply to only a single scenario and produce poor partitioning results, and thereby to provide a general distributed graph processing method and system based on reinforcement learning.
- An embodiment of the present application provides a general distributed graph processing method based on reinforcement learning, which includes the following steps: defining distributed data processing centers based on graph theory to form a distributed graph, and cutting the distributed graph based on preset constraints using a preset graph cutting model and a preset graph processing model;
- assigning a learning automaton to each vertex of the distributed graph and initializing the probability of each vertex at each data processing center; based on the initialized probabilities, the learning automaton selects the data processing center with the highest probability for its vertex according to a preset action selection method;
- the learning automaton compares the selected data processing center (the one with the highest probability) with the data processing center where the vertex is currently located; if they are inconsistent, it migrates the vertex to the data processing center corresponding to the action, otherwise it does nothing;
- Each learning automaton calculates the score of its vertex in each data processing center, and the score is determined according to the preset constraint condition
- each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automata of its vertex's neighbors and generates a corresponding weight vector; the learning automaton then calculates, from the weight vector, the reinforcement signals of its vertex for all data processing centers;
- the learning automaton updates the probability value of its vertex at each data processing center according to the weight vector and the reinforcement signals, guiding the next round of action selection for iteration;
- until a preset number of iterations is reached or the constraints converge, a partitioning result of the distributed graph that satisfies the preset constraints is generated.
- In one embodiment, the preset graph cutting model is a hybrid-cut graph cutting model and the preset graph processing model is a GAS graph processing model, which is used to iteratively perform vertex computations; the constraints are that the capital budget cost and the data transmission time are minimized.
- In one embodiment, the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase, and the data transmission time T(i) of the i-th iteration is computed accordingly, where:
- a_v(i) represents the amount of data sent from the master vertex v to each of its replicas in the apply phase of the i-th iteration;
- U_r / D_r represents the upload/download bandwidth of DC_r;
- R_v represents the set of data processing centers (DCs) containing replicas of v.
- The communication cost between the data processing centers (DCs) is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r denoting the unit cost of uploading data from DC_r to the network, the capital budget cost C_comm(i) is required to satisfy C_comm(i) ≤ B, where B is the capital budget for using network resources.
- In one embodiment, initializing the probability of each vertex at each data processing center and having the learning automaton select the data processing center with the highest probability for its vertex according to the preset action selection method includes:
- initializing the probability P(v_i) of vertex v at data processing center DC_i to 1/M, where M is the number of distributed DCs;
- obtaining, from the vertex's probability distribution, the cumulative probability of the vertex for each data processing center DC, where Q(v_i) represents the cumulative probability of vertex v for data processing center DC_i (the sum of P(v_0) through P(v_i)); a floating-point number r ∈ [0,1] is then generated at random: if r ≤ Q(v_0), DC_0 is selected, and if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), DC_k is selected.
- In another embodiment, initializing the probability of each vertex at each data processing center and having the learning automaton select the data processing center with the highest probability for its vertex according to the preset action selection method includes:
- presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0,1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex, and if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest P(v_i) value.
- In one embodiment, each learning automaton calculates the score of its vertex at each data processing center DC_i from the following quantities:
- B represents the capital budget for using network resources;
- T_b represents the overall data transmission time of the system before the score is calculated;
- C_b represents the overall data transmission cost of the system before the score is calculated;
- the remaining terms are the overall data transmission time of the system when the vertex is placed at DC_i and the overall data transmission cost of the system when the vertex is placed at DC_i.
- tw and cw represent the time weight and the capital cost weight, respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; when C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1.
- In one embodiment, the step in which each learning automaton propagates the data processing center number corresponding to its maximum score to the learning automata of its vertex's neighbors, generates the corresponding weight vector and calculates from it the reinforcement signals of its vertex for all data processing centers includes:
- first computing a reference standard for the weight vector;
- then having the learning automaton compute the corresponding reinforcement signal from the weight vector;
- a regularization weight is further computed, which is divided into two parts, a reward regularization weight and a penalty regularization weight; the regularization weight of vertex v for DC_i is computed from the reinforcement signal of vertex v for data processing center DC_i and the weight vectors of vertex v for DC_i and DC_k, where Neg() is the negation function.
- In one embodiment, the probability of vertex v is updated according to the regularization weights, and the update proceeds in ascending order of the reward regularization weights of the data processing centers (DCs): given vertex v and the DC_i whose reward regularization weight is the smallest of all reward regularization weights, that weight is used first to update the probabilities of all DCs.
- The DCs are then updated in ascending order of their penalty regularization weights: given vertex v and DC_i and DC_k, where the penalty regularization weight of DC_i is the largest and that of DC_k is the smallest of all penalty regularization weights, the smallest weight is used first to update the probabilities of all DCs, where β represents the penalty weight.
- the embodiments of the present application provide a general distributed graph processing system based on reinforcement learning, including:
- a distributed graph definition and constraint setting module, used to define distributed data processing centers based on graph theory to form a distributed graph, and to cut the distributed graph based on preset constraints using a preset graph cutting model and a preset graph processing model;
- an action selection module, used to assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center; based on the initialized probabilities, the learning automaton selects the data processing center with the highest probability for its vertex according to a preset action selection method;
- a vertex migration module, in which the learning automaton compares the data processing center selected for its vertex (the one with the highest probability) with the data processing center where the vertex is currently located; if they are inconsistent, it migrates the vertex to the data processing center corresponding to the action, otherwise it does nothing;
- a score calculation module, in which each learning automaton calculates the score of its vertex at each data processing center, the score being determined according to the preset constraints;
- a reinforcement signal calculation module, in which each learning automaton propagates the data processing center number corresponding to its maximum score to the learning automata of its vertex's neighbors and generates the corresponding weight vector, and the learning automaton calculates from the weight vector the reinforcement signals of its vertex for all data processing centers;
- a probability update module, in which the learning automaton updates the probability value of its vertex at each data processing center according to the weight vector and the reinforcement signals, guiding the next round of action selection for iteration;
- a partitioning result acquisition module, used to generate, once a preset number of iterations is reached or the constraints converge, a partitioning result of the distributed graph that satisfies the preset constraints.
- An embodiment of the present application provides a computer-readable storage medium that stores computer instructions, the computer instructions being used to cause the computer to execute the general distributed graph processing method based on reinforcement learning according to the first aspect of the embodiments of the present application.
- An embodiment of the present application provides a computer device, including a memory and a processor that are communicatively connected to each other; the memory stores computer instructions, and the processor executes the computer instructions so as to perform the general distributed graph processing method based on reinforcement learning according to the first aspect of the embodiments of the present application.
- The general distributed graph processing method and system based on reinforcement learning provided in this application define distributed data processing centers based on graph theory to form a distributed graph, and cut the distributed graph by reinforcement learning using a preset graph cutting model, a preset graph processing model and preset constraints. A learning automaton is assigned to each vertex, and training finds the most suitable data processing center for the vertex.
- The possibility of each vertex being at each data processing center obeys a certain probability distribution.
- Each iteration of the system consists of five steps: action selection, vertex migration, score calculation, reinforcement signal calculation and probability update. The iteration ends when the maximum number of iterations is reached or the constraints have converged.
- The distributed graph processing model formed by the general distributed graph processing method provided in this application is a distributed graph model with good adaptability; for different optimization goals, only different score calculation schemes and different weight vectors need to be designed.
- FIG. 1 is a flowchart of a specific example of a general distributed graph processing method based on reinforcement learning in an embodiment of the application;
- FIG. 2 is a flowchart of the iteration of the reinforcement-learning-based graph partitioning process provided by an embodiment of the application;
- FIG. 3 is a functional block diagram of a specific example of a general distributed graph processing system based on reinforcement learning in an embodiment of the application;
- Fig. 4 is a composition diagram of a specific example of a computer device provided by an embodiment of the application.
- The embodiments of the application provide a general distributed graph processing method based on reinforcement learning that can be applied to different optimization goals, for example the performance and cost optimization, load balancing, and performance optimization of a geographically distributed graph processing system. As shown in FIG. 1, it includes the following steps:
- Step S10 Define distributed data processing centers based on graph theory to form a distributed graph, use a preset graph cutting model and a preset graph processing model, and cut the distributed graph based on preset constraints.
- The embodiment of the application takes geographically distributed graph partitioning as an example. It is assumed that vertex data is not backed up across data processing centers (hereinafter referred to as DCs) and that a machine can only execute the graph processing task of one vertex at a time; the computing resources of each DC are not limited, and data communication between DCs is the performance bottleneck of geographically distributed graph processing. It is further assumed that the connections between DCs are free of network congestion, so the network bottleneck comes only from the uplink and downlink bandwidth between each DC and the WAN, and only uploading data from a DC to the WAN is charged.
- Cost and performance may conflict: when an uplink has a large bandwidth, more data can be transmitted over it to reduce the transmission time, but the price of that link may be relatively high, which raises the cost. Performance and cost therefore need to be optimized together as the optimization goal of graph partitioning.
- First, a graph G(V, E) is defined, where V is the set of vertices and E is the set of edges, and M geographically distributed data processing centers (DCs) are considered. Each vertex v has an initial location L_v (L_v ∈ {0, 1, ..., M-1}) and an indicator marking whether v is a master vertex; R_v is the set of DCs containing replicas of vertex v, U_r is the uplink bandwidth of DC_r, and D_r is its downlink bandwidth.
- The embodiment of this application uses a hybrid-cut graph cutting model, which follows these rules: given a threshold theta, a vertex v whose in-degree is greater than or equal to theta is called a high-degree vertex; otherwise it is called a low-degree vertex. If vertex v is low-degree, all of its incoming edges are assigned to the DC where v is located; if vertex v is high-degree, each of its incoming edges is assigned to the DC where the opposite vertex of that edge is located.
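- As an illustration of the hybrid-cut rule above, the following minimal Python sketch assigns incoming edges to DCs. The threshold `theta`, the `placement` map and the toy graph are hypothetical; the sketch only mirrors the rule described in this paragraph, not the patent's actual implementation.

```python
from collections import defaultdict

def hybrid_cut(edges, placement, theta):
    """Assign each directed edge (src, dst) to a DC under the hybrid-cut rule.

    placement: dict mapping vertex -> DC where the vertex currently resides.
    theta: in-degree threshold separating low-degree and high-degree vertices.
    """
    in_degree = defaultdict(int)
    for _, dst in edges:
        in_degree[dst] += 1

    assignment = {}
    for src, dst in edges:
        if in_degree[dst] < theta:
            # low-degree vertex: all incoming edges go to the DC of dst itself
            assignment[(src, dst)] = placement[dst]
        else:
            # high-degree vertex: the incoming edge goes to the DC of the
            # opposite (source) vertex of the edge
            assignment[(src, dst)] = placement[src]
    return assignment

# toy example (hypothetical data)
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "b")]
placement = {"a": 0, "b": 1, "c": 2, "d": 0}
print(hybrid_cut(edges, placement, theta=3))
```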
- the embodiment of this application uses a GAS graph processing model, which iteratively performs user-defined vertex calculations.
- Each GAS iteration has three computing phases: gather, apply and scatter.
- In the gather phase, each active vertex collects its neighbors' data, and a sum function (Sum) is defined to aggregate the received data into a gathered sum.
- In the apply phase, each active vertex uses the gathered sum to update its data.
- In the scatter phase, each active vertex activates the neighbors that will execute in the next iteration.
- A global barrier is defined to ensure that all vertices complete their calculations before the next step starts.
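- The three phases and the global barrier can be summarised by the single-machine Python sketch below of one PageRank-style iteration. The function name and the use of PageRank are illustrative assumptions chosen only to show how gather, apply and scatter fit together; they are not the patent's vertex program.

```python
def gas_iteration(graph, rank, damping=0.85):
    """One gather-apply-scatter iteration of a PageRank-like vertex program.

    graph: dict vertex -> list of out-neighbours.
    rank:  dict vertex -> current value.
    Returns the new values and the set of vertices activated for the next step.
    """
    # Gather: every active vertex collects its in-neighbours' data;
    # Sum aggregates the received messages into a gathered sum.
    gathered = {v: 0.0 for v in graph}
    for u, neighbours in graph.items():
        share = rank[u] / max(len(neighbours), 1)
        for v in neighbours:
            gathered[v] += share

    # (global barrier: all gathers finish before any apply starts)

    # Apply: every active vertex uses the gathered sum to update its data.
    new_rank = {v: (1 - damping) + damping * gathered[v] for v in graph}

    # Scatter: every active vertex activates the neighbours that will run
    # in the next iteration (here: out-neighbours of vertices that changed).
    active = {v for u in graph for v in graph[u]
              if abs(new_rank[u] - rank[u]) > 1e-6}
    return new_rank, active

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = {v: 1.0 for v in graph}
rank, active = gas_iteration(graph, rank)
print(rank, active)
```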
- The transmission time in the i-th iteration can be expressed as the sum of the data transmission times of the gather phase and the apply phase and is computed by formula (1), where a_v(i) represents the amount of data sent from the master vertex v to each of its replicas in the apply phase of the i-th iteration, U_r/D_r represents the upload/download bandwidth of DC_r, and R_v represents the set of data processing centers (DCs) containing replicas of v.
- The communication cost between DCs is the sum of the costs of uploading data in the gather phase and the apply phase. With the unit cost of uploading data from DC_r to the Internet defined as P_r, the total communication cost C_comm(i) is expressed by formula (2).
- The geographically distributed graph partitioning problem is expressed as a constrained optimization problem, that is, min T(i) (3) subject to C_comm(i) ≤ B (4).
- The geographically distributed graph partitioning problem to be solved is thus the optimization problem under the constraints described by formulas (3) and (4).
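- Formulas (1) and (2) themselves are not reproduced in this text, but the quantities they combine can be illustrated with the Python sketch below. The way it aggregates per-DC upload and download volumes into a time and a cost (and the fact that only the apply phase is modelled) is a simplifying assumption made only to show how a_v(i), U_r, D_r, R_v, P_r and the budget B interact, not the patent's exact formulas.

```python
def transfer_time_and_cost(apply_data, replicas, placement, U, D, P):
    """Rough illustration of the per-iteration quantities behind formulas (1)-(4).

    apply_data: dict master vertex -> a_v(i), data sent to each replica in apply.
    replicas:   dict vertex -> set of DCs holding replicas of v (R_v).
    placement:  dict vertex -> DC of the master copy.
    U, D:       per-DC upload / download bandwidth (U_r, D_r).
    P:          per-DC unit price of uploading data to the WAN (P_r).
    """
    upload = {r: 0.0 for r in U}     # bytes leaving each DC
    download = {r: 0.0 for r in D}   # bytes entering each DC
    for v, a_v in apply_data.items():
        src = placement[v]
        for dc in replicas[v]:
            if dc != src:
                upload[src] += a_v
                download[dc] += a_v
    # assumed aggregation: the slowest up/down link dominates the iteration time;
    # the gather phase would add analogous terms
    time = max(max(upload[r] / U[r] for r in U),
               max(download[r] / D[r] for r in D))
    cost = sum(upload[r] * P[r] for r in U)   # only uploads to the WAN are charged
    return time, cost

U = {0: 100.0, 1: 50.0}
D = {0: 80.0, 1: 80.0}
P = {0: 0.01, 1: 0.02}
apply_data = {"v": 10.0}
replicas = {"v": {0, 1}}
placement = {"v": 0}
print(transfer_time_and_cost(apply_data, replicas, placement, U, D, P))
```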
- After the meaning of each element of the geographically distributed graph has been defined, each vertex is assigned a learning automaton (hereinafter LA), and training finds the DC that best suits the vertex; the possibility of each vertex being at each DC obeys a certain probability distribution. Each iteration mainly consists of five steps: action selection, vertex migration, score calculation, reinforcement signal generation and probability update. The overall workflow for optimizing the performance and cost of a geographically distributed graph processing system is shown in FIG. 2, and the main function of each step and the relations between the steps are described below.
- Step S11 Assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center; based on the initialized probabilities, the learning automaton selects the data processing center with the highest probability for its vertex according to the preset action selection method.
- P(v_i) represents the probability of vertex v at DC_i and is initialized to 1/M, where M is the number of distributed DCs.
- Q(v_i) represents the cumulative probability of vertex v for DC_i, computed as the sum of P(v_0) through P(v_i).
- The LA uses a roulette-wheel algorithm to select a suitable action (DC) for its vertex.
- The LA first obtains the cumulative probability of its vertex for each DC from the vertex's probability distribution and then randomly generates a floating-point number r ∈ [0,1]. If r ≤ Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), DC_k is selected. In this way an action with a higher probability has a greater chance of being selected, but an action with a lower probability may also be selected. When the LA selects a good action (one with high probability), the partitioning result is more likely to move toward the optimization goal; when it selects a bad action (one with low probability), this is a trial-and-error process, and a choice that currently looks poor may explore a better region of the state space.
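- A minimal Python sketch of this roulette-wheel action selection, together with the alternative trial-and-error variant based on the parameter τ described earlier, is given below; the probability vector and the τ value are example inputs, not values from the patent's experiments.

```python
import random

def roulette_select(probs):
    """Pick a DC index by roulette wheel using cumulative probabilities Q(v_i)."""
    r = random.random()
    cumulative = 0.0
    for dc, p in enumerate(probs):
        cumulative += p            # Q(v_dc) = P(v_0) + ... + P(v_dc)
        if r <= cumulative:
            return dc
    return len(probs) - 1          # guard against floating-point round-off

def trial_and_error_select(probs, tau=0.1):
    """Alternative selection: explore a random DC with probability tau,
    otherwise exploit the DC with the largest P(v_i)."""
    if random.random() <= tau:
        return random.randrange(len(probs))
    return max(range(len(probs)), key=lambda dc: probs[dc])

probs = [0.5, 0.3, 0.2]            # example probability distribution over 3 DCs
print(roulette_select(probs), trial_and_error_select(probs))
```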
- Step S12 The learning automaton will select the data processing center with the highest probability for the vertex and compare it with the data processing center where the vertex is currently located. If it is inconsistent, the vertex will be migrated to the data processing center corresponding to the action, otherwise no operation will be performed.
- the action obtained in step S11 is compared with the DC where the vertex is currently located, and if it is inconsistent, the vertex is migrated to the DC corresponding to the action, otherwise, no operation is performed.
- Step S13 Each learning automaton calculates the score when its vertex is in each data processing center, and the score is determined according to the preset constraint condition.
- For each LA, the score of its vertex at every DC is calculated.
- L_v is defined as the DC where vertex v is currently located.
- T_b is the overall data transmission time of the system before the score is calculated, obtained from formula (1); the corresponding quantity with the vertex placed at DC_i is the data transmission time of the entire system when the vertex is at DC_i.
- C_b represents the data transmission cost of the entire system before the score is calculated, obtained from formula (2); the corresponding quantity with the vertex placed at DC_i is the data transmission cost of the entire system when the vertex is at DC_i.
- These two quantities are computed by moving vertex v to DC_i, evaluating formulas (1) and (2), and finally moving vertex v back to L_v.
- The score of vertex v at DC_i is then calculated by formula (5), in which B represents the capital budget and tw and cw represent the time weight and the cost weight, respectively.
- When C_b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; the purpose is to prioritize optimizing the overall communication cost of the graph processing system and to explore more graph partition states that can reduce the system cost. When C_b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1; the purpose is to prioritize optimizing the overall data transmission time of the graph processing system while slowing down the optimization of the transmission time, so as to achieve a better overall optimization effect.
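- The linear schedule of the time weight tw and the cost weight cw can be written as the short Python sketch below. Treating the weights as linear functions of the iteration index over a fixed number of iterations is an assumption used only to illustrate the "decrease uniformly from 1 to 0 / increase uniformly from 0 to 1" behaviour described above.

```python
def time_cost_weights(iteration, max_iterations, cost_so_far, budget):
    """Return (tw, cw) for the current iteration.

    When the running cost C_b is at or above the budget B, cost optimisation is
    prioritised first (cw starts at 1); otherwise transmission time is
    prioritised first (tw starts at 1). Both weights change linearly.
    """
    frac = iteration / max(max_iterations - 1, 1)   # goes uniformly from 0 to 1
    if cost_so_far >= budget:
        cw = 1.0 - frac   # cost weight: 1 -> 0
        tw = frac         # time weight: 0 -> 1
    else:
        tw = 1.0 - frac   # time weight: 1 -> 0
        cw = frac         # cost weight: 0 -> 1
    return tw, cw

for it in range(0, 100, 25):
    print(it, time_cost_weights(it, 100, cost_so_far=120.0, budget=100.0))
```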
- Step S14 Each learning automaton propagates the data processing center number corresponding to its maximum score to the learning automata of its vertex's neighbors and generates the corresponding weight vectors; the learning automaton then calculates from the weight vectors the reinforcement signals of its vertex for all data processing centers.
- In practice, each LA communicates with the other LAs in order to generate reinforcement signals for all DCs for its vertex. Before the reinforcement signals are calculated, the weight vectors of the vertex for all DCs must be calculated. After each LA has computed the scores for all DCs, it propagates the DC number corresponding to the maximum score to the LAs of its vertex's neighbors, and those LAs immediately generate the corresponding weight vector entries.
- ρ_v is defined as the DC corresponding to the maximum score of vertex v, and Nbr(v) represents the set of neighbor vertices of vertex v.
- The quantities involved are: the overall data transmission time of the system after vertex v is moved to ρ_v; the overall data transmission time of the system after vertex v is moved to ρ_v and vertex u is then also moved to ρ_v; the overall capital cost of the system after vertex v is moved to ρ_v; and the overall capital cost of the system after vertex v and then vertex u are moved to ρ_v.
- When vertex u receives the label ρ_v propagated by its neighbor v, a reference standard for calculating its weight vector is computed from these quantities; the values of tw, cw and sign(B − C_b) are the same as in formula (5), because they belong to the same iteration. After vertex u has computed the reference standard, its weight vector is updated accordingly.
- After the weight vectors of the vertex for all DCs have been computed, the LA calculates the corresponding reinforcement signals from the weight vectors.
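- The communication pattern of this step can be sketched in Python as follows. The `reference_standard` callable and the way a reinforcement signal is derived from the accumulated weights are hypothetical stand-ins for the reference-standard, weight-vector and reinforcement-signal formulas, which are not reproduced here; the sketch only shows how each LA broadcasts the label ρ_v of its best-scoring DC to its neighbours' LAs and how those LAs accumulate weight entries.

```python
def propagate_labels(scores, neighbours, reference_standard):
    """scores: dict vertex -> list of scores, one per DC.
    neighbours: dict vertex -> list of neighbour vertices (Nbr(v)).
    reference_standard: callable(u, rho_v, dc) -> contribution to u's weight
    for DC dc when u receives label rho_v from a neighbour (a stand-in for the
    reference standard described above)."""
    num_dcs = len(next(iter(scores.values())))
    weights = {u: [0.0] * num_dcs for u in scores}

    for v, s in scores.items():
        rho_v = max(range(num_dcs), key=lambda dc: s[dc])   # DC with max score
        for u in neighbours.get(v, []):                     # tell u's LA about rho_v
            for dc in range(num_dcs):
                weights[u][dc] += reference_standard(u, rho_v, dc)

    # assumed rule: a DC whose accumulated weight is maximal gets a reward
    # signal (1), every other DC gets a penalty signal (0)
    signals = {
        u: [1 if w[dc] >= max(w) else 0 for dc in range(num_dcs)]
        for u, w in weights.items()
    }
    return weights, signals

scores = {"a": [0.2, 0.8], "b": [0.6, 0.4]}
neighbours = {"a": ["b"], "b": ["a"]}
toy_ref = lambda u, rho_v, dc: 1.0 if dc == rho_v else 0.0   # hypothetical
print(propagate_labels(scores, neighbours, toy_ref))
```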
- Step S15 The learning automaton updates the probability value of its vertex at each data processing center according to the weight vectors and the reinforcement signals, guiding the next round of action selection.
- In this embodiment, the LA uses the weight vectors and reinforcement signals obtained in step S14 to update the probability value of its vertex at each DC, thereby guiding the next action selection. Before that, the regularization weights need to be calculated; they are divided into two parts, reward regularization weights and penalty regularization weights.
- Neg() is the negation function used in this calculation.
- Once the regularization weights have been obtained, the probability of vertex v can be updated.
- The LA first updates the vertex's probabilities for the DCs whose reinforcement signal is a reward, in ascending order of their reward regularization weights. Given vertex v and the DC_i whose reward regularization weight is the smallest of all reward regularization weights, that weight is used first to update the probabilities of all DCs according to formula (11), in which α represents the reward weight.
- Formula (11) increases the probability of DC_i and decreases the probabilities of the other DCs. The LA then finds the successively larger reward regularization weights and uses each of them in turn to update the probabilities of all DCs.
- The beneficial effect of this implementation is that the DC with the largest of these weights ultimately ends up with the largest probability.
- The LA then updates the vertex's probabilities for the DCs whose reinforcement signal is a penalty.
- The update order follows the penalty regularization weights of the DCs from small to large. Given vertex v and DC_i and DC_k, where the penalty regularization weight of DC_i is the largest and that of DC_k is the smallest of all penalty regularization weights, the smallest weight is used first to update the probabilities of all DCs according to formula (12).
- In formula (12), β represents the penalty weight and the updated quantity is the probability of vertex v for DC_j in the n-th iteration.
- Formula (12) lowers the probability of DC_k and increases the probabilities of the other DCs. The LA then finds the successively larger penalty regularization weights and their corresponding DC_k, and uses them to update the probabilities of all DCs.
- The beneficial effect of this implementation is that the DC with the smallest of these weights ultimately ends up with the smallest probability.
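- Formulas (11) and (12) themselves are not reproduced in this text. The sketch below uses the classical linear reward-penalty update of a learning automaton as a stand-in, with α the reward weight and β the penalty weight, purely to illustrate how rewarding DC_i raises its probability while lowering the others, how penalising DC_k does the opposite, and how the probability vector stays normalised.

```python
def reward_update(probs, dc_i, alpha):
    """Raise the probability of DC_i and lower the others (classical linear
    reward step; a stand-in for formula (11))."""
    new = [p - alpha * p for p in probs]                     # shrink every DC
    new[dc_i] = probs[dc_i] + alpha * (1.0 - probs[dc_i])    # boost the rewarded DC
    return new

def penalty_update(probs, dc_k, beta):
    """Lower the probability of DC_k and raise the others (a stand-in for (12))."""
    m = len(probs)
    new = [p + beta * (1.0 / (m - 1) - p) for p in probs]    # spread mass to others
    new[dc_k] = probs[dc_k] - beta * probs[dc_k]             # shrink the penalised DC
    return new

probs = [0.25, 0.25, 0.25, 0.25]
probs = reward_update(probs, dc_i=2, alpha=0.1)
probs = penalty_update(probs, dc_k=0, beta=0.05)
print(probs, sum(probs))    # the vector still sums to 1
```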
- Step S16 Until the preset number of iterations is reached or the constraints converge, a partitioning result of the distributed graph satisfying the preset constraints is generated.
- In this embodiment, if the maximum number of iterations is reached or the constraints have converged, the iteration is judged to have ended. Otherwise, iteration N+1 begins; the action selection in iteration N+1 uses the probabilities updated in iteration N as a reference, and vertex migration, score calculation, reinforcement signal calculation, probability update and the next iteration continue until the iterations end, generating a geographically distributed graph partitioning result that satisfies the capital budget and has a very small data transmission time.
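- Tying the five steps together, one full training loop can be sketched as below. The function parameters (`select_action`, `migrate`, `compute_scores`, `compute_signals`, `update_probabilities`, `converged`) are placeholders for steps S11 to S16 described above, and the convergence test is a simplifying assumption.

```python
def train(vertices, num_dcs, max_iterations, select_action, migrate,
          compute_scores, compute_signals, update_probabilities, converged):
    """Skeleton of the per-iteration loop: action selection, vertex migration,
    score calculation, reinforcement-signal calculation, probability update."""
    # every LA starts from the uniform distribution P(v_i) = 1/M
    probs = {v: [1.0 / num_dcs] * num_dcs for v in vertices}

    for iteration in range(max_iterations):
        for v in vertices:
            chosen_dc = select_action(probs[v])          # step S11
            migrate(v, chosen_dc)                        # step S12 (no-op if unchanged)
        scores = {v: compute_scores(v, iteration) for v in vertices}   # step S13
        weights, signals = compute_signals(scores)       # step S14
        for v in vertices:
            probs[v] = update_probabilities(probs[v], weights[v], signals[v])  # step S15
        if converged():                                  # step S16
            break
    return probs
```

- In a real deployment each callable would be backed by the corresponding module of the system embodiment (modules 11 to 16), and `converged` would check the budget constraint C_comm(i) ≤ B together with the stabilisation of T(i).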
- To verify the effectiveness and efficiency of the distributed graph processing method provided by the embodiments of this application, real graph datasets were used for evaluation on real clouds and a cloud simulator.
- Five real graphs were used: Gnutella (GN), WikiVote (WV), GoogleWeb (GW), LiveJournal (LJ) and Twitter (TW).
- Real-cloud experiments were carried out on the Amazon EC2 and Windows Azure cloud platforms.
- The GAS-based PowerGraph system was used to execute the graph processing algorithms, including classic graph algorithms such as PageRank, SSSP and subgraph.
- The distributed graph processing method provided by the embodiments of this application was integrated into PowerGraph, and the graph is partitioned while it is loaded.
- Evaluation on real geographically distributed DCs and on real graphs in simulation shows that, compared with Geo-Cut, the state-of-the-art performance and cost optimization algorithm for geographically distributed graph processing systems, the distributed graph processing method provided by the embodiments of this application can reduce the inter-DC data transmission time by up to 72% and the capital cost by up to 63%, while keeping the load relatively balanced.
- The embodiments provided in this application can be applied to multiple scenarios. For example, Facebook receives terabytes of text, image and video data from users all over the world every day and has built four geographically distributed DCs to maintain and manage these data. If the load capacity and system response time of these DCs are considered, the method provided in the embodiments of the present application can be used to partition and optimize the graph, which keeps the DCs working stably while giving users a good experience. If network heterogeneity, the cost budget and system performance in a geographically distributed environment are considered, the method provided in the embodiments of this application can likewise be used to partition and optimize the graph, achieving good improvements in both transmission time and cost budget.
- It should be noted that the embodiments of the present application only take the performance and cost optimization problem of a geographically distributed graph processing system as an example to explain the working principle of the distributed graph processing method.
- In fact, the processing model formed by the distributed graph processing method proposed in this embodiment is a general model. It can not only solve the performance and cost optimization problem of the geographically distributed graph processing system described above, but also solve problems such as load balancing and performance optimization; for different optimization goals, only different score calculation schemes and different weight vector calculation schemes need to be designed.
- the embodiment of the present application provides a general distributed graph processing system based on reinforcement learning, as shown in FIG. 3, including:
- The distributed graph definition and constraint setting module 10 is used to define distributed data processing centers based on graph theory to form a distributed graph, and to cut the distributed graph based on preset constraints using a preset graph cutting model and a preset graph processing model. This module executes the method described in step S10 in embodiment 1, which will not be repeated here.
- The action selection module 11 is used to assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center; based on the initialized probabilities, the learning automaton selects the data processing center with the highest probability for its vertex according to a preset action selection method. This module executes the method described in step S11 in embodiment 1, which will not be repeated here.
- In the vertex migration module 12, the learning automaton compares the data processing center selected for its vertex (the one with the highest probability) with the data processing center where the vertex is currently located; if they are inconsistent, it migrates the vertex to the data processing center corresponding to the action, otherwise it does nothing. This module executes the method described in step S12 in embodiment 1, which will not be repeated here.
- In the score calculation module 13, each learning automaton calculates the score of its vertex at each data processing center, and the score is determined according to the preset constraints. This module executes the method described in step S13 in embodiment 1, which will not be repeated here.
- In the reinforcement signal calculation module 14, each learning automaton propagates the data processing center number corresponding to its maximum score to the learning automata of its vertex's neighbors and generates the corresponding weight vector; the learning automaton calculates from the weight vector the reinforcement signals of its vertex for all data processing centers. This module executes the method described in step S14 in embodiment 1, which will not be repeated here.
- In the probability update module 15, the learning automaton updates the probability value of its vertex at each data processing center according to the weight vector and the reinforcement signals, guiding the next round of action selection for iteration. This module executes the method described in step S15 in embodiment 1, which will not be repeated here.
- The partitioning result acquisition module 16 is configured to generate, once a preset number of iterations is reached or the constraints converge, a partitioning result of the distributed graph that satisfies the preset constraints. This module executes the method described in step S16 in embodiment 1, which will not be repeated here.
- The general distributed graph processing system based on reinforcement learning provided by the embodiments of this application defines distributed data processing centers based on graph theory to form a distributed graph, and cuts the distributed graph by reinforcement learning using a preset graph cutting model, a preset graph processing model and preset constraints. A learning automaton is assigned to each vertex, and training finds the most suitable data processing center for the vertex.
- The possibility of each vertex being at each data processing center obeys a certain probability distribution.
- Each iteration of the system consists of five steps: action selection, vertex migration, score calculation, reinforcement signal calculation and probability update. The iteration ends when the maximum number of iterations is reached or the constraints have converged.
- The distributed graph processing model formed by the general distributed graph processing method provided in this application is a general distributed graph model; for different optimization goals, only different score calculation schemes and different weight vectors need to be designed.
- FIG. 4 An embodiment of the present application provides a computer device. As shown in FIG. 4, the device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in other ways; FIG. 4 takes a bus connection as an example.
- the processor 51 may be a central processing unit (Central Processing Unit, CPU).
- The processor 51 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of the above types of chips.
- the memory 52 can be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as corresponding program instructions/modules in the embodiments of the present application.
- the processor 51 executes various functional applications and data processing of the processor by running the non-transitory software programs, instructions, and modules stored in the memory 52, that is, realizes the general distributed graph based on reinforcement learning in the above method embodiment. Approach.
- the memory 52 may include a program storage area and a data storage area.
- the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by the processor 51 and the like.
- the memory 52 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
- the memory 52 may optionally include memories remotely provided with respect to the processor 51, and these remote memories may be connected to the processor 51 through a network. Examples of the aforementioned network include, but are not limited to, the Internet, an intranet, an intranet, a mobile communication network, and combinations thereof.
- One or more modules are stored in the memory 52, and when executed by the processor 51, the general distributed graph processing method based on reinforcement learning in Embodiment 1 is executed.
- Those skilled in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
- The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), etc.; the storage medium may also include a combination of the foregoing types of memory.
Abstract
This application discloses a general distributed graph processing method and system based on reinforcement learning. Distributed data processing centers are defined based on graph theory to form a distributed graph; using a preset graph cutting model and a preset graph processing model, the distributed graph is cut by reinforcement learning under preset constraints. A learning automaton is assigned to each vertex, and training finds the most suitable data processing center for the vertex; the possibility of each vertex being at each data processing center obeys a certain probability distribution. Each iteration of the whole system consists of five steps: action selection, vertex migration, score calculation, reinforcement signal calculation and probability update; when the maximum number of iterations is reached or the constraints have converged, the iteration is judged to have ended. The distributed graph processing model formed by the general distributed graph processing method provided in this application is a general distributed graph model; for different optimization goals, only different score calculation schemes and different weight vectors need to be designed.
Description
本申请涉及大规模图分割处理领域,具体涉及一种基于强化学习的通用分布式图处理方法及系统。
为了高效地进行大规模图处理,通常需要对图进行分割,使得分割后的子图可以并行地进行处理。大规模图分割目前有以下几种经典模型:
启发式模型,传统主流的大规模图处理系统Pregel、PowerGraph等都采用的是启发式的分割算法。Pregel默认的分区方法就是通过对顶点id的Hash值进行取模操作以达到增强分区的局部性,减少计算节点之间网络流量的优化目标。PowerGraph默认采用的是贪婪的点切分方式,对于新加进来的边,如果它的某个顶点已经存在于某台机器上,就将该边分配到对应的机器上,从而最小化跨机器的边的数目,减少通信量。这种启发式的图分割算法容易陷入局部最优解,有一些更好的解空间并没有被搜索到。
机器学习模型,Phamet等人提出一种图分区方式,具体是指将Tensorflow计算图上的operations(node)分配到可利用的设备上,使得计算时间最短。他们采用的是强化学习模型,利用seq2seq策略来分配operations。这种方式只适用于图节点数目较少的情况,这样策略空间不会太大,这种方法才适用。Naziet等人提出了一种用深度学习来解决图分区问题的算法GAP。GAP是一个无监督学习方法,把平衡图分区问题当作顶点分类问题进行解决。但是如果优化目标涉及到网络价格以及带宽的异构性时,nodes的embeddings的计算就十分复杂了。这些已有的用于图分割的机器学习模型适用场景比较单一,当图规模变大、优化目标更复杂时,这些方法就不能很好地解决图分割问题了。
发明内容
因此,本申请要解决的技术问题在于克服现有技术中图切割模型存在易陷入局部最优解、使用场景单一等分割效果差的缺陷,从而提供一种基于强化学习的通用分布式图处理方法及系统。
为达到上述目的,本申请提供如下技术方案:
第一方面,本申请实施例提供一种基于强化学习的通用分布式图处理方法,包括如下步骤:基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;
为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概 率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;
学习自动机将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则不做任何操作;
每个学习自动机计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;
每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;
学习自动机根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;
直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
在一实施例中,所述预设图切割模型为hybrid-cut图切割模型,所述预设图处理模型为GAS图处理模型,利用GAS图处理模型迭代执行顶点计算,所述约束条件为资金预算成本及数据传输时间最小。
在一实施例中,所述数据传输时间表示为收集阶段和应用阶段的数据传输时间之和,第i次迭代的数据传输时间T(i)的计算公式为:
a
v(i)表示在第i次迭代中的应用阶段中从master顶点v向每一个副本发送数据量 的大小;
U
r/D
r表示DCr的上传/下载带宽;
R
v表示包含v的副本的数据处理中心DC的集合;
数据处理中心DC之间的通信成本为在收集阶段和应用阶段的上传数据的成本之和,从DC
r将数据上传至网络的单元成本为P
r,所述资金预算成本表示为:
约束条件为:
minT(i) (3)
C
comm(i)≤B (4)
其中,B为使用网络资源的资金预算。
在一实施例中,初始化各顶点在各数据处理中心的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心的步骤,包括:
随机生成一个浮点数r∈[0,1],如果r小于等于Q(v
0),则DC 0将被选中;如果r介于Q(v
k-1)与Q(v
k)(k≥1)之间时,则数据处理中心DC k被选中。
在一实施例中,初始化各顶点在各数据处理中心的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心的步骤,包括:
预设一试错参数τ,随机生成一个浮点数r∈[0,1],如果r≤τ,则学习自动机为其顶点随机选择一个DC;如果r>τ,则学习自动机为其顶点选择P(v
i)值最大的数据处理中心DC。
在一实施例中,每个学习自动机计算其顶点在每一个数据处理中心时的分数,通过以下公式计算:
其中,
表示顶点v在DCi时的分数,B表示使用网络资源的资金预算,T
b表示计算分数之前系统整体的数据传输时间,C
b表示计算分数之前系统整体的数据传输成本,
表示计算顶点在DCi时系统整体的数据传输时间,
表示计算顶点在DCi时系统整体的数据传输成本,tw与cw分别表示时间权重以及资金成本权重;在C
b≥B时,cw随着迭代次数的增加从1均匀减少至0,tw随着迭代次数的增加从0均匀增加至1;当C
b<B时,tw随着迭代次数的增加从1均匀减少至0,cw随着迭代次数的增加从0均匀增加至1。
每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号的步骤,包括:
计算权重向量的参考标准,通过如下公式计算:
其中,
表示当顶点u收到其邻居v传播的标签ρ
v时,其计算权重向量的参考标准,ρ
v表示顶点v最大分数对应的DC,Nbr(v)表示顶点v的邻居顶点集合;
为将顶点v移动至ρ
v,再将顶点u移动至ρ
v后系统整体的数据传输时间;
表示将顶点v移动至ρ
v后系统整体的数据传输时间;
表示将顶点v移动至ρ
v后系统整体的资金成本;
为将顶点v移动至ρ
v,再将顶点u移动至ρ
v后系统整体的资金成本;
顶点u在计算完参考标准后,其权重向量更新公式如下:
在计算完顶点对于所有数据处理中心的权重向量后,学习自动机根据权重向量计算出相应的强化信号,计算公式如下:
在一实施例中,根据正则化权重对顶点v的概率进行更新,更新顺序按照对于数据处理中心DC的奖励正则化权重从小到大进行,给定顶点v以及DC
i,
在所有奖励正则化权重中最小,优先使用
对所有DC进行概率更新,更新公式如下:
接着学习自动机依次找到更大的
再使用它对所有的DC进行概率更新;学习自动机更新顶点对于其强化信号为
的DC,更新顺序按照对于DC的惩罚正则化权重从小到大进行,假设给定顶点v以及DC
i、DC k,
在所有惩罚正则化权重中最大,
在所有惩罚正则化权重中最小,优先使用
对所有DC进行概率更新,更新公式如下:
接着学习自动机会依次找到更大的
以及对应的DC k,再使用
对所有的DC进行概率更新;如果达到预设迭代次数或者约束条件已经收敛,则迭代结束;否则,进入N+1次迭代,第N+1次迭代中的动作选择会以第N次迭代更新后的概率为参考。
第二方面,本申请实施例提供一种基于强化学习的通用分布式图处理系统,包括:
分布式图定义及约束条件设置模块,用于基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;
动作选择模块,用于为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;
顶点迁移模块,学习自动机用于将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则,不做任何操作;
分数计算模块,每个学习自动机用于计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;
强化信号计算模块,每个学习自动机用于将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;
概率更新模块,学习自动机用于根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;
分割结果获取模块,用于直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
第三方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机指令,所述计算机指令用于使所述计算机执行本申请实施例第一方面的基于强化学习的通用分布式图处理方法。
第四方面,本申请实施例提供一种计算机设备,包括:存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器存储有计算机指令,所述处理器通过执行所述计算机指令,从而执行本申请实施例第一方面的基于强化学习的通用分布式图处理方法。
本申请技术方案,具有如下优点:
本申请提供的基于强化学习的通用分布式图处理方法及系统,基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条 件利用强化学习的方式对分布式图切割,给每个顶点分配一个学习自动机,通过训练为顶点找到最适合的数据处理中心,每个顶点在所有数据处理中心的可能性服从一定的概率分布,整个系统在每个迭代过程中均包含动作选择、顶点迁移、分数计算、强化信号计算、概率更新五个步骤,达到最大迭代次数或约束条件收敛,判断迭代结束。本申请提供通用分布式图处理方法形成的分布式图处理模型是一个自适应性较好的分布式图模型,对于不同的优化目标只需要设计不同的分数计算方案以及不同的权重向量。
为了更清楚地说明本申请具体实施方式或现有技术中的技术方案,下面将对具体实施方式或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施方式,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例中基于强化学习的通用分布式图处理方法的一个具体示例的流程图;
图2为本申请实施例提供的基于强化学习图分割过程进行迭代的流程图;
图3为本申请实施例中基于强化学习的通用分布式图处理系统的一个具体示例的原理框图;
图4为本申请实施例提供的计算机设备一个具体示例的组成图。
下面将结合附图对本申请的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
此外,下面所描述的本申请不同实施方式中所涉及的技术特征只要彼此之间未构成冲突就可以相互结合。
实施例1
本申请实施例提供一种基于强化学习的通用分布式图处理方法,可以应用于不同的优化目标,例如在地理分布式图处理系统的性能以及成本优化、负载均衡以及性能优化等问题中,如图1所示,包括如下步骤:
步骤S10:基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件,对分布式图进行切割。
本申请实施例以地理分布式图分割处理过程作为举例说明,假设顶点数据没有在数据处理中心(以下简称DC)上备份,且一台机器一次只能执行一个顶点的图处理任务;每个DC的计算资源不受限制,而DC之间的数据通信是地理分布式图处理的性能瓶颈;假设DC之间的连接是没有网络拥塞的,网络的瓶颈仅来自于DC和WAN之间的上行 链路(uplink)和下行链路(downlink)带宽;只收取从DC到WAN的上传数据的费用。考虑到成本与性能之间可能存在矛盾对立的情况:当uplink的带宽较大时,可以增加在这个链路上的传输数据,从而达到减少传输时间的目的,但是这个链路的价格可能会相对来说较高从而使得成本变高,因此需要同时优化性能和成本作为优化目标来进行图分割。
首先定义图G(V,E),V是顶点的集合,E是边的集合,考虑M个地理分布式数据处理中心(以下简称DC),每个顶点v具有初始位置Lv(Lv∈(0,1,…,M-1),
表示该顶点v是master顶点,
表示该顶点不是master顶点,Rv是包含顶点v的复制顶点的DC集合,Ur是uplink的带宽,Dr是downlink的带宽。
本申请实施例使用的是hybrid-cut图切割模型,遵循以下规则:给定一个阈值theta,对于顶点v,如果其入度大于等于theta,称其为high-degree型顶点,相反,称其为low-degree顶点。如果顶点v是low-degree的,它的所有入边都分配到它所在的DC,如果顶点v是high-degree的,它的入边将分配到该边对端顶点所在的DC。
本申请实施例使用的是GAS图处理模型,该模型迭代地执行用户定义的顶点计算。每个GAS迭代中有三个计算阶段,即收集(Gather),应用(Apply)和发散(Scatter)。在收集阶段,每个活动顶点收集邻居的数据,并且求和函数(Sum)被定义为将接收的数据聚合为聚集和(gathered sum)。在应用阶段,每个活动顶点使用聚集和更新其数据。在发散阶段,每个活动顶点激活它在下一次迭代中执行的邻居。全局障碍(global barrier)定义为确保所有顶点在开始下一步之前完成其计算。
第i次迭代中的传输时间可以表示为gather阶段和apply阶段的数据传输时间之和。第i次迭代的传输时间的计算公式为:
a
v(i)表示在第i次迭代中的应用阶段中从master顶点v向每一个副本发送数据量的大小;
U
r/D
r表示DCr的上传/下载带宽;
R
v表示包含v的副本的数据处理中心DC的集合;
DC之间的通信成本是在gather阶段和apply阶段的上传数据的成本之和,定义从DC r将数据上传至Internet的单元成本是P
r,总的通信成本可以表示为:
将地理分布图分割问题表述为约束优化问题,即约束条件为:
minT(i) (3)
C
comm(i)≤B (4)
要解决的地理分布图分割问题即公式(3)、(4)所描述的约束条件下的优化问题。
在定义完地理分布式图各个元素所代表的含义后,需要每一个顶点分配一个学习自动机(以下简称LA),通过训练为顶点找到最适合它的DC,每个顶点在所有DC的可能性服从一定的概率分布在每个迭代过程中主要包含:动作选择、顶点迁移、分数计算、强化信号生成、概率更新五个步骤,在优化地理分布式图处理系统的性能以及成本时的整个工作流程图如图2所示,各步骤的主要功能以及步骤之间的联系如下所述。
步骤S11:为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心。
在一实施例中,LA采用轮盘赌算法为其顶点选择合适的动作(DC)。LA首先根据顶点的概率分布求得顶点对于各DC的积累概率,再随机生成一个浮点数r∈[0,1]。如果r小于等于Q(v
0),则DC0将被选中;如果r介于Q(v
k-1)与Q(v
k)(k≥1)之间时,则DCk将被选中。通过这种方式,概率越大的动作被选中的机会越大,但概率小的动作也可能会被选中。当LA选中好的动作(概率大的动作)时,图分割结果更可能会往优化目标的方向进行;当LA选中坏的动作(概率小的动作),此过程为一个试错过程,在当前看似结果不好的选择可能探索到更好的状态空间。
在另一实施例中,动作选择还可以采用另一种方式:定义试错参数τ=0.1;随机生成一个浮点数r∈[0,1]。如果r≤τ,则LA会为其顶点随机选择一个DC;如果r>τ,则LA会为其顶点选择P(v
i)值最大的DC。
步骤S12:学习自动机将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则不做任何操作。
本申请实施例LA将从步骤S11中得到的动作与其顶点当前所在的DC作比较,如果不一致,则将顶点迁移至动作对应的DC中,否则,不做任何操作。
步骤S13:每个学习自动机计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定。
本申请实施例对于每一个LA,都会给其顶点计算顶点在每一个DC时的分数score,首先定义L
v表示顶点v当前所在的DC,T
b表示计算分数之前系统整体的数据传输时间,按公式(1)计算得到,
表示计算顶点在DC i时系统整体的数据传输时间,C
b表示计算分数之前系统整体的数据传输成本,按公式(2)计算得到,
表示计算顶点在DC i时系统整体的数据传输成本。
以及
的计算方式为:将顶点v移动至DC i,再分别按照公式(1)、公式(2)进行计算,最后将顶点v移回L
v。
表示顶点v在DC i时的分数,计算方法如下:
在公式(5)中,B表示资金预算,tw与cw分别表示时间权重以及成本权重。在C
b≥B时,cw随着迭代次数的增加从1均匀减少至0,tw随着迭代次数的增加从0均匀增加至1,目的是优先优化图处理系统整体的交流成本以及探索更多能够降低系统成本的图分区状态;当C
b<B时,tw随着迭代次数的增加从1均匀减少至0,cw随着迭代次数的增加从0均匀增加至1,目的是优先优化图处理系统整体的数据传输时间以及减缓传输时间的优化速度,从而达到更好的优化效果。
步骤14:每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号。
实际应用中,每个LA都会与其它LA进行通信,从而为其顶点生成对于所有DC的强化信号,在计算强化信号之前需要计算顶点对于所有DC的权重向量。每个LA计算完所有DC的分数后,会将最大分数对应的DC号传播给其顶点的邻居所属的LA,这些LA立刻生成相应的权重项向量。
在本实施例中,定义ρ
v表示顶点v最大分数对应的DC,Nbr(v)表示顶点v的邻居顶点集合,
为将顶点v移动至ρ
v,再将顶点u移动至ρ
v后系统整体的数据传输时间;
表示将顶点v移动至ρ
v后系统整体的数据传输时间;
表示将顶点v移动至ρ
v后系统整体的资金成本;
为将顶点v移动至ρ
v,再将顶点u移动至ρ
v后系统整体的资金成本;
表示当顶点u收到其邻居v传播的标签ρ
v时,其计算权重向量的参考标准,计算公式如下:
需要说明的是,tw、cw、sign(B-C
b)的值和步骤S13中公式(5)的值一样,因为它们在同一个迭代中。顶点u在计算完参考标准后,其权重向量更新公式如下:
在计算完顶点对于所有DC的权重向量,LA会根据权重向量计算出相应的强化信号,公式如下:
步骤15:学习自动机根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代。
在本是实施例中,LA会利用步骤14中得到的权重向量以及强化信号去更新其顶点在每一个DC的概率值,从而指导下一次的动作选择。在此之前,需要先计算正则化权重,分为奖励和惩罚正则化权重两部分。
其中Neg()是取反函数。
本实施例在得到正则化权重之后,就可以开始对顶点v的概率进行更新。定义
表示顶点v在第n次迭代中对于DC i的概率,LA会首先更新顶点对于其强化信号为
的DC,更新顺序按照对于DC的奖励正则化权重从小到大进行。假设给定顶点v以及DC i,
在所有奖励正则化权重中最小,则优先使用
对所有DC进行概率更新,更新公式如下:
其中α表示奖励权重,公式(11)对DC i的概率进行了增加,对其它DC的概率进行了下调。接着,LA会依次找到更大的
再使用它对所有的DC进行概率更新。这种实施方式的有益效果是最终能够使得
最大的那个DC的概率最大。
接着,LA会更新那些顶点对于其强化信号为
的DC,更新顺序按照对于DC的惩罚正则化权重从小到大进行。假设给定顶点v以及DC i、DC k,
在所有惩罚正则化权重中最大,
在所有惩罚正则化权重中最小,则优先使用
对所有DC进行概率更新,更新公式如下:
其中β表惩罚权重,
表示顶点v在第n次迭代中对于DC j的概率,上述公式(12)对DC k的概率进行了下调,对其它DC的概率进行了增加。接着,LA会依次找到更大的
以及对应的DC k,再使用
对所有的DC进行概率更新。这种实施方式的有益效果是最终能够使得
最小的DC的概率最小。
步骤16:直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
本申请实施例如果达到最大迭代次数或者约束条件已经收敛,那么判断迭代结束。否则,进入N+1次迭代,第N+1次迭代中的动作选择会以第N次迭代更新后的概率 为参考,继续执行顶点迁移、分数计算、强化信号计算、概率更新、下一次迭代等操作,直到迭代结束,生成一个满足资金预算且数据传输时间极小的地理分布式图分割结果。
为了验证本申请实施例提供的分布图处理方法的有效性和效率,在真实云和云模拟器上采用真实图形数据集来评估,具体的使用了5种真实图:Gnutella(GN)、WikiVote(WV)、GoogleWeb(GW)、LiveJournal(LJ)和Twitter(TW),在Amazon EC2和Windows Azure两个云平台上进行真实云的实验,采用基于GAS的PowerGraph系统来执行图处理算法,包括pagerank、sssp、subgraph等经典图算法。在PowerGraph中实现了集成了本申请实施例提供的分布图处理方法,在加载时对图进行分割。真实的地理分布DCs和仿真中对真实图形的评估表明,与最先进的地理分布式图处理系统的性能以及成本优化算法Geo-Cut相比,本申请实施例提供的分布图处理方法,可以减少高达72%的DC间数据传输时间和高达63%的资金成本,而且负载比较均衡。
本申请提供的实施例可以应用到多个场景,例如:Facebook每天从世界各地的用户那里接收tb级的文本、图像和视频数据。Facebook构建了四个地理分布的DC来维护和管理这些数据。如果考虑这些DC的负载能力以及系统响应时间,可以使用本申请实施例提供的方法对图进行分割优化,可以使得DC稳定工作的同时给用户带来好的体验。如果考虑地理分布式环境下的网络异构和成本预算以及系统性能,也可以使用本申请实施例提供的方法对图进行分割优化,可以在传输时间和成本预算两个方面得到很好地性能提升。
需要说明的是,本申请实施例只是以地理分布式图切割过程系的性能以及成本优问题作为举例,对分布图处理方法的工作原理做出说明。实际上,本实施例提出的分布式图处理方法形成的处理模型是一个通用的模型,该模型不仅可以解决上述地理分布式图处理系统的性能以及成本优化问题,也能解决负载均衡以及性能优化等问题,对于不同的优化目标只需要设计不同的分数计算方案以及不同的权重向量计算方案。
实施例2
本申请实施例提供一种基于强化学习的通用分布式图处理系统,如图3所示,包括:
分布式图定义及约束条件设置模块10,用于基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割。此模块执行实施例1中的步骤S10所描述的方法,在此不再赘述。
动作选择模块11,用于为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心。此模块执行实施例1中的步骤S11所描述的方法,在此不再赘述。
顶点迁移模块12,学习自动机用于将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则,不做任何操作。此模块执行实施例1中的步骤S12所描述的方法,在此不再赘述。
分数计算模块13,每个学习自动机用于计算其顶点在每一个数据处理中心时的分数, 所述分数根据所述预设约束条件确定。此模块执行实施例1中的步骤S13所描述的方法,在此不再赘述。
强化信号计算模块14,每个学习自动机用于将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;此模块执行实施例1中的步骤S14所描述的方法,在此不再赘述。
概率更新模块15,学习自动机用于根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;此模块执行实施例1中的步骤S15所描述的方法,在此不再赘述。
分割结果获取模块16,用于直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。此模块执行实施例1中的步骤S16所描述的方法,在此不再赘述。
本申请实施例提供的基于强化学习的通用分布式图处理系统,基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件利用强化学习的方式对分布式图切割,给每一个顶点分配一个学习自动机,通过训练为顶点找到最适合的数据处理中心,每个顶点在所有数据处理中心的可能性服从一定的概率分布,整个系统在每个迭代过程中包含动作选择、顶点迁移、分数计算、强化信号计算、概率更新五个步骤,达到最大迭代次数或者约束条件已经收敛,判断迭代结束。本申请提供通用分布式图处理方法形成的分布式图处理模型是一个通用的分布式图模型,对于不同的优化目标只需要设计不同的分数计算方案以及不同的权重向量。
实施例3
本申请实施例提供一种计算机设备,如图4所示,该设备可以包括处理器51和存储器52,其中处理器51和存储器52可以通过总线或者其他方式连接,图4以通过总线连接为例。
处理器51可以为中央处理器(Central Processing Unit,CPU)。处理器51还可以为其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等芯片,或者上述各类芯片的组合。
存储器52作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序、非暂态计算机可执行程序以及模块,如本申请实施例中的对应的程序指令/模块。处理器51通过运行存储在存储器52中的非暂态软件程序、指令以及模块,从而执行处理器的各种功能应用以及数据处理,即实现上述方法实施例中的基于强化学习的通用分布式图处理方法。
存储器52可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储处理器51所创建的数据等。此外,存储器52可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施例中,存储器52可 选包括相对于处理器51远程设置的存储器,这些远程存储器可以通过网络连接至处理器51。上述网络的实例包括但不限于互联网、企业内部网、企业内网、移动通信网及其组合。
一个或者多个模块存储在存储器52中,当被处理器51执行时,执行实施例1中的基于强化学习的通用分布式图处理方法。
上述计算机设备具体细节可以对应参阅实施例1中对应的相关描述和效果进行理解,此处不再赘述。
本领域技术人员可以理解,实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)、随机存储记忆体(Random Access Memory,RAM)、快闪存储器(Flash Memory)、硬盘(Hard Disk Drive,缩写:HDD)或固态硬盘(Solid-State Drive,SSD)等;存储介质还可以包括上述种类的存储器的组合。
显然,上述实施例仅仅是为清楚地说明所作的举例,而并非对实施方式的限定。对于所属领域的普通技术人员来说,在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。而由此所引申出的显而易见的变化或变动仍处于本申请的保护范围之中。
Claims (12)
- 一种基于强化学习的通用分布式图处理方法,其特征在于,包括如下步骤:基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;学习自动机将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则不做任何操作;每个学习自动机计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;学习自动机根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
- 根据权利要求1所述的基于强化学习的通用分布式图处理方法,其特征在于,所述预设图切割模型为hybrid-cut图切割模型,所述预设图处理模型为GAS图处理模型,利用GAS图处理模型迭代执行顶点计算,所述约束条件为资金预算成本及数据传输时间最小。
- 根据权利要求2所述的基于强化学习的通用分布式图处理方法,其特征在于,所述数据传输时间表示为收集阶段和应用阶段的数据传输时间之和,第i次迭代的数据传输时间T(i)的计算公式为:a v(i)表示在第i次迭代中的应用阶段中从master顶点v向每一个副本发送数据量的大小;U r/D r表示DCr的上传/下载带宽;R v表示包含v的副本的数据处理中心DC的集合;数据处理中心DC之间的通信成本为在收集阶段和应用阶段的上传数据的成本之和,从DC r将数据上传至网络的单元成本为P r,所述资金预算成本表示为:约束条件为:min T(i) (3)C comm(i)≤B (4)其中,B为使用网络资源的资金预算。
- 根据权利要求3所述的基于强化学习的通用分布式图处理方法,其特征在于,初 始化各顶点在各数据处理中心的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心的步骤,包括:预设一试错参数τ,随机生成一个浮点数r∈[0,1],如果r≤τ,则学习自动机为其顶点随机选择一个DC;如果r>τ,则学习自动机为其顶点选择P(v i)值最大的数据处理中心DC。
- 根据权利要求4或5所述的基于强化学习的通用分布式图处理方法,其特征在于,每个学习自动机计算其顶点在每一个数据处理中心时的分数,通过以下公式计算:
- 根据权利要求6所述的基于强化学习的通用分布式图处理方法,其特征在于,每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号的步骤,包括:计算权重向量的参考标准,通过如下公式计算:其中, 表示当顶点u收到其邻居v传播的标签ρ v时,其计算权重向量的参考标准,ρ v表示顶点v最大分数对应的DC,Nbr(v)表示顶点v的邻居顶点集合; 为将顶点v移动至ρ v,再将顶点u移动至ρ v后系统整体的数据传输时间; 表示将顶点v移动至ρ v后系统整体的数据传输时间; 表示将顶点v移动至ρ v后系统整体的资金成本; 为将顶点v移动至ρ v,再将顶点u移动至ρ v后系统整体的资金成本;顶点u在计算完参考标准后,其权重向量更新公式如下:在计算完顶点对于所有数据处理中心的权重向量后,学习自动机根据权重向量计算出相应的强化信号,计算公式如下:
- 根据权利要求8所述的基于强化学习的通用分布式图处理方法,其特征在于,根据正则化权重对顶点v的概率进行更新,更新顺序按照对于数据处理中心DC的奖励正则化权重从小到大进行,给定顶点v以及DC i, 在所有奖励正则化权重中最小,优先使用 对所有DC进行概率更新,更新公式如下:接着学习自动机依次找到更大的 再使用它对所有的DC进行概率更新;学习自动机更新顶点对于其强化信号为 的DC,更新顺序按照对于DC的惩罚正则化权重从小到大进行,假设给定顶点v以及DC i、DC k, 在所有惩罚正则化权重中最大, 在所有惩罚正则化权重中最小,优先使用 对所有DC进行概率更新,更新公式如下:
- 一种基于强化学习的通用分布式图处理系统,其特征在于,包括:分布式图定义及约束条件设置模块,用于基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;动作选择模块,用于为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;顶点迁移模块,学习自动机用于将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则,不做任何操作;分数计算模块,每个学习自动机用于计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;强化信号计算模块,每个学习自动机用于将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;概率更新模块,学习自动机用于根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;分割结果获取模块,用于直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机指令,所述计算机指令用于使所述计算机执行如权利要求1-9任一项所述的基于强化学习的通用分布式图处理方法。
- 一种计算机设备,其特征在于,包括:存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器存储有计算机指令,所述处理器通过执行所述计算机指令,从而执行如权利要求1-9任一项所述的基于强化学习的通用分布式图处理方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010462112.4A CN111539534B (zh) | 2020-05-27 | 2020-05-27 | 一种基于强化学习的通用分布式图处理方法及系统 |
CN202010462112.4 | 2020-05-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021238305A1 true WO2021238305A1 (zh) | 2021-12-02 |
Family
ID=71980779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/076484 WO2021238305A1 (zh) | 2020-05-27 | 2021-02-10 | 一种基于强化学习的通用分布式图处理方法及系统 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111539534B (zh) |
WO (1) | WO2021238305A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113726342A (zh) * | 2021-09-08 | 2021-11-30 | 中国海洋大学 | 面向大规模图迭代计算的分段差值压缩与惰性解压方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539534B (zh) * | 2020-05-27 | 2023-03-21 | 深圳大学 | 一种基于强化学习的通用分布式图处理方法及系统 |
CN113835899B (zh) * | 2021-11-25 | 2022-02-22 | 支付宝(杭州)信息技术有限公司 | 针对分布式图学习的数据融合方法及装置 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2884453A1 (en) * | 2013-12-12 | 2015-06-17 | Telefonica Digital España, S.L.U. | A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network |
US9208257B2 (en) * | 2013-03-15 | 2015-12-08 | Oracle International Corporation | Partitioning a graph by iteratively excluding edges |
CN105590321A (zh) * | 2015-12-24 | 2016-05-18 | 华中科技大学 | 一种基于块的子图构建及分布式图处理方法 |
CN106970779A (zh) * | 2017-03-30 | 2017-07-21 | 重庆大学 | 一种面向内存计算的流式平衡图划分方法 |
CN107222565A (zh) * | 2017-07-06 | 2017-09-29 | 太原理工大学 | 一种网络图分割方法及系统 |
CN109033191A (zh) * | 2018-06-28 | 2018-12-18 | 山东科技大学 | 一种面向大规模幂律分布图的分割方法 |
CN111539534A (zh) * | 2020-05-27 | 2020-08-14 | 深圳大学 | 一种基于强化学习的通用分布式图处理方法及系统 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106953801B (zh) * | 2017-01-24 | 2020-05-05 | 上海交通大学 | 基于层级结构学习自动机的随机最短路径实现方法 |
CN109889393B (zh) * | 2019-03-11 | 2022-07-08 | 深圳大学 | 一种地理分布式图处理方法和系统 |
-
2020
- 2020-05-27 CN CN202010462112.4A patent/CN111539534B/zh active Active
-
2021
- 2021-02-10 WO PCT/CN2021/076484 patent/WO2021238305A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111539534A (zh) | 2020-08-14 |
CN111539534B (zh) | 2023-03-21 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21813316; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.03.2023)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21813316; Country of ref document: EP; Kind code of ref document: A1