WO2023005590A1 - Adaptive control for graph computing - Google Patents

Adaptive control for graph computing

Info

Publication number
WO2023005590A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing
graph
historical
round
processed
Prior art date
Application number
PCT/CN2022/102927
Other languages
English (en)
French (fr)
Inventor
成强
游东海
刘志臻
梁磊
Original Assignee
支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2023005590A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • The embodiments of this specification relate to the technical field of graph computing, and in particular to an adaptive control method and system for graph computing.
  • As a typical form of graph data, the knowledge graph is widely used in fields such as medical care, finance, social analysis, natural science, and transportation.
  • Graph computing is a mode of computation performed on this data structure.
  • Graph computing can include rule reasoning, graph representation learning, and so on.
  • The scale of graph data has grown exponentially and may reach billions of nodes and trillions of edges. As graph data keeps growing, the demands on graph computing capability keep rising.
  • One aspect of the embodiments of this specification provides an adaptive control method for graph computing. The method includes controlling a graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete the processing of that data.
  • One round of control includes: obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, where the historical execution information reflects the performance status of the graph computing engine when executing the corresponding round; determining the current round of computing tasks based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed; and sending the current round of computing tasks to the graph computing engine, so that the graph computing engine executes, on the graph data to be processed, the graph computing operation corresponding to the current round of computing tasks.
  • One aspect of the embodiments of this specification provides an adaptive control system for graph computing. The system is used to control a graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete the processing of that data.
  • The system includes: a first acquisition module, configured to acquire, in a round of control, the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, where the historical execution information reflects the performance status of the graph computing engine when executing the corresponding round; and a current-round computing task determination module, configured to determine, in a round of control, the current round of computing tasks based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed.
  • The system further includes a first sending module, configured to send, in a round of control, the current round of computing tasks to the graph computing engine, so that the graph computing engine performs the graph computing operations corresponding to the current round of computing tasks on the graph data to be processed.
  • One aspect of the embodiments of this specification provides an adaptive control method for graph computing. The method includes controlling a graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete the processing of that data, wherein one round of control includes: obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, where the historical execution information reflects the performance status of the graph computing engine when executing the corresponding round; determining the current round's computing task amount based at least on the historical computing task amount and the historical execution information; and sending the current round's computing task amount to the graph computing engine, so that the graph computing engine executes, on the graph data to be processed, graph computing operations for that computing task amount.
  • One aspect of the embodiments of this specification provides an adaptive control system for graph computing. The system is used to control a graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete the processing of that data.
  • The system includes: a second acquisition module, configured to acquire the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, where the historical execution information reflects the performance status of the graph computing engine when executing the corresponding round; a current-round computing task amount determination module, configured to determine the current round's computing task amount based at least on the historical computing task amount and the historical execution information; and a second sending module, configured to send the current round's computing task amount to the graph computing engine, so that the graph computing engine executes graph computing operations for that computing task amount on the graph data to be processed.
  • One aspect of the embodiments of this specification provides an adaptive control device for graph computing. The device includes at least one processor and at least one storage device; the storage device stores instructions, and when the at least one processor executes the instructions, the device implements the method described in any of the preceding items.
  • One aspect of the embodiments of this specification provides a graph computing device that includes a graph computing engine and an adaptive control device. The graph computing engine performs graph computing operations; the adaptive control device controls the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed so as to complete its processing, where the computing tasks or computing task amount of each round is specified by the adaptive control device.
  • Fig. 1 is an exemplary schematic diagram of a graph computing device according to some embodiments of this specification.
  • Fig. 2 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification.
  • Fig. 3 is an exemplary flowchart of determining a computing task amount according to some embodiments of this specification.
  • Fig. 4 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification.
  • Terms such as "system" used in this specification are a means of distinguishing different components, elements, parts, portions, or assemblies at different levels.
  • However, these words may be replaced by other expressions if those expressions achieve the same purpose.
  • Graph computing can include graph reasoning or graph representation learning.
  • The knowledge graph is a typical application of graph data. This specification mainly describes knowledge graphs; unless otherwise specified, the content relating to knowledge graphs also applies to other graph data.
  • A knowledge graph (or simply a graph) represents objects (for example, entities, including concepts and attribute values) as nodes and the relationships between objects as edges connecting those nodes, so that acquired knowledge is represented as a networked structure.
  • Graph reasoning refers to inferring new facts from the known facts in an existing knowledge graph, for example, given an entity and a relation, determining another entity that has this relation with the given entity.
  • Graph representation learning can refer to learning vectorized representations of entities and relations.
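The triple representation described above can be illustrated with a minimal sketch, assuming an in-memory list of (head, relation, tail) facts. The entity and relation names and the helpers `build_index` and `known_tails` are hypothetical; the lookup only answers from facts already stored, whereas graph reasoning as defined above infers facts that are not yet stored.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: each fact is a (head, relation, tail) triple.
# Entities become nodes; each triple becomes a labeled edge head -> tail.
triples = [
    ("Alice", "transfers_to", "Bob"),
    ("Bob", "buys", "Laptop"),
    ("Alice", "buys", "Phone"),
]

def build_index(facts):
    """Index edges by (head, relation) so tail lookups are O(1) on average."""
    index = defaultdict(set)
    for head, relation, tail in facts:
        index[(head, relation)].add(tail)
    return index

def known_tails(index, head, relation):
    """Return the tails already stated for (head, relation), sorted.

    A reasoning component would try to predict tails missing from this set.
    """
    return sorted(index.get((head, relation), set()))

index = build_index(triples)
print(known_tails(index, "Alice", "buys"))  # ['Phone']
```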
  • Graph computing can be implemented by a graph computing engine that performs calculations on the nodes of the entire graph.
  • Different graph computing frameworks stipulate different computing units for graph computing.
  • The computing units can include nodes, edges, or subgraphs.
  • The graph computing framework completes graph computing on the basis of computing units. Taking a framework that uses nodes as computing units as an example, the graph algorithm is divided into fine-grained computing operations on each node, and all nodes execute their computing operations independently and in parallel.
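A node-centric framework of this kind can be sketched with one synchronous step of label propagation (the LPA algorithm mentioned in this specification), assuming an undirected adjacency dict. Each node reads only its neighbors' labels from the previous step and computes its own new label, which is why every node could run in parallel. The function name and the smallest-label tie-break are illustrative choices, not part of the source.

```python
from collections import Counter

def lpa_superstep(adjacency, labels):
    """One synchronous vertex-centric step of label propagation.

    Each node adopts the most frequent label among its neighbors, using only
    labels from the previous step. Ties go to the smallest label so that the
    result is deterministic.
    """
    new_labels = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            new_labels[node] = labels[node]  # isolated node keeps its label
            continue
        counts = Counter(labels[n] for n in neighbors)
        best = max(counts.values())
        new_labels[node] = min(l for l, c in counts.items() if c == best)
    return new_labels

# Toy undirected graph: triangles {a, b, c} and {d, e, f} joined by edge c-d.
adjacency = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
labels = {node: node for node in adjacency}  # every node starts with its own label
step1 = lpa_superstep(adjacency, labels)
```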
  • The knowledge graph can also be split into n-hop subgraphs centered on each node; each n-hop subgraph serves as a computing unit, and the whole graph is processed by processing each subgraph.
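Extracting an n-hop subgraph as a computing unit can be sketched with breadth-first search, assuming an undirected adjacency dict. The helper name `n_hop_subgraph` is hypothetical, and keeping every edge whose two endpoints fall inside the node set (an induced subgraph) is one reasonable reading of "n-hop subgraph", not necessarily the patent's exact definition.

```python
from collections import deque

def n_hop_subgraph(adjacency, center, n):
    """Return (nodes, edges) of the n-hop subgraph centered on `center`.

    BFS collects every node within n hops; edges are kept when both
    endpoints lie inside that node set, stored as sorted tuples.
    """
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == n:
            continue  # do not expand past n hops
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, [])
        if v in nodes
    }
    return nodes, edges

# Path graph a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```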
  • Graph computing operations on the whole graph can be realized in the following ways: (1) using the graph computing engine to compute the whole graph directly; or (2) dividing all computing units in the whole graph into a fixed number N of parts and triggering the graph computing engine's graph computing operation on them one part at a time.
  • Method (1) is not suitable for large-scale graph data: when the data volume is huge, graph computing generates a large number of intermediate results, which lowers computing performance and can even crash the computing task.
  • Method (2) computes the same number of computing units in every iterative round, so it cannot adapt well to fluctuations in available computing resources.
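For contrast with the adaptive approach, method (2) can be sketched as a fixed split decided before any round runs. The function name `split_fixed` is hypothetical; the point is that batch sizes never react to how long earlier rounds took.

```python
def split_fixed(units, parts):
    """Method (2): split computing units into `parts` nearly equal batches.

    The batches are fixed up front, so every round processes about the same
    number of units regardless of the engine's current load.
    """
    size, extra = divmod(len(units), parts)
    batches, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        batches.append(units[start:end])
        start = end
    return batches

print(split_fixed(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```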
  • One or more embodiments of this specification disclose an adaptive control method for graph computing. Based on the historical computing task amount and historical execution information, the method adaptively determines the computing task amount of each iteration round, thereby dynamically adjusting the computing tasks that the graph computing engine handles in each round. This avoids the problem of excessive intermediate results caused by very large graph data or very dense graph structures. Through this dynamic adjustment, the stability of the computing system can be increased when task output timeliness falls below expectations during peak periods, and when resources are sufficient during off-peak periods the number of computing units is automatically increased, making full use of computing resources and approximately achieving the globally optimal computing effect.
  • Fig. 1 is an exemplary schematic diagram of a graph computing device according to some embodiments of the present specification.
  • The graph computing device 100 can be used to implement the adaptive control method for graph computing disclosed in one or more embodiments of this specification.
  • The graph computing device 100 may be a device that processes graph-structured data (i.e., graph data), for example by storing or analyzing graph data.
  • Graph computing devices can be used to mine potential behaviors and connections in data, for example users' transfer relationships or purchase behaviors.
  • The graph computing device 100 can perform multiple rounds of iterative computation over the whole graph. Through successive iterations, all nodes in the whole graph are computed; each round produces results, from which the computation result for the whole graph is obtained.
  • The graph computing device 100 may include a graph computing engine 110 and an adaptive control device 120.
  • A graph computing engine can be used to perform graph computing operations.
  • The graph computing engine can perform multiple rounds of graph computing operations on the graph data to be processed; these rounds compute all nodes in that data, thereby completing its processing.
  • For details on the graph data to be processed, see step 202 and its related description, which is not repeated here.
  • The graph computing engine may include, but is not limited to, one or a combination of the following: a stand-alone in-memory graph computing engine, a stand-alone memory-plus-disk graph computing engine, a distributed in-memory graph computing engine, and the like.
  • Different graph computing engines may use different graph algorithms to perform graph computing operations.
  • For example, a stand-alone in-memory graph computing engine may use the PageRank algorithm.
  • A distributed in-memory graph computing engine may use the Label Propagation Algorithm (LPA), among others.
  • For a stand-alone graph computing engine, the physical carrier of the graph computing engine 110 can be a single server; for a distributed graph computing engine, it can be multiple servers or computing nodes.
  • Here, a computing node can be a program deployed on one or more servers.
  • The graph computing engine may perform the graph computing operations corresponding to each round of computing tasks on the graph data to be processed. For details, see step 206 below and its related description, which is not repeated here.
  • The adaptive control device may be used to control the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete its processing.
  • The adaptive control device may specify the computing tasks or the computing task amount for each round of graph computing operations. For details on how the adaptive control device exercises control and specifies the computing tasks or computing task amount, see FIG. 2 or FIG. 4 and the related descriptions, which are not repeated here.
  • Fig. 2 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification.
  • The method may include controlling the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete its processing.
  • One round of control 200 may include steps 202 to 206.
  • Steps 202 to 206 may be performed by the adaptive control device, for example by an adaptive control system within the adaptive control device.
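The round-by-round control described above can be sketched as a driver loop, assuming that the engine call blocks until the round finishes and that units are taken in order. The names `adaptive_control`, `run_engine`, and `next_amount` are hypothetical stand-ins: `next_amount` represents whatever policy step 204 uses (for example gradient descent or slow start), and the history records (task amount, execution time) per round.

```python
import time
from typing import Callable, List, Sequence, Tuple

def adaptive_control(
    units: Sequence[str],
    run_engine: Callable[[Sequence[str]], None],
    next_amount: Callable[[List[Tuple[int, float]]], int],
) -> List[Tuple[int, float]]:
    """Drive the engine round by round until every computing unit is processed."""
    history: List[Tuple[int, float]] = []  # (task_amount, execution_time) per round
    cursor = 0
    while cursor < len(units):
        # Step 202: the history of earlier rounds is available in `history`.
        # Step 204: pick this round's amount, capped by what remains (at least 1).
        amount = max(1, min(next_amount(history), len(units) - cursor))
        batch = units[cursor:cursor + amount]
        # Step 206: send the batch to the engine and time the round.
        start = time.perf_counter()
        run_engine(batch)
        history.append((len(batch), time.perf_counter() - start))
        cursor += amount
    return history

processed: List[str] = []
hist = adaptive_control([str(i) for i in range(10)], processed.extend, lambda h: 4)
```

With a constant policy of 4 units per round, ten units are processed in rounds of 4, 4, and 2.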
  • Step 202: obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round.
  • In some embodiments, step 202 may be performed by the first acquisition module.
  • The graph data to be processed is the graph data on which graph computing currently needs to be performed.
  • The graph data to be processed may be large-scale graph data.
  • For example, the graph data to be processed may contain tens of millions or billions of edges.
  • The computing task amount reflects how much of the graph data participates in a computing task. A computing task can be viewed from two perspectives. One is the goal of the computation, which is determined by the actual application scenario; the other is the data that must be processed to achieve that goal, i.e., the portion of the graph data involved in the computation. For example, the goal of a computing task may be to find the users in the graph data who have transfer relationships.
  • In the second sense, that computing task is to traverse all edges in the graph data, find the transfer-relationship edges, and obtain the user nodes connected by those edges.
  • In this specification, "computing task" is mainly used in this second sense.
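The transfer-relationship example can be sketched as follows, assuming a hypothetical edge list of (source user, target user, edge type) tuples. In the second sense of "computing task", the work is exactly this traversal: every edge is visited, and the endpoints of transfer edges are collected.

```python
# Hypothetical edge list: (source_user, target_user, edge_type).
edges = [
    ("u1", "u2", "transfer"),
    ("u2", "u3", "friend"),
    ("u3", "u4", "transfer"),
    ("u1", "u5", "friend"),
]

def users_with_transfers(edge_list):
    """Traverse every edge, keep transfer edges, and collect their endpoints."""
    users = set()
    for src, dst, edge_type in edge_list:
        if edge_type == "transfer":
            users.update((src, dst))
    return users

print(sorted(users_with_transfers(edges)))  # ['u1', 'u2', 'u3', 'u4']
```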
  • In some embodiments, computing tasks may be represented by computing units, and correspondingly the computing task amount may reflect the number of computing units.
  • A computing unit may be one or more of the following: a node, an edge, or a subgraph.
  • The computing task amount may reflect one or a combination of the following: the number of nodes, the number of edges, and the number of subgraphs participating in the current computation.
  • The historical computing task amount may be the number of computing units, for example nodes or edges, processed in one or more rounds of graph computing operations preceding the current round.
  • The one or more rounds of graph computing operations preceding the current round may be graph computing operations on the graph data to be processed. For details of graph computing operations, see FIG. 1 and the related description, which is not repeated here.
  • The historical execution information reflects the performance status of the graph computing engine when executing the corresponding round of graph computing operations.
  • The performance status may be related to the amount or complexity of the computation associated with a computing task.
  • Different rounds may carry different computing tasks with different computational complexity. For example, for graph data representing purchase relationships between users and commodities, if a round's computing task is to determine which commodities a user has purchased, it is a first-degree propagation computation; if it is to determine the commodities the current user may purchase, it is a second-degree propagation computation. The second-degree propagation computation is more complex than the first-degree one.
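The difference between first-degree and second-degree propagation can be sketched on a small user-commodity purchase graph, assuming the adjacency dict stores both directions (user to commodities, commodity to buyers). Interpreting "commodities the user may purchase" as commodities bought by co-buyers is an assumption made for illustration. The nested loop in `two_hop` shows why the second-degree computation costs more: its work grows with the product of the degrees it passes through.

```python
def one_hop(adjacency, user):
    """First-degree propagation: commodities the user bought directly."""
    return set(adjacency.get(user, []))

def two_hop(adjacency, user):
    """Second-degree propagation: user -> commodity -> other buyers -> their commodities.

    Commodities the user already bought are excluded from the result.
    """
    result = set()
    for item in adjacency.get(user, []):
        for buyer in adjacency.get(item, []):
            if buyer != user:
                result.update(adjacency.get(buyer, []))
    return result - one_hop(adjacency, user)

# Hypothetical bipartite purchase graph stored in both directions.
purchases = {
    "u1": ["i1"], "u2": ["i1", "i2"], "u3": ["i2", "i3"],
    "i1": ["u1", "u2"], "i2": ["u2", "u3"], "i3": ["u3"],
}
```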
  • The performance status may be reflected by execution time or execution speed.
  • For execution time, a longer execution time indicates a worse performance status, and a shorter one a better performance status.
  • Execution speed indirectly reflects execution time: the adaptive control device can compute the execution speed from the historical computing task amount and the execution time. In some embodiments, the adaptive control device may also obtain the execution speed from the graph computing engine, for example from the engine's execution log.
  • The adaptive control device itself determines each round's computing tasks, so it knows the historical computing task amount of every round. It also sends each round's computing tasks to the graph computing engine to trigger that round's graph computing operation. The device therefore knows the start time and the end time (when the graph computing engine returns the result) of each round, from which it can derive the execution time and compute the execution speed.
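The bookkeeping described above can be sketched as a small record per round, assuming timestamps in seconds. The class name `RoundRecord` is hypothetical; execution time is end minus start, and execution speed is computing units per second (reported as infinity for a zero-length round to avoid division by zero).

```python
from dataclasses import dataclass

@dataclass
class RoundRecord:
    task_amount: int  # computing units sent to the engine in this round
    start: float      # time the round was triggered, in seconds
    end: float        # time the engine returned its result, in seconds

    @property
    def execution_time(self) -> float:
        return self.end - self.start

    @property
    def execution_speed(self) -> float:
        """Computing units processed per second."""
        t = self.execution_time
        return self.task_amount / t if t > 0 else float("inf")

r = RoundRecord(task_amount=1000, start=10.0, end=12.5)
print(r.execution_time, r.execution_speed)  # 2.5 400.0
```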
  • Step 204: determine the current round of computing tasks based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed. In some embodiments, step 204 may be performed by the current-round computing task determination module.
  • For details about computing tasks, see step 202 above and the related description, which is not repeated here.
  • The adaptive control device may determine a computing task amount based on the historical computing task amount and the historical execution information.
  • The adaptive control device may use a preset algorithm to determine the computing task amount based on the historical computing task amount and historical execution information.
  • The preset algorithm may include, but is not limited to, a gradient descent algorithm and/or a slow start algorithm.
  • The historical computing task amount and historical execution information may come from earlier rounds of graph computing on the graph data to be processed, or from previous graph computing operations on other graph data.
  • The gradient descent algorithm can be used to find a local optimum of an objective function.
  • The objective function may reflect the total execution time for processing the graph data to be processed, or the execution time of each round of computing tasks; it depends on each round's computing tasks or computing task amount.
  • Based on gradient descent, the computing task amount of each round can be chosen so that the execution time of the current round, or the total execution time for processing the graph data to be processed, is at a local minimum, or the execution speed is at a local maximum.
  • Slow start was originally a congestion control mechanism of the Transmission Control Protocol (TCP). In this specification it can also be used to determine the computing task amount of each round of graph computing, so as to optimize the performance status of the graph computing engine. For details on determining the computing task amount with gradient descent or slow start, see FIG. 3 and its related description, which is not repeated here.
  • The current-round computing task determination module may determine the current round of computing tasks based on the remaining computing tasks of the graph data to be processed and the computing task amount.
  • The remaining computing tasks include the computing units in the graph data to be processed on which graph computing has not yet been performed. For example, with nodes as computing units, if the graph data to be processed contains 10 million nodes and earlier rounds have processed 8 million of them, the remaining computing tasks contain 2 million nodes. Since nodes or edges can belong to different categories, the remaining computing tasks can also reflect the categories of the unprocessed computing units, for example the category of each of those 2 million nodes.
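Tracking the remaining computing tasks per category can be sketched as a set difference followed by a count, assuming a hypothetical mapping from unit id to category and a set of ids already processed in earlier rounds. At the scale of the example, 10 million units minus 8 million processed leaves 2 million; the toy data below shows the same arithmetic on five nodes.

```python
from collections import Counter

def remaining_by_category(all_units, done):
    """Count unprocessed computing units, grouped by category.

    all_units: unit id -> category; done: ids processed in earlier rounds.
    """
    return Counter(cat for uid, cat in all_units.items() if uid not in done)

units = {"n1": "user", "n2": "user", "n3": "commodity", "n4": "commodity", "n5": "commodity"}
left = remaining_by_category(units, done={"n1", "n3"})
print(sum(left.values()))  # 3
```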
  • The adaptive control device may select a number of computing units from the remaining computing tasks according to the computing task amount and use them as the current round of computing tasks. For example, if the computing task amount is 1,000 nodes and the remaining computing tasks include 20,000 user nodes and 40,000 commodity nodes, the device can select 300 user nodes and 700 commodity nodes, 1,000 nodes in total. In some embodiments, the computing units may be selected at random from the remaining computing tasks of the corresponding category; for example, 300 user nodes may be randomly selected from the 20,000 remaining user nodes.
  • The adaptive control device can also check whether the computing task amount exceeds the amount of remaining computing tasks of the graph data to be processed. If it does not, the current round of computing tasks is determined as described above, and the number of computing units in the current round equals the computing task amount. If it does, all remaining computing tasks of the graph data to be processed are used directly as the current round of computing tasks.
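The selection rule above can be sketched as random sampling per category with a cap on the remaining work, assuming the per-category quotas (such as 300 user and 700 commodity nodes) are already known. The function name `pick_current_round` and the `seed` parameter are hypothetical. If a single category's quota exceeds what is left in that category, the sketch simply takes all of it, a corner case the text does not address.

```python
import random

def pick_current_round(remaining, quotas, seed=None):
    """Select this round's computing units from the remaining ones.

    remaining: category -> list of unprocessed unit ids
    quotas: category -> number of units of that category to take
    If the requested total exceeds what remains, everything left is returned.
    """
    total_remaining = sum(len(ids) for ids in remaining.values())
    if sum(quotas.values()) > total_remaining:
        return {cat: list(ids) for cat, ids in remaining.items()}
    rng = random.Random(seed)
    return {
        cat: rng.sample(remaining[cat], min(k, len(remaining[cat])))
        for cat, k in quotas.items()
    }

remaining = {
    "user": [f"u{i}" for i in range(20_000)],
    "commodity": [f"c{i}" for i in range(40_000)],
}
chosen = pick_current_round(remaining, {"user": 300, "commodity": 700}, seed=0)
```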
  • Alternatively, the adaptive control device may send the computing task amount to the graph computing engine, which then determines and executes the current round of computing tasks based on that amount. For details of this approach, see the description of FIG. 4.
  • Step 206: send the current round of computing tasks to the graph computing engine, so that the graph computing engine executes, on the graph data to be processed, the graph computing operation corresponding to the current round of computing tasks.
  • In some embodiments, step 206 may be performed by the first sending module.
  • For details of step 206, see FIG. 1 and its related description, which is not repeated here.
  • In some embodiments, the computing task amount can be obtained according to preset rules, which can be set according to actual conditions.
  • For example, the preset rule may be to determine all categories of the computing units in the graph data to be processed, and to take a preset proportion of the computing units of each category as the computing task amount.
  • The preset proportion may be an empirical value, for example 1/500 or 1/800.
  • For example, the computing task amount may include 20,000 user nodes and 40,000 commodity nodes, in which case the number of nodes (i.e., computing units) it reflects is 60,000.
  • The adaptive control device can then take 20,000 user nodes and 40,000 commodity nodes from the graph data to be processed as the current round of computing tasks, and send them to the graph computing engine, so that the engine executes the corresponding graph computing operations on the graph data to be processed.
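The preset-proportion rule can be sketched as applying one fraction to every category's unit count, rounding up so that small categories still contribute. The category counts and the 1/500 fraction below are hypothetical inputs chosen only to reproduce the 20,000 + 40,000 example; `fractions.Fraction` keeps the arithmetic exact so the rounding cannot be skewed by floating-point error.

```python
import math
from fractions import Fraction

def amount_by_preset_fraction(category_counts, fraction):
    """Take the same preset fraction of each category's units, rounded up."""
    return {cat: math.ceil(count * fraction) for cat, count in category_counts.items()}

# Hypothetical graph with 10 million user nodes and 20 million commodity nodes.
amount = amount_by_preset_fraction(
    {"user": 10_000_000, "commodity": 20_000_000}, Fraction(1, 500)
)
print(amount, sum(amount.values()))  # {'user': 20000, 'commodity': 40000} 60000
```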
  • Steps 202 to 206 constitute one round of control performed by the adaptive control device.
  • Because the adaptive control device determines the current round's computing task amount dynamically from the historical computing task amount and historical execution information, it applies back pressure when task output timeliness falls below expectations during peak periods, improving the stability of the graph computing device, and it automatically increases the computing task amount when resources are sufficient during off-peak periods, making full use of computing resources and approximately achieving the globally optimal computing effect of the graph computing engine.
  • Because each graph computing operation covers only the current round's computing task amount, the graph data to be processed is split into fragments that are submitted to the graph computing engine one by one. This avoids problems such as excessive intermediate results, or failure to complete the computation, caused by very large graph data or very dense graph structures.
  • The adaptive control system and its modules involved in process 200 can be implemented in various ways.
  • For example, the system and its modules may be implemented by hardware, software, or a combination of software and hardware.
  • The hardware part can be implemented using dedicated logic;
  • the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware.
  • Such code may be provided as processor control code, for example on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
  • The system and its modules in this specification can be implemented not only by hardware circuits, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of such hardware circuits and software (for example, firmware).
  • Fig. 3 is an exemplary flow chart of determining a computing task amount according to some embodiments of the present specification.
  • Process 300 may include steps 302, 304, and 306.
  • Steps 302, 304, and 306 may be performed by the adaptive control device, for example by the current-round computing task determination module of the adaptive control system.
  • Step 302: based on the historical computing task amounts and historical execution information of the two rounds preceding the current round, determine the gradient information of the execution information with respect to the computing task amount.
  • Let $T_i = t(n_i)$ denote the execution time of the $i$-th round of graph computing operations, where $n_i$ denotes the computing task amount of the $i$-th round.
  • $t(n_i)$ can be regarded as a proportional function of $n_i$.
  • The adaptive control device can determine the gradient information of the execution time with respect to the computing task amount.
  • the gradient can be obtained by the following formula (1):
  • T i-1 and T i-2 respectively represent the execution time of two rounds of historical graph computing operations
  • n i-1 and ni-2 represent the corresponding historical computing tasks.
  • the historical calculation task amount and historical execution information of the first two rounds of the current round may be obtained, and the gradient information of the execution time relative to the calculation task amount may be calculated based on formula (1).
  • Step 304: determine an adjustment amount based on the gradient information.
  • the adaptive control device may directly use the value represented by the gradient information as the adjustment amount. In some embodiments, the adaptive control device may determine the adjustment amount based on a preset learning rate and the gradient, for example, by taking the product of the preset learning rate and the gradient as the adjustment amount. In some embodiments, the adaptive control device may determine the adjustment amount through the formula $p \cdot s$, where $p$ is a preset learning rate in the interval $(0, 1)$ and $s$ is the gradient.
  • Step 306: based on the adjustment amount, update the historical computing task amount of the round immediately preceding the current round to obtain the computing task amount.
  • the adaptive control device may perform the update by subtracting the adjustment amount from the historical computing task amount of the round immediately preceding the current round, that is, $n_i = n_{i-1} - p \cdot s$, to obtain the computing task amount.
  • the adaptive control device may determine the computing task amount through a slow start algorithm based on the historical computing task amount and historical execution information.
  • the historical computing task amount and historical execution time of the previous round of graph computing can be obtained.
  • when the historical execution time is less than a set threshold and the historical computing task amount is less than a threshold window, the current round's computing task amount is increased rapidly from the historical computing task amount, for example, to twice the historical amount or more; when the historical execution time is less than the set threshold but the historical computing task amount exceeds the threshold window, the amount is increased slowly, for example, by a fixed 10,000 computing units.
  • the threshold window may have a preset value and be dynamically updated during the control process.
  • when the historical execution time of the previous round exceeds the set threshold, a preset ratio (such as 0.5) of the historical computing task amount is used as the updated threshold window, and the current round's computing task amount is set to a preset, smaller starting task amount, such as 5,000 computing units.
  • Fig. 4 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of the present specification.
  • the method may include controlling the graph calculation engine to perform multiple rounds of graph calculation operations on the graph data to be processed, so as to complete the processing of the graph data to be processed.
  • one round of control process 400 may include steps 402, 404, and 406.
  • steps 402, 404, and 406 may be performed by an adaptive control device, for example, implemented by an adaptive control system deployed on the adaptive control device.
  • Step 402: obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations.
  • step 402 may be performed by the second obtaining module.
  • the calculation task amount may reflect the number of calculation units, and the calculation units are one or a combination of the following: nodes, edges, and subgraphs.
  • Step 402 is similar to step 202, and for specific details, refer to step 202 and its related descriptions, which will not be repeated here.
  • Step 404: determine the current-round computing task amount based at least on the historical computing task amount and historical execution information.
  • step 404 may be performed by the current-round computing task amount determination module.
  • the adaptive control device may determine the computing task amount based on the historical computing task amount and historical execution information. For details on determining the computing task amount, refer to step 204 and its related description, which will not be repeated here. In some embodiments, the adaptive control device may directly use the computing task amount as the current-round computing task amount and send it to the graph computing engine. In yet other embodiments, the adaptive control device may determine the current-round computing task amount based on the remaining computing task amount of the graph data to be processed and the computing task amount.
  • the adaptive control device can judge whether the computing task amount exceeds the remaining computing task amount of the graph data to be processed; if it does not, the computing task amount is used as the current-round computing task amount; if it does, the remaining computing task amount of the graph data to be processed is used as the current-round computing task amount.
  • Step 406: send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs graph computing operations of the current-round computing task amount based on the graph data to be processed.
  • step 406 may be performed by the second sending module.
  • the adaptive control device can send the current-round computing task amount to the graph computing engine, so that the graph computing engine can determine the current-round computing task from that amount and perform graph computing operations on the computing units of the graph data to be processed that correspond to that task.
  • the method by which the graph computing engine determines the current-round computing task from the current-round task amount is similar to the process, described for step 204, of determining the current-round computing task from the computing task amount, and will not be repeated here.
  • the adaptive control system and its modules involved in the process 400 can be implemented in various ways.
  • the device and its modules may be implemented by hardware, software, or a combination of software and hardware.
  • the embodiment of this specification also provides an adaptive control system for graph calculation, the system is used to control the graph calculation engine to perform multiple rounds of graph calculation operations on the graph data to be processed, so as to complete the processing of the graph data to be processed.
  • the system includes a first obtaining module, a current-round computing task determination module, and a first sending module.
  • the first obtaining module can be used to obtain, in one round of control, the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations.
  • the current-round computing task determination module can be used to determine, in one round of control, the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed.
  • the first sending module may be configured to send, in one round of control, the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.
  • the embodiment of this specification also provides an adaptive control system for graph calculation, the system is used to control the graph calculation engine to perform multiple rounds of graph calculation operations on the graph data to be processed, so as to complete the processing of the graph data to be processed.
  • the system includes a second obtaining module, a current-round computing task amount determination module, and a second sending module.
  • the second obtaining module can be used to obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations.
  • the current-round computing task amount determination module may be configured to determine the current-round computing task amount based at least on the historical computing task amount and historical execution information.
  • the second sending module may be configured to send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs graph computing operations of the current-round computing task amount based on the graph data to be processed.
  • the graph computing adaptive control system and its modules can be implemented in various ways.
  • the system and its modules may be implemented by hardware, software, or a combination of software and hardware.
  • the hardware part can be implemented by using dedicated logic;
  • the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware.
  • the methods and systems described above can be implemented using computer-executable instructions and/or contained in processor control code, such code being provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
  • the system and its modules in this specification can be implemented not only by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (for example, firmware).
  • the embodiments of this specification also provide an adaptive control device for graph computing, the device including at least one processor and at least one storage device, the storage device being used to store instructions; when the at least one processor executes the instructions, the device is caused to implement the method described in any one of the preceding items.
  • the embodiment of this specification also provides a graph computing device, including a graph computing engine and an adaptive control device; the graph computing engine is used for graph computing operations; the adaptive control device is used to control the graph computing engine to execute the graph data to be processed Multiple rounds of graph computing operations are used to complete the processing of the graph data to be processed, wherein the computing task or amount of computing tasks of each round of graph computing operations is specified by the adaptive control device.
  • aspects of this specification can be illustrated and described in terms of several patentable categories or situations, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof.
  • various aspects of this specification may be entirely executed by hardware, may be entirely executed by software (including firmware, resident software, microcode, etc.), or may be executed by a combination of hardware and software.
  • the above hardware or software may be referred to as “block”, “module”, “engine”, “unit”, “component” or “system”.
  • aspects of this specification may be embodied as a computer product comprising computer readable program code on one or more computer readable media.
  • a computer storage medium may contain a propagated data signal embodying a computer program code, for example, in baseband or as part of a carrier wave.
  • the propagated signal may have various manifestations, including electromagnetic form, optical form, etc., or a suitable combination.
  • a computer storage medium may be any computer-readable medium, other than a computer-readable storage medium, that can be used to communicate, propagate, or transfer a program for use by being coupled to an instruction execution system, apparatus, or device.
  • Program code residing on a computer storage medium may be transmitted over any suitable medium, including radio, electrical cable, fiber optic cable, RF, or the like, or combinations of any of the foregoing.
  • the computer program code required for the operation of each part of this specification can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages.
  • the program code may run entirely on the user's computer, or as a stand-alone software package, or run partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device.
  • the remote computer can be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, through the Internet), or used in a cloud computing environment, or as a service such as software as a service (SaaS).
  • numbers describing the quantities of components and attributes are used in some embodiments. It should be understood that such numbers used in describing the embodiments are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated figure allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that can vary depending on the characteristics required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and use a general method of retaining digits. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as is practicable.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

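The embodiments of this specification disclose an adaptive control method and system for graph computing. The method includes controlling a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed. One round of control includes: obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, where the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations; determining the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed; and sending the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.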

Description

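Adaptive Control for Graph Computing
Technical Field
The embodiments of this specification relate to the technical field of graph computing, and in particular to an adaptive control method and system for graph computing.
Background
Knowledge graphs, as a typical form of graph data, are currently used in a wide range of fields, such as healthcare, finance, social analysis, natural science, and transportation. Graph computing is the computing paradigm applied to this data structure; for knowledge graphs, it can include rule reasoning, graph representation learning, and so on. In recent years the scale of graph data has grown exponentially and may reach billions of nodes and trillions of edges; as graph data keeps growing, the demands on graph computing capability keep rising.
Therefore, in graph computing scenarios involving large-scale graph data, how to make efficient use of the computing resources of graph computing devices is a problem that urgently needs to be solved.
Summary
One aspect of the embodiments of this specification provides an adaptive control method for graph computing. The method includes controlling a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, where one round of control includes: obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations; determining the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed; and sending the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.
One aspect of the embodiments of this specification provides an adaptive control system for graph computing. The system is used to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, and includes: a first obtaining module, configured to obtain, in one round of control, the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations; a current-round computing task determination module, configured to determine, in one round of control, the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed; and a first sending module, configured to send, in one round of control, the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.
One aspect of the embodiments of this specification provides an adaptive control method for graph computing. The method includes controlling a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, where one round of control includes: obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations; determining the current-round computing task amount based at least on the historical computing task amount and the historical execution information; and sending the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations of the current-round computing task amount.
One aspect of the embodiments of this specification provides an adaptive control system for graph computing. The system is used to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, and includes: a second obtaining module, configured to obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations; a current-round computing task amount determination module, configured to determine the current-round computing task amount based at least on the historical computing task amount and the historical execution information; and a second sending module, configured to send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations of the current-round computing task amount.
One aspect of the embodiments of this specification provides an adaptive control device for graph computing. The device includes at least one processor and at least one storage device; the storage device is used to store instructions, and when the at least one processor executes the instructions, the device is caused to implement the method according to any one of the foregoing.
One aspect of the embodiments of this specification provides a graph computing device, including a graph computing engine and an adaptive control device. The graph computing engine is used to perform graph computing operations; the adaptive control device is used to control the graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, where the computing task or computing task amount of each round of graph computing operations is specified by the adaptive control device.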
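Brief Description of the Drawings
This specification is further described by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are non-limiting; in them, the same reference numerals denote the same structures, wherein:
Fig. 1 is an exemplary schematic diagram of a graph computing device according to some embodiments of this specification;
Fig. 2 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification;
Fig. 3 is an exemplary flowchart of determining a computing task amount according to some embodiments of this specification;
Fig. 4 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification.
Detailed Description
To describe the technical solutions of the embodiments of this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are merely some examples or embodiments of this specification; a person of ordinary skill in the art can, without creative effort, also apply this specification to other similar scenarios based on these drawings. Unless obvious from the context or otherwise stated, the same reference numerals in the drawings denote the same structure or operation.
It should be understood that "system", "device", "unit", and/or "module" as used in this specification are ways of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions if the other expressions achieve the same purpose.
As shown in this specification and the claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not specifically refer to the singular and may also include the plural. In general, the terms "include" and "comprise" only indicate that clearly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or device may also include other steps or elements.
Flowcharts are used in this specification to illustrate the operations performed by the system according to the embodiments of this specification. It should be understood that the preceding or following operations are not necessarily performed exactly in order. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.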
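The technical solutions disclosed in one or more embodiments of this specification can be applied to graph computing scenarios. In general, graph computing may include graph reasoning, graph representation learning, and the like. Knowledge graphs are a typical application of graph data; this specification mainly uses knowledge graphs for illustration, and unless otherwise specified, what is said about knowledge graphs also applies to other graph data. A knowledge graph (or simply a graph) represents objects (for example, entities, including concepts and attribute values) as nodes in a graph and the associations between objects as edges connecting those nodes, thereby representing the acquired knowledge as a networked structure. Graph reasoning infers new facts from the known facts in an existing knowledge graph; for example, given an entity and a relation, it determines another entity that has this relation with the given entity. Graph representation learning learns vectorized representations of entities and relations. In some embodiments, graph computing can be implemented by using a graph computing engine to operate on the nodes of the whole graph. Different graph computing frameworks (that is, graph computing engines) define different computing units for graph computing; a computing unit may be a node, an edge, or a subgraph, and the framework completes the graph computation on the basis of its computing units. Taking a framework whose computing unit is the node as an example, the graph algorithm is divided at fine granularity into computing operations on each node, and all nodes execute their computing operations independently and in parallel. As another example, a knowledge graph can be split into n-hop subgraphs, each centered on one node; each n-hop subgraph is then a computing unit, and processing all the subgraphs completes the processing of the whole graph.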
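The n-hop subgraph computing unit mentioned above can be illustrated with a short sketch. The Python below assumes the graph is held as an in-memory adjacency dict (a hypothetical representation; a real graph computing engine would use its own storage) and uses breadth-first search to collect the node set of each n-hop subgraph, producing one computing unit per center node. The function names are illustrative only.

```python
from collections import deque


def n_hop_subgraph(adj, center, n):
    """Return the node set of the n-hop subgraph centered on `center`.

    `adj` is a hypothetical adjacency dict mapping each node to an iterable
    of its neighbours (undirected view of the graph).
    """
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue  # do not expand beyond n hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def split_into_units(adj, n):
    """One computing unit per node: the n-hop subgraph centered on that node."""
    return {node: n_hop_subgraph(adj, node, n) for node in adj}


# A small path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
units = split_into_units(adj, 1)
print(sorted(units["b"]))  # ['a', 'b', 'c']
```

Units built this way can overlap; for example, node b appears in the 1-hop units centered on a, b, and c.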
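In some embodiments, the graph computing operation on the whole graph can be implemented in the following ways: (1) using the graph computing engine to compute on the whole graph directly; or (2) dividing all computing units of the whole graph into a fixed number N of parts and triggering the graph computing operation of the engine for each part in turn. However, method (1) is not suitable for large-scale graph data: when the data volume is huge, graph computing produces a large number of intermediate results, which degrades computing performance or even causes the computing task to crash. Method (2) computes the same number of computing units in every iteration, so it cannot adapt well to fluctuations in computing resources. For example, during cluster peak periods, severe resource contention degrades the engine's computing performance, yet the method cannot apply back pressure in time (such as reducing the computing task to match the increased load); during cluster off-peak periods, it cannot make full use of the engine's computing resources, and therefore cannot make full use of the cluster's resources. One or more embodiments of this specification disclose an adaptive control method for graph computing that adaptively determines the computing task amount of each iteration based on the historical computing task amount and historical execution information, so as to dynamically adjust the engine's computing task in each iteration. This avoids excessive intermediate results caused by too much graph data or an overly dense graph structure. Moreover, through this dynamic adjustment, back pressure can be applied during peak periods when task output timeliness falls below expectations, increasing the stability of the computing system, and the number of computing units can be increased automatically during off-peak periods when resources are sufficient, so as to make full use of computing resources and approximately achieve a globally optimal computing effect. The technical solutions disclosed in this specification are described in detail below with reference to the drawings.
Fig. 1 is an exemplary schematic diagram of a graph computing device according to some embodiments of this specification. In some embodiments, the graph computing device 100 can be used to implement the adaptive control method for graph computing disclosed in one or more embodiments of this specification.
The graph computing device 100 may be a device that processes graph-structured data (that is, graph data), for example, by storing or analyzing graph data. A graph computing device can be used to mine latent behaviors and relationships in data, such as users' money-transfer relationships or purchasing behaviors. In some embodiments, the graph computing device 100 can perform multiple rounds of iterative computation over the whole graph: continued iteration covers all nodes of the whole graph, each round yields its own computation result, and together these give the computation result for the whole graph. As shown in Fig. 1, the graph computing device 100 may include a graph computing engine 110 and an adaptive control device 120.
In some embodiments, the graph computing engine can be used to perform graph computing operations. In some embodiments, the graph computing engine can perform multiple rounds of graph computing operations on the graph data to be processed; these rounds cover all nodes in the graph data to be processed, completing its processing. For details about the graph data to be processed, refer to step 202 and its related description, which are not repeated here.
In some embodiments, the graph computing engine may include, but is not limited to, one or a combination of the following: a single-machine in-memory graph computing engine, a single-machine memory-plus-disk graph computing engine, a distributed in-memory graph computing engine, and so on. In some embodiments, different graph computing engines may use different graph algorithms to perform graph computing operations; for example, a single-machine in-memory engine may run the PageRank algorithm, and a distributed in-memory engine may run the Label Propagation Algorithm (LPA). For a single-machine engine, the physical carrier of the graph computing engine 110 may be a single server; for a distributed engine, it may be multiple servers or multiple computing nodes, where the computing nodes may be programs deployed on one or more servers. In some embodiments, the graph computing engine can perform, based on the graph data to be processed, the graph computing operation corresponding to each round's computing task. For details, refer to step 206 below and its related description, which are not repeated here.
In some embodiments, the adaptive control device can be used to control the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete its processing. In some embodiments, the adaptive control device can specify the computing task or computing task amount of each round of graph computing operations. For details about the control method of the adaptive control device and how it specifies the computing task or computing task amount, refer to Fig. 2 or Fig. 4 and the related descriptions, which are not repeated here.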
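Fig. 2 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification. In some embodiments, the method may include controlling the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete its processing. As shown in Fig. 2, one round of control 200 may include steps 202 to 206. In some embodiments, steps 202 to 206 may be performed by the adaptive control device, for example, implemented by an adaptive control system in the adaptive control device.
Step 202: obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round. In some embodiments, step 202 may be performed by the first obtaining module.
In some embodiments, the graph data to be processed may be the graph data on which graph computing currently needs to be performed. In some embodiments, the graph data to be processed may be large-scale graph data, for example, graph data containing tens of millions or hundreds of millions of edges. In some embodiments, the computing task amount may reflect the quantity of data in the graph data that participates in the computing task. A computing task can be viewed from two angles: first, as the goal of the computation, which is determined by the actual application scenario; second, as the data that must be processed to achieve that goal, such as the data in the graph that participates in the computation. For example, a computing task may be to identify users in the graph data who have transfer relationships; viewed at the other level, the task is to traverse all edges in the graph data, find the edges representing transfer relationships, and then obtain the user nodes connected by those edges. In this specification, "computing task" mainly takes the second meaning. In some embodiments, a computing task can be represented by computing units; correspondingly, in some embodiments, the computing task amount can reflect the number of computing units.
In some embodiments, a computing unit may be one or a combination of the following: a node, an edge, or a subgraph. Correspondingly, the computing task amount may reflect one or a combination of the following: the number of nodes, the number of edges, or the number of subgraphs participating in the current computation.
In some embodiments, the historical computing task amount may be the number of computing units, for example, nodes or edges, in one or more rounds of graph computing operations preceding the current round. These preceding rounds may be graph computing operations on the graph data to be processed. For details about graph computing operations, refer to Fig. 1 and its related description, which are not repeated here.
In some embodiments, the historical execution information may reflect the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations. In some embodiments, the performance state may be related to the amount or complexity of computation involved in the computing task. The same computing object (for example, the same graph data) can have different computing tasks, and different computing tasks may differ in computational complexity. For example, consider graph data representing product purchase relationships: if a round's computing task is to determine users' product purchases, it is a one-hop propagation computation; if a round's computing task is to infer a user's likely purchases from the purchasing behavior of neighboring users, it is a two-hop propagation computation. Understandably, two-hop propagation is more complex than one-hop propagation.
In some embodiments, the performance state can be reflected by the execution time or the execution speed: the longer the execution time, the worse the performance state can be considered, and vice versa. In some embodiments, the execution speed indirectly reflects the execution time; the adaptive control device can obtain the execution speed from the historical computing task amount and the execution time. In some embodiments, the adaptive control device can also obtain the execution speed from the graph computing engine, for example, from the engine's execution logs.
In some embodiments, the adaptive control device itself determines each round's computing task, and therefore knows the historical computing task amount of each round of graph computing performed by the engine. Because the adaptive control device also sends each round's computing task to the engine and thereby triggers the engine's graph computing operation, it knows the start time and end time (the time at which the engine returns the computation result) of each round, so it can obtain the execution time itself and derive the execution speed by calculation.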
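As a minimal sketch of how the controller could obtain this information on its own, the Python below times one round from dispatch to result with `time.perf_counter` and derives the execution speed from the task amount and execution time. The names `RoundRecord`, `run_round`, and the blocking `engine_run` callable that stands in for the graph computing engine are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class RoundRecord:
    """What the controller can observe about one round on its own."""
    task_amount: int   # number of computing units dispatched in the round
    exec_time: float   # seconds from dispatch until the result came back

    @property
    def exec_speed(self) -> float:
        # Computing units per second, derived from task amount and time.
        return self.task_amount / self.exec_time if self.exec_time > 0 else float("inf")


def run_round(engine_run, units):
    """Dispatch one round and time it from the controller side.

    `engine_run` is a hypothetical blocking callable standing in for the
    graph computing engine; it returns once the round's result is ready.
    """
    start = time.perf_counter()   # start time: task sent to the engine
    result = engine_run(units)
    end = time.perf_counter()     # end time: engine returned its result
    return result, RoundRecord(task_amount=len(units), exec_time=end - start)


result, record = run_round(lambda batch: len(batch), list(range(5000)))
print(record.task_amount, result)  # 5000 5000
```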
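Step 204: determine the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed. In some embodiments, step 204 may be performed by the current-round computing task determination module.
For details about the computing task amount, refer to step 202 above and its related description, which are not repeated here.
In some embodiments, if the current round is not the first round of graph computing operations on the graph data to be processed, the adaptive control device can determine the computing task amount based on the historical computing task amount and the historical execution information. For details about the computing task amount, see step 202 above. In some embodiments, the adaptive control device can use a preset algorithm to determine the computing task amount from the historical computing task amount and the historical execution information; the preset algorithm may include, but is not limited to, a gradient descent algorithm and/or a slow start algorithm. In some embodiments, the historical computing task amount and historical execution information may come from earlier rounds of graph computing operations on the graph data to be processed, or from earlier graph computing operations on other graph data.
A gradient descent algorithm can be used to find a local optimum of an objective function. In some embodiments, the objective function may reflect the total execution time for completing the processing of the graph data to be processed, or the execution time of each round's computing task, and it depends on each round's computing task or computing task amount. In some embodiments, each round's computing task amount can be determined by gradient descent so that the execution time of the current round performed with this amount, or the total execution time for completing the processing of the graph data to be processed, is locally minimized, or the execution speed is locally maximized.
Slow start was originally a congestion control mechanism used by the Transmission Control Protocol (TCP); in this specification it can also be used to determine the computing task amount of each round of graph computing so as to optimize the performance state of the graph computing engine. For details on determining the computing task amount by gradient descent or slow start, refer to Fig. 3 and its related description, which are not repeated here.
In some embodiments, the current-round computing task determination module can determine the current-round computing task based on the remaining computing tasks of the graph data to be processed and the computing task amount.
In some embodiments, the remaining computing tasks of the graph data to be processed include the computing units in the graph data to be processed on which graph computing has not yet been performed. For example, taking nodes as computing units, if the graph data to be processed contains ten million nodes and eight million of them have already been computed in historical graph computing operations, the remaining computing tasks may include two million nodes. As mentioned above, nodes or edges can belong to different categories, so the remaining computing tasks can also reflect the categories of the computing units not yet computed, for example, the category of each of the two million nodes.
In some embodiments, the adaptive control device can select a number of computing units from the remaining computing tasks according to the computing task amount and use them as the current-round computing task. For example, if the computing task amount is 1,000 nodes and the remaining computing tasks include 20,000 user nodes and 40,000 product nodes, the adaptive control device can select 300 user nodes and 700 product nodes from the remaining computing tasks as the computing units, for a total of 1,000. In some embodiments, the computing units can be selected at random from the remaining computing tasks of the corresponding category; for example, 300 user nodes can be selected at random from the 20,000 remaining user nodes as the user nodes of the current-round computing task.
In some embodiments, the adaptive control device can also determine whether the computing task amount exceeds the remaining computing task amount of the graph data to be processed. If it does not, the current-round computing task can be determined by the preceding steps, in which case the number of computing units in the current-round computing task equals the computing task amount. If it does, the remaining computing tasks of the graph data to be processed can be used directly as the current-round computing task. In some embodiments, the adaptive control device can instead send the computing task amount to the graph computing engine, which then determines and executes the current-round computing task itself; for details of this approach, refer to the description of Fig. 4.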
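The selection logic of step 204 can be sketched as follows. The example above (300 user nodes and 700 product nodes) does not state how the computing task amount is split across categories, so this sketch assumes, purely for illustration, a split proportional to each category's remaining count with largest-remainder rounding, followed by random sampling within each category and the cap on the remaining amount. All function names are hypothetical.

```python
import random


def remaining_by_category(all_units, processed):
    """Computing units not yet computed, grouped by category (e.g. 'user')."""
    return {cat: [u for u in units if u not in processed]
            for cat, units in all_units.items()}


def allocate_quota(remaining, amount):
    """Assumed split rule: proportional to remaining counts.

    Largest-remainder rounding keeps the total exactly equal to `amount`.
    Requires 0 <= amount <= total remaining.
    """
    total = sum(len(v) for v in remaining.values())
    exact = {c: amount * len(v) / total for c, v in remaining.items()}
    quota = {c: int(x) for c, x in exact.items()}
    leftover = amount - sum(quota.values())
    # Hand the leftover units to the categories with the largest fractions.
    for c in sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)[:leftover]:
        quota[c] += 1
    return quota


def current_round_task(remaining, amount, rng=random):
    """Pick this round's computing units from the remaining tasks."""
    total = sum(len(v) for v in remaining.values())
    if amount >= total:
        # The amount exceeds what is left: take all remaining units.
        return {c: list(v) for c, v in remaining.items()}
    quota = allocate_quota(remaining, amount)
    # Random selection within each category, as in the text above.
    return {c: rng.sample(remaining[c], quota[c]) for c in remaining}


all_units = {"user": [f"u{i}" for i in range(20000)],
             "product": [f"p{i}" for i in range(40000)]}
remaining = remaining_by_category(all_units, processed=set())
print(allocate_quota(remaining, 1000))  # {'user': 333, 'product': 667}
task = current_round_task(remaining, 1000, rng=random.Random(0))
```

Under this proportional assumption a task amount of 1,000 splits into 333 user nodes and 667 product nodes; any other split rule could be substituted in `allocate_quota`.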
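Step 206: send the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task. In some embodiments, step 206 may be performed by the first sending module.
For details about step 206, refer to Fig. 1 and its related description, which are not repeated here.
In some embodiments, if the current round is the first round of graph computing operations on the graph data to be processed, the computing task amount can be obtained according to a preset rule, which can be set according to the actual situation. In some embodiments, the preset rule may be to determine all categories to which the computing units of the graph data to be processed belong and to select a preset fraction of the computing units in each category as the computing task amount. The preset fraction may be an empirical value, for example, a ratio such as 1/500 or 1/800. For example, suppose the graph data represents product purchase relationships and its node categories are users and products, with ten million user nodes and twenty million product nodes in total, and the preset fraction is 1/500. The computing task amount then includes 20,000 user nodes and 40,000 product nodes, so the number of nodes (that is, computing units) reflected by this amount is 60,000. The adaptive control device can take 20,000 user nodes and 40,000 product nodes from the graph data to be processed as the current-round computing task and send it to the graph computing engine, so that the engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.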
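The first-round rule above reduces to simple arithmetic. This sketch (the function name `first_round_amount` is hypothetical) applies a preset fraction to each category's unit count, using `fractions.Fraction` to avoid floating-point rounding, and reproduces the figures in the example.

```python
from fractions import Fraction


def first_round_amount(counts_by_category, fraction=Fraction(1, 500)):
    """First round: take a preset fraction of each category's computing units.

    `fraction` plays the role of the preset share (e.g. 1/500 or 1/800).
    """
    return {cat: int(count * fraction) for cat, count in counts_by_category.items()}


amounts = first_round_amount({"user": 10_000_000, "product": 20_000_000})
print(amounts, sum(amounts.values()))  # {'user': 20000, 'product': 40000} 60000
```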
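As described above, steps 202 to 206 constitute one round of control performed by the adaptive control device. Through this round of control, the adaptive control device can dynamically determine the current round's computing task amount from the historical computing task amount and historical execution information: it applies back pressure during peak periods when task output timeliness falls below expectations, increasing the stability of the graph computing device, and automatically increases the computing task amount during off-peak periods when resources are sufficient, making full use of computing resources and approximately achieving a globally optimal computing effect for the graph computing engine. At the same time, because each round performs graph computing operations only for the current round's computing task amount, the graph data to be processed is split into shards that are submitted to the engine one at a time. This avoids problems such as excessive intermediate results, or computations that cannot be completed, caused by the graph data being too large or the graph structure too dense.
It should be understood that the adaptive control system and its modules involved in process 200 can be implemented in various ways. For example, in some embodiments, the device and its modules may be implemented by hardware, software, or a combination of software and hardware. The hardware part may be implemented using dedicated logic; the software part may be stored in memory and executed by an appropriate instruction execution device, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and devices can be implemented using computer-executable instructions and/or contained in processor control code, such code being provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device and its modules in this specification can be implemented not only by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (for example, firmware).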
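Fig. 3 is an exemplary flowchart of determining a computing task amount according to some embodiments of this specification. As shown in Fig. 3, process 300 may include steps 302, 304, and 306. In some embodiments, steps 302, 304, and 306 may be performed by the adaptive control device, for example, implemented by the current-round computing task determination module of the adaptive control system.
Step 302: based on the historical computing task amounts and historical execution information of the two rounds preceding the current round, determine gradient information of the execution information with respect to the computing task amount.
In some embodiments, an objective function reflecting the total elapsed time of the task can be constructed. An exemplary objective function is $T = \sum_i t(n_i)$, where $T$ is the total elapsed time, $t(n_i)$ is the execution time of the $i$-th round, and $n_i$ is the computation amount of the $i$-th round of graph computing operations. As an example only, $t(n_i)$ can be assumed to be a proportional function of $n_i$.
The adaptive control device can determine the gradient information of the execution time with respect to the computing task amount. In some embodiments, the gradient can be obtained by the following formula (1):
$$s = \frac{T_{i-1} - T_{i-2}}{n_{i-1} - n_{i-2}} \tag{1}$$
where $s$ is the gradient, $T_{i-1}$ and $T_{i-2}$ are the execution times of the two historical rounds of graph computing operations, and $n_{i-1}$ and $n_{i-2}$ are the corresponding historical computing task amounts.
In some embodiments, the historical computing task amounts and historical execution information of the two rounds preceding the current round can be obtained, and the gradient of the execution time with respect to the computing task amount can be calculated by formula (1).
Step 304: determine an adjustment amount based on the gradient information.
In some embodiments, the adaptive control device can directly use the value represented by the gradient information as the adjustment amount. In some embodiments, the adaptive control device can determine the adjustment amount from a preset learning rate and the gradient, for example, by taking their product as the adjustment amount. In some embodiments, the adaptive control device can determine the adjustment amount by the formula $p \cdot s$, where $p$ is a preset learning rate in the interval $(0, 1)$ and $s$ is the gradient.
Step 306: update the historical computing task amount of the round immediately preceding the current round based on the adjustment amount, obtaining the computing task amount.
In some embodiments, the adaptive control device can perform the update by taking the difference between the historical computing task amount of the round immediately preceding the current round and the adjustment amount, obtaining the computing task amount. Continuing the example above, the computing task amount can be obtained by the formula $n_i = n_{i-1} - p \cdot s$, where $n_i$ is the computing task amount, $n_{i-1}$ is the historical computing task amount of the round immediately preceding the current round, and $p \cdot s$ is the adjustment amount described above.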
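Steps 302 to 306 can be sketched in Python as follows, applying formula (1), the adjustment p·s, and the update n_i = n_{i-1} - p·s literally. The history format, the guard against equal task amounts (which would make the denominator of formula (1) zero), the integer rounding, and the `min_amount` floor are assumptions of this sketch rather than details given in the specification; the function names are hypothetical.

```python
def gradient(t_prev, t_prev2, n_prev, n_prev2):
    """Formula (1): s = (T_{i-1} - T_{i-2}) / (n_{i-1} - n_{i-2})."""
    if n_prev == n_prev2:
        # Equal amounts give a zero denominator; the slope is undefined.
        raise ValueError("the two previous task amounts are equal")
    return (t_prev - t_prev2) / (n_prev - n_prev2)


def next_task_amount(history, lr=0.5, min_amount=1):
    """Steps 302-306 applied literally.

    history: list of (task_amount, exec_time_seconds), oldest first, with at
    least two completed rounds. lr is the preset learning rate p in (0, 1).
    """
    if not 0 < lr < 1:
        raise ValueError("learning rate p must lie in (0, 1)")
    (n2, t2), (n1, t1) = history[-2], history[-1]
    s = gradient(t1, t2, n1, n2)        # step 302: gradient information
    adjustment = lr * s                 # step 304: adjustment amount p * s
    n = n1 - adjustment                 # step 306: n_i = n_{i-1} - p * s
    return max(min_amount, round(n))    # keep a positive integer amount


print(next_task_amount([(100, 50.0), (110, 30.0)]))  # 111
```

Applied literally, the step size is p·s computing units, so its scale depends on the units chosen for time and task amount (seconds per node, for example); the specification does not discuss rescaling.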
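In some embodiments, the adaptive control device can determine the computing task amount by a slow start algorithm based on the historical computing task amount and the historical execution information.
Specifically, the historical computing task amount and historical execution time of the previous round of graph computing can be obtained. When the historical execution time is less than a set threshold and the historical computing task amount is less than a threshold window, the current round's computing task amount is increased rapidly from the historical computing task amount, for example, to twice the historical amount or more. If the historical execution time is less than the set threshold but the historical computing task amount already exceeds the threshold window, the current round's computing task amount is increased slowly, for example, by adding a fixed number of computing units, such as 10,000, to the historical computing task amount. In some embodiments, the threshold window can have a preset value and be dynamically updated during the control process. For example, when the historical execution time of the previous round exceeds the set threshold, a preset ratio (such as 0.5) of the historical computing task amount is used as the updated threshold window, and the current round's computing task amount is set to a preset, smaller starting task amount, such as 5,000 computing units.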
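A minimal Python sketch of the slow start rule just described, modeled on TCP slow start. The doubling factor, the 10,000-unit increment, the 0.5 ratio, and the 5,000-unit starting amount are the example values from the text; the class name, the initial window, and the treatment of an execution time exactly equal to the threshold (treated here as not exceeding it) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SlowStartController:
    """Hypothetical controller holding the slow start state."""
    time_threshold: float        # set threshold on per-round execution time (s)
    window: int                  # threshold window, updated dynamically
    initial_amount: int = 5000   # preset smaller starting task amount
    increment: int = 10000       # fixed step once past the window
    growth: int = 2              # multiplicative factor below the window
    backoff_ratio: float = 0.5   # preset ratio used to shrink the window

    def next_amount(self, prev_amount: int, prev_time: float) -> int:
        if prev_time > self.time_threshold:
            # Slow round: shrink the window and restart from a small amount.
            self.window = int(prev_amount * self.backoff_ratio)
            return self.initial_amount
        if prev_amount < self.window:
            return prev_amount * self.growth   # rapid growth below the window
        return prev_amount + self.increment    # slow growth past the window


ctrl = SlowStartController(time_threshold=60.0, window=40000)
amount, sequence = 5000, []
for exec_time in [10.0, 12.0, 20.0, 35.0, 75.0]:
    amount = ctrl.next_amount(amount, exec_time)
    sequence.append(amount)
print(sequence, ctrl.window)  # [10000, 20000, 40000, 50000, 5000] 25000
```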
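Fig. 4 is an exemplary flowchart of an adaptive control method for graph computing according to some embodiments of this specification. In some embodiments, the method may include controlling the graph computing engine to perform multiple rounds of graph computing operations on the graph data to be processed, so as to complete its processing. As shown in Fig. 4, one round of control process 400 may include steps 402, 404, and 406. In some embodiments, steps 402, 404, and 406 may be performed by the adaptive control device, for example, implemented by an adaptive control system deployed on the adaptive control device.
Step 402: obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations. In some embodiments, step 402 may be performed by the second obtaining module.
In some embodiments, the computing task amount may reflect the number of computing units, where a computing unit is one or a combination of the following: a node, an edge, or a subgraph. Step 402 is similar to step 202; for details, refer to step 202 and its related description, which are not repeated here.
Step 404: determine the current-round computing task amount based at least on the historical computing task amount and the historical execution information. In some embodiments, step 404 may be performed by the current-round computing task amount determination module.
In some embodiments, the adaptive control device can determine the computing task amount based on the historical computing task amount and the historical execution information; for details, refer to step 204 and its related description, which are not repeated here. In some embodiments, the adaptive control device can directly use the computing task amount as the current-round computing task amount and send it to the graph computing engine. In still other embodiments, the adaptive control device can determine the current-round computing task amount based on the remaining computing task amount of the graph data to be processed and the computing task amount. Specifically, the adaptive control device can determine whether the computing task amount exceeds the remaining computing task amount of the graph data to be processed; if it does not, the computing task amount is used as the current-round computing task amount; if it does, the remaining computing task amount of the graph data to be processed is used as the current-round computing task amount.
Step 406: send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations of the current-round computing task amount. In some embodiments, step 406 may be performed by the second sending module.
In some embodiments, the adaptive control device can send the current-round computing task amount to the graph computing engine, so that the engine can determine the current-round computing task from that amount and perform graph computing operations on the computing units in the graph data to be processed that correspond to that task. The way the engine determines the current-round computing task from the current-round task amount is similar to the process, described for step 204, of determining the current-round computing task from the computing task amount, and is not repeated here.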
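To show how the rounds of Fig. 4 fit together, the following Python sketch runs the control loop against a hypothetical in-process engine, `FakeGraphEngine`, that receives only a task amount and picks the computing units itself. `control_loop` and its `choose_amount` parameter are illustrative names; any rule, such as the gradient or slow start rules of Fig. 3, could be passed as `choose_amount`.

```python
import random
import time


class FakeGraphEngine:
    """Hypothetical stand-in for the engine in the Fig. 4 variant: it is
    told only how many units to compute and chooses the units itself."""

    def __init__(self, units, seed=0):
        self.pending = list(units)
        random.Random(seed).shuffle(self.pending)

    def remaining_amount(self):
        return len(self.pending)

    def run(self, amount):
        batch, self.pending = self.pending[:amount], self.pending[amount:]
        # A real engine would run graph computation on `batch` here.
        return len(batch)


def control_loop(engine, choose_amount, first_amount):
    """Steps 402-406 repeated until the graph data is fully processed.

    choose_amount(history) maps the list of (task_amount, exec_time)
    records to the next task amount. Returns the amount sent each round.
    """
    history, sent = [], []
    while engine.remaining_amount() > 0:
        amount = first_amount if not history else choose_amount(history)  # step 404
        amount = max(1, min(amount, engine.remaining_amount()))          # cap at what is left
        start = time.perf_counter()
        done = engine.run(amount)                                        # step 406
        history.append((done, time.perf_counter() - start))              # feeds step 402
        sent.append(amount)
    return sent


engine = FakeGraphEngine(range(100))
print(control_loop(engine, lambda h: h[-1][0] * 2, first_amount=10))  # [10, 20, 40, 30]
```

The doubling rule in the usage line is only a placeholder, and the `max(1, ...)` guard is an addition that keeps the loop making progress if a rule ever returns zero.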
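It should be understood that the adaptive control system and its modules involved in process 400 can be implemented in various ways. For example, in some embodiments, the device and its modules may be implemented by hardware, software, or a combination of software and hardware. The hardware part may be implemented using dedicated logic; the software part may be stored in memory and executed by an appropriate instruction execution device, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and devices can be implemented using computer-executable instructions and/or contained in processor control code, such code being provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device and its modules in this specification can be implemented not only by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (for example, firmware).
The embodiments of this specification further provide an adaptive control system for graph computing, which is used to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed. The system includes a first obtaining module, a current-round computing task determination module, and a first sending module. The first obtaining module can be used to obtain, in one round of control, the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations. The current-round computing task determination module can be used to determine, in one round of control, the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed. The first sending module can be used to send, in one round of control, the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.
The embodiments of this specification further provide an adaptive control system for graph computing, which is used to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed. The system includes a second obtaining module, a current-round computing task amount determination module, and a second sending module. The second obtaining module can be used to obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round; the historical execution information reflects the performance state of the graph computing engine when executing the corresponding rounds of graph computing operations. The current-round computing task amount determination module can be used to determine the current-round computing task amount based at least on the historical computing task amount and the historical execution information. The second sending module can be used to send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations of the current-round computing task amount.
It should be understood that the adaptive control system for graph computing and its modules can be implemented in various ways. For example, in some embodiments, the system and its modules may be implemented by hardware, software, or a combination of software and hardware. The hardware part may be implemented using dedicated logic; the software part may be stored in memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and systems can be implemented using computer-executable instructions and/or contained in processor control code, such code being provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification can be implemented not only by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (for example, firmware).
The embodiments of this specification further provide an adaptive control device for graph computing. The device includes at least one processor and at least one storage device; the storage device is used to store instructions, and when the at least one processor executes the instructions, the device is caused to implement the method according to any one of the foregoing.
The embodiments of this specification further provide a graph computing device, including a graph computing engine and an adaptive control device. The graph computing engine is used to perform graph computing operations; the adaptive control device is used to control the graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed, so as to complete the processing of the graph data to be processed, where the computing task or computing task amount of each round of graph computing operations is specified by the adaptive control device.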
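The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is merely an example and does not constitute a limitation of this specification. Although not explicitly stated here, those skilled in the art may make various modifications, improvements, and corrections to this specification. Such modifications, improvements, and corrections are suggested in this specification, and therefore still fall within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, this specification uses specific words to describe its embodiments. Terms such as "one embodiment", "an embodiment", and/or "some embodiments" mean a certain feature, structure, or characteristic related to at least one embodiment of this specification. Therefore, it should be emphasized and noted that "an embodiment", "one embodiment", or "an alternative embodiment" mentioned two or more times in different places in this specification does not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics in one or more embodiments of this specification can be combined as appropriate.
In addition, those skilled in the art will understand that aspects of this specification can be illustrated and described in terms of several patentable categories or situations, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of this specification may be executed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. Such hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". In addition, aspects of this specification may be embodied as a computer product located in one or more computer-readable media, the product including computer-readable program code.
A computer storage medium may contain a propagated data signal embodying computer program code, for example, in baseband or as part of a carrier wave. The propagated signal may take various forms, including electromagnetic, optical, or a suitable combination. A computer storage medium may be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate, or transmit a program for use by being connected to an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be transmitted over any suitable medium, including radio, electrical cable, fiber-optic cable, RF, or similar media, or any combination of the above.
The computer program code required for the operation of each part of this specification can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may run entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or processing device. In the latter cases, the remote computer can be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, through the Internet), or used in a cloud computing environment, or as a service such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of the processing elements and sequences, the use of numbers and letters, or the use of other names described in this specification is not intended to limit the order of the processes and methods of this specification. Although the above disclosure discusses, through various examples, some embodiments of the invention currently considered useful, it should be understood that such details are for illustrative purposes only, and the appended claims are not limited to the disclosed embodiments; on the contrary, the claims are intended to cover all modifications and equivalent combinations that conform to the substance and scope of the embodiments of this specification. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that, to simplify the presentation of this disclosure and thereby aid understanding of one or more inventive embodiments, the foregoing description of the embodiments sometimes groups multiple features into one embodiment, drawing, or description thereof. However, this method of disclosure does not imply that the subject matter of this specification requires more features than are recited in the claims. In fact, an embodiment may have fewer than all the features of a single embodiment disclosed above.
Some embodiments use numbers describing the quantities of components and attributes. It should be understood that such numbers used in describing the embodiments are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated figure allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may change depending on the characteristics required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and use a general method of retaining digits. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of this specification are approximations, in specific embodiments such values are set as precisely as is feasible.
Each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, is hereby incorporated into this specification by reference in its entirety, except for application history documents that are inconsistent with or conflict with the content of this specification, and except for documents (currently or later appended to this specification) that limit the broadest scope of the claims of this specification. It should be noted that if the description, definition, and/or use of terms in the materials attached to this specification are inconsistent or conflict with the content of this specification, the description, definition, and/or use of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described in this specification are only used to illustrate the principles of the embodiments of this specification. Other variations may also fall within the scope of this specification. Therefore, by way of example and not limitation, alternative configurations of the embodiments of this specification may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to the embodiments explicitly introduced and described in this specification.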

Claims (18)

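  1. An adaptive control method for graph computing, the method comprising controlling a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed so as to complete processing of the graph data to be processed, wherein one round of control comprises:
    obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding round of graph computing operations;
    determining a current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed;
    sending the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.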
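  2. The method of claim 1, wherein a computing task amount reflects a number of computing units, and the remaining computing tasks of the graph data to be processed comprise the computing units in the graph data to be processed on which graph computing has not yet been performed.
  3. The method of claim 2, wherein the computing units are one or a combination of the following: nodes, edges, and subgraphs.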
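  4. The method of claim 2, wherein determining the current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed comprises:
    determining a computing task amount based on the historical computing task amount and the historical execution information;
    determining, based on the computing task amount, a number of computing units from the remaining computing tasks of the data to be processed as the current-round computing task.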
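  5. The method of claim 4, wherein determining, based on the computing task amount, a number of computing units from the remaining computing tasks of the data to be processed as the current-round computing task comprises:
    determining whether the computing task amount exceeds the remaining computing task amount of the graph data to be processed;
    when it does not, the number of computing units in the current-round computing task equals the computing task amount;
    when it does, the current-round computing task comprises the remaining computing tasks of the graph data to be processed.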
  6. The method of claim 1, wherein the performance state is execution time or execution speed.
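  7. The method of claim 4, wherein determining the computing task amount based on the historical computing task amount and the historical execution information comprises:
    determining, based on the historical computing task amount and the historical execution information, gradient information of the execution information with respect to the computing task amount;
    determining an adjustment amount based on the gradient information;
    updating, based on the adjustment amount, the historical computing task amount of the round immediately preceding the current round to obtain the computing task amount.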
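  8. The method of claim 4, wherein determining the computing task amount based on the historical computing task amount and the historical execution information comprises:
    if the historical execution time of the round preceding the current round is less than a set threshold and the historical computing task amount is less than a threshold window, determining the computing task amount from the historical computing task amount at a first growth rate;
    if the historical execution time of the round preceding the current round is less than the set threshold but the historical computing task amount exceeds the threshold window, determining the computing task amount from the historical computing task amount at a second growth rate, the first growth rate being greater than the second growth rate;
    if the historical execution time of the round preceding the current round exceeds the set threshold, taking a preset proportion of the historical computing task amount as an updated threshold window, and setting the computing task amount to a preset starting task amount.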
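  9. The method of claim 1, wherein the first round of control comprises:
    obtaining a preset number of computing units from the graph data to be processed as the current-round computing task;
    sending the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.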
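  10. An adaptive control system for graph computing, the system being configured to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed so as to complete processing of the graph data to be processed, the system comprising:
    a first obtaining module, configured to obtain, in one round of control, the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding round of graph computing operations;
    a current-round computing task determination module, configured to determine, in one round of control, a current-round computing task based on the historical computing task amount, the historical execution information, and the remaining computing tasks of the graph data to be processed;
    a first sending module, configured to send, in one round of control, the current-round computing task to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, the graph computing operation corresponding to the current-round computing task.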
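  11. An adaptive control apparatus for graph computing, the apparatus comprising at least one processor and at least one storage device, the storage device being configured to store instructions that, when executed by the at least one processor, cause the apparatus to implement the method of any one of claims 1 to 9.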
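  12. An adaptive control method for graph computing, the method comprising controlling a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed so as to complete processing of the graph data to be processed, wherein one round of control comprises:
    obtaining the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding round of graph computing operations;
    determining a current-round computing task amount based at least on the historical computing task amount and the historical execution information;
    sending the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations amounting to the current-round computing task amount.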
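  13. The method of claim 12, wherein a computing task amount reflects a number of computing units, the computing units being one or a combination of the following: nodes, edges, and subgraphs.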
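  14. The method of claim 12, wherein determining the current-round computing task amount based at least on the historical computing task amount and the historical execution information comprises:
    determining a computing task amount based on the historical computing task amount and the historical execution information;
    determining the current-round computing task amount based on the remaining computing task amount of the graph data to be processed and the computing task amount.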
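  15. An adaptive control system for graph computing, the system being configured to control a graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed so as to complete processing of the graph data to be processed, the system comprising:
    a second obtaining module, configured to obtain the historical computing task amount and historical execution information of one or more rounds of graph computing operations preceding the current round, the historical execution information reflecting the performance state of the graph computing engine when executing the corresponding round of graph computing operations;
    a current-round computing task amount determination module, configured to determine a current-round computing task amount based at least on the historical computing task amount and the historical execution information;
    a second sending module, configured to send the current-round computing task amount to the graph computing engine, so that the graph computing engine performs, based on the graph data to be processed, graph computing operations amounting to the current-round computing task amount.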
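  16. An adaptive control apparatus for graph computing, the apparatus comprising at least one processor and at least one storage device, the storage device being configured to store instructions that, when executed by the at least one processor, cause the apparatus to implement the method of any one of claims 12 to 14.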
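  17. A graph computing device, comprising a graph computing engine and an adaptive control apparatus, wherein:
    the graph computing engine is configured to perform graph computing operations;
    the adaptive control apparatus is configured to control the graph computing engine to perform multiple rounds of graph computing operations on graph data to be processed so as to complete processing of the graph data to be processed, the computing task or computing task amount of each round of graph computing operations being specified by the adaptive control apparatus.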
  18. The device of claim 17, wherein the adaptive control apparatus controls the graph computing engine by the method of any one of claims 1 to 9 and claims 12 to 14.
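Claims 1, 5, 8, and 9 together describe a feedback loop: send a preset batch first, grow the batch while rounds stay fast, and back off when a round runs long. The Python sketch below illustrates one possible reading of that loop under stated assumptions; it is not the applicant's implementation. The names `WindowController`, `run_adaptive`, and `simulated_engine`, all parameter values, the reading of the two "growth rates" as multiplicative factors, and the tie-breaking at the threshold and window boundaries are hypothetical. The toy engine simply makes the per-unit cost jump once a batch exceeds 40 units; a real graph computing engine would report measured execution time or speed (claim 6).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class WindowController:
    """Claim-8 style rule. Parameter names and default values are hypothetical."""
    start_amount: int = 4           # preset starting task amount; also the first-round size (claim 9)
    time_threshold: float = 1.0     # set threshold on the previous round's execution time, in seconds
    fast_growth: float = 2.0        # first growth rate, used while below the threshold window
    slow_growth: float = 1.25       # second growth rate, smaller than the first
    backoff_ratio: float = 0.5      # preset proportion used to shrink the threshold window
    window: float = float("inf")    # threshold window; unbounded until the first slow round

    def next_amount(self, prev_amount: int, prev_time: float) -> int:
        # Assumption: a time equal to the threshold counts as "exceeding" it.
        if prev_time >= self.time_threshold:
            self.window = self.backoff_ratio * prev_amount
            return self.start_amount
        # Assumption: an amount equal to the window counts as "exceeding" it.
        growth = self.fast_growth if prev_amount < self.window else self.slow_growth
        # Always grow by at least one unit so that small amounts cannot stall.
        return max(prev_amount + 1, int(prev_amount * growth))


def run_adaptive(
    units: Sequence[int],
    engine: Callable[[List[int]], float],
    ctrl: WindowController,
) -> List[int]:
    """Drive the engine round by round; return the number of units sent in each round."""
    remaining = list(units)                # computing units not yet processed
    history: List[Tuple[int, float]] = []  # (task amount, execution time) per round
    while remaining:
        if history:
            amount = ctrl.next_amount(*history[-1])
        else:
            amount = ctrl.start_amount     # first round: preset number of units (claim 9)
        # Claim 5: if the amount exceeds what is left, send all remaining units.
        batch, remaining = remaining[:amount], remaining[amount:]
        elapsed = engine(batch)            # performance state fed back to the controller
        history.append((len(batch), elapsed))
    return [sent for sent, _ in history]


def simulated_engine(batch: List[int]) -> float:
    """Toy stand-in for a graph computing engine: per-unit cost jumps 5x above 40 units."""
    per_unit = 0.01 if len(batch) <= 40 else 0.05
    return per_unit * len(batch)


if __name__ == "__main__":
    print(run_adaptive(range(300), simulated_engine, WindowController()))
    # → [4, 8, 16, 32, 64, 4, 8, 16, 32, 40, 50, 4, 8, 14]
```

With 300 units and these defaults, the batch doubles from 4 to 64; the 64-unit round takes about 3.2 s and triggers a back-off (window set to 32, batch reset to 4). Once the batch reaches the window, growth switches to the slower rate, and a second overrun at 50 units shrinks the window to 25. The shape closely resembles TCP slow start and congestion avoidance. A claim-7 variant would instead replace `next_amount` with a rule that adjusts the previous round's amount along an estimated gradient of execution time or speed with respect to task amount.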
PCT/CN2022/102927 2021-07-27 2022-06-30 Adaptive control for graph computing WO2023005590A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110850924.0A CN113434702A (zh) 2021-07-27 2021-07-27 Adaptive control method and system for graph computing
CN202110850924.0 2021-07-27

Publications (1)

Publication Number Publication Date
WO2023005590A1 (zh) 2023-02-02

Family

ID=77761958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102927 WO2023005590A1 (zh) 2021-07-27 2022-06-30 Adaptive control for graph computing

Country Status (2)

Country Link
CN (1) CN113434702A (zh)
WO (1) WO2023005590A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
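CN113434702A (zh) * 2021-07-27 2021-09-24 支付宝(杭州)信息技术有限公司 Adaptive control method and system for graph computing
CN114742691B (zh) * 2022-05-19 2023-08-18 支付宝(杭州)信息技术有限公司 Graph data sampling method and system
CN115221211B (zh) * 2022-09-21 2023-02-28 国网智能电网研究院有限公司 Graph computing processing method and apparatus, electronic device, and storage medium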

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
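CN106033476A (zh) * 2016-05-19 2016-10-19 西安交通大学 Incremental graph computing method under a distributed computing mode in a cloud computing environment
CN108683738A (zh) * 2018-05-16 2018-10-19 腾讯科技(深圳)有限公司 Graph data processing method and method for publishing computing tasks of graph data
US20210073028A1 (en) * 2019-09-11 2021-03-11 Google Llc Job scheduling on distributed computing devices
CN113434702A (zh) * 2021-07-27 2021-09-24 支付宝(杭州)信息技术有限公司 Adaptive control method and system for graph computing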

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
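JP6662754B2 (ja) * 2016-11-02 2020-03-11 日本電信電話株式会社 L1 graph calculation device, L1 graph calculation method, and L1 graph calculation program
CN109391680B (zh) * 2018-08-31 2021-07-09 创新先进技术有限公司 Scheduled task data processing method, apparatus, and system
CN112764935B (zh) * 2021-01-29 2023-06-30 中国平安人寿保险股份有限公司 Big data processing method and apparatus, electronic device, and storage medium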


Also Published As

Publication number Publication date
CN113434702A (zh) 2021-09-24


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22848195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE