CN111935005A

CN111935005A - Data transmission method, device, processing equipment and medium

Info

Publication number: CN111935005A
Application number: CN202010793589.0A
Authority: CN
Inventors: 姜曦楠; 朱子霖; 周飞虎; 郭振宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-08-07
Filing date: 2020-08-07
Publication date: 2020-11-13
Anticipated expiration: 2040-08-07
Also published as: CN111935005B

Abstract

The embodiments of the invention disclose a data transmission method, a data transmission apparatus, a processing device, and a medium based on cloud technology. The method comprises: obtaining reachability information of a plurality of target nodes in a computation graph of a target object; aggregating at least two target nodes into a target aggregation node according to the reachability information of each target node; and updating the computation graph with the target aggregation node and sending the updated computation graph to a computing device. The updated computation graph instructs the computing device, during the computation of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes as indicated by the target aggregation node, and to transmit the aggregation result. Because the updated computation graph instructs the computing device to transmit the execution result data of target nodes in aggregate, the embodiments reduce the number of data transmissions, save network resources, and shorten the total transmission time.

Description

Data transmission method, device, processing equipment and medium

Technical Field

The present application relates to the field of internet technologies, in particular to the field of computer technologies, and more particularly to a data transmission method, a data transmission apparatus, a processing device, and a computer storage medium.

Background

In graph theory, a graph is an abstraction of the relationships between objects; it consists mainly of nodes, which represent the objects, and edges, which represent the relationships between them. A graph in which every edge has a direction is called a directed graph. With the development of graph technology and internet technology, the computation graph emerged. A computation graph, also called a data flow graph, is a directed graph that characterizes the data flow computation of a target object. The nodes in the computation graph represent the data processing operations involved in computing the target object, and each data processing operation corresponds to one piece of execution result data; the edges in the computation graph represent dependencies between data processing operations (nodes), such as data dependencies and control dependencies. A computation graph typically contains some special target nodes, which represent data processing operations whose execution result data needs to be transmitted.

At present, before a computing device computes a target object, a computation graph of the target object is usually constructed and sent directly to the computing device; while computing the target object, the computing device transmits the corresponding execution result data immediately after executing the data processing operation represented by each target node. This transmission mode leads to a large number of data transmissions and heavy consumption of network resources; moreover, each transmission typically incurs network delay, which lengthens the total transmission time.

Disclosure of Invention

Embodiments of the present invention provide a data transmission method, an apparatus, a processing device, and a medium, which may instruct a computing device to perform aggregate transmission on execution result data of a target node through an updated computation graph, so as to reduce the number of data transmission times, save network resources, and shorten the total transmission time.

In one aspect, an embodiment of the present invention provides a data transmission method, where the method includes:

obtaining reachability information of a plurality of target nodes in a computation graph of a target object, wherein each target node represents a data processing operation that needs to be executed during the computation of the target object, and the execution result data of the data processing operation represented by each target node needs to be transmitted; and wherein the reachability information of any target node indicates the ability of that target node to reach other target nodes along at least one edge in the computation graph, and the ability of that target node to be reached by other target nodes along at least one edge in the computation graph;

aggregating at least two target nodes into a target aggregation node according to the reachability information of each target node, wherein the target aggregation node is used for indicating that execution result data of data processing operations represented by the aggregated target nodes are aggregated;

updating the computation graph with the target aggregation node, and sending the updated computation graph to a computing device, wherein the updated computation graph instructs the computing device, during the computation of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes as indicated by the target aggregation node, and to transmit the aggregation result.

In another aspect, an embodiment of the present invention provides a data transmission apparatus, where the apparatus includes:

an acquisition unit configured to acquire reachability information of a plurality of target nodes in a computation graph of a target object, wherein each target node represents a data processing operation that needs to be executed during the computation of the target object, and the execution result data of the data processing operation represented by each target node needs to be transmitted; and wherein the reachability information of any target node indicates the ability of that target node to reach other target nodes along at least one edge in the computation graph, and the ability of that target node to be reached by other target nodes along at least one edge in the computation graph;

an aggregation unit configured to aggregate at least two target nodes into a target aggregation node according to reachability information of each target node, the target aggregation node being configured to instruct aggregation of execution result data of data processing operations represented by the aggregated target nodes;

a processing unit configured to update the computation graph with the target aggregation node and send the updated computation graph to a computing device, wherein the updated computation graph instructs the computing device, during the computation of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes as indicated by the target aggregation node, and to transmit the aggregation result.

In an embodiment, when the aggregating unit is configured to aggregate at least two target nodes into a target aggregation node according to the reachability information of each target node, the aggregating unit may be specifically configured to:

extracting aggregation level information according to the reachability information of each target node, wherein the aggregation level information includes N layers of node groups required for aggregation, N being a positive integer; each node group includes at least one of: a target node, and an aggregation node obtained by aggregating at least two target nodes; and the reachability information of the nodes in each node group satisfies a reachability condition;

and performing at least one layer of aggregation iteration processing on the plurality of target nodes according to the aggregation level information to obtain target aggregation nodes.

In another embodiment, when the aggregation unit is configured to extract the aggregation level information according to the reachability information of each target node, the aggregation unit may be specifically configured to:

selecting nodes of which the reachability information meets the reachability condition from the node set related to the ith layer aggregation, and adding the selected nodes to at least one ith node group required by the ith layer aggregation; an ith node group corresponds to an ith aggregation node, and the value of i belongs to [1, N ]; when the value of i is 1, the node set related to the layer 1 aggregation comprises the plurality of target nodes;

replacing the selected nodes in the node set by the ith aggregation nodes corresponding to the ith node groups to update the node set;

and if the updated node set still contains nodes whose reachability information satisfies the reachability condition, incrementing the current value of i by one, and returning to the step of selecting, from the node set involved in the ith layer aggregation, nodes whose reachability information satisfies the reachability condition.
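The layer-by-layer extraction described in the steps above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes the node set of each layer is a small acyclic graph given as a dictionary of successor sets, and it simplifies the reachability condition to "identical reachability once the two compared nodes are ignored". The names `reach_info`, `same_after_filtering`, and `extract_levels`, and the "+"-joined naming of aggregation nodes, are hypothetical.

```python
def reach_info(graph):
    """Map each node to (reachable, reaching) sets; graph is {node: set of successors}."""
    def descendants(node, seen):
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                descendants(nxt, seen)
        return seen

    reachable = {node: descendants(node, set()) for node in graph}
    reaching = {node: {m for m in graph if node in reachable[m]} for node in graph}
    return {node: (reachable[node], reaching[node]) for node in graph}


def same_after_filtering(info, a, b):
    """Assumed reachability condition: identical sets once a and b are ignored."""
    pair = {a, b}
    return all(info[a][k] - pair == info[b][k] - pair for k in (0, 1))


def extract_levels(graph):
    """Return a list of layers; each layer is a list of node groups (tuples of names)."""
    levels = []
    while True:
        info = reach_info(graph)
        groups, used = [], set()
        for seed in sorted(graph):
            if seed in used:
                continue
            group = [seed] + [m for m in sorted(graph)
                              if m != seed and m not in used
                              and same_after_filtering(info, seed, m)]
            if len(group) > 1:
                groups.append(tuple(group))
                used.update(group)
        if not groups:
            return levels  # no remaining nodes satisfy the condition
        levels.append(groups)
        # Replace every group by one aggregation node named after its members.
        rename = {m: "+".join(g) for g in groups for m in g}
        merged = {}
        for node, succs in graph.items():
            src = rename.get(node, node)
            merged.setdefault(src, set())
            merged[src] |= {rename.get(s, s) for s in succs} - {src}
        graph = merged


# Hypothetical diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
levels = extract_levels({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()})
```

For this hypothetical diamond, the first layer groups `b` with `c`, the second groups `a` with the aggregation node `b+c`, and the third groups the result with `d`. A stricter reachability condition or an affinity threshold would produce fewer or different groups.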

In another embodiment, the nodes in the node set corresponding to the ith layer form a directed graph corresponding to the ith layer, where i belongs to [1, N]; the reachability information of any node in any node group of the ith layer includes at least one of: a reachable node corresponding to the node, and a reaching node corresponding to the node;

wherein the reachable node corresponding to a node is a node that the node reaches through at least one edge in the directed graph of the ith layer, and the reaching node corresponding to a node is a node that reaches the node through at least one edge in the directed graph of the ith layer;

the reachability information of the nodes in each node group satisfying the reachability condition includes: the node affinity among the nodes in each node group, calculated according to the number of identical reachable nodes and the number of identical reaching nodes included in the reachability information of those nodes, is greater than an affinity threshold.
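The source does not give a formula for the node affinity, only that it is computed from the number of nodes two nodes both reach and the number of nodes that reach both of them. The sketch below therefore uses an assumed, Jaccard-style ratio purely for illustration; the function name `node_affinity` and the ratio itself are hypothetical.

```python
def node_affinity(info_a, info_b, a, b):
    """Assumed affinity: shared fraction of reached nodes and of nodes that reach.

    info_a and info_b are pairs (nodes reached, nodes that reach) given as sets.
    The two compared nodes a and b are ignored on both sides.
    """
    ignore = {a, b}
    shared = total = 0
    for set_a, set_b in zip(info_a, info_b):
        set_a, set_b = set_a - ignore, set_b - ignore
        shared += len(set_a & set_b)
        total += len(set_a | set_b)
    # Two nodes with nothing left to compare are treated as fully alike.
    return 1.0 if total == 0 else shared / total


# Reachability information of target nodes G, J and L from the FIG. 3 example.
G = ({"J", "O"}, {"B", "E"})
J = ({"O"}, {"B", "E", "G"})
L = ({"O"}, {"B", "E"})
affinity_gj = node_affinity(G, J, "G", "J")
affinity_gl = node_affinity(G, L, "G", "L")
```

Under this assumed ratio, G and J have the highest possible affinity, while G and L score lower because G also reaches J; whether a pair is grouped then depends on the chosen affinity threshold.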

In another embodiment, when the aggregation unit is configured to perform at least one layer of aggregation iterative processing on the plurality of target nodes according to the aggregation level information to obtain a target aggregation node, the aggregation unit may be specifically configured to:

determining at least one nth node group required by nth layer aggregation according to the aggregation level information, and determining the traffic sum of each nth node group according to the traffic of each node in each nth node group; n belongs to [1, N ];

selecting an nth node group with the traffic sum smaller than or equal to a traffic threshold value from the at least one nth node group; and carrying out aggregation processing on each node in the selected nth node group to obtain an nth aggregation node;

and if the current value of n is less than N and, according to the aggregation level information, the traffic sum of every (n+1)th node group required for the (n+1)th layer aggregation is greater than the traffic threshold, obtaining the target aggregation node according to the nth aggregation node.

In yet another embodiment, the aggregation unit may be further specifically configured to:

if the current value of n is less than N and the traffic sum of at least one (n+1)th node group is less than or equal to the traffic threshold, incrementing the current value of n by one, and returning to the step of determining at least one nth node group required for the nth layer aggregation according to the aggregation level information;

and if the current value of n is equal to N, obtaining the target aggregation node according to the nth aggregation node.

In another embodiment, when the aggregation unit is configured to obtain the target aggregation node according to the nth aggregation node, the aggregation unit may be specifically configured to:

if the value of n is 1, taking the 1st aggregation node as a target aggregation node;

if the value of n is not 1, acquiring at least one historical aggregation node obtained by the aggregation of the previous n-1 layers, selecting from the at least one historical aggregation node the historical aggregation nodes that have not themselves been aggregated, and taking the selected historical aggregation nodes and the nth aggregation node as target aggregation nodes.
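The traffic-bounded iteration over the layers can be sketched as follows. This is a hedged illustration under several assumptions not stated in the source: the traffic of an aggregation node is taken to be the sum of its members' traffic, an aggregation node is named by joining its members with "+", and a group whose member was never formed in a lower layer is skipped. The function name `select_target_aggregates` is hypothetical.

```python
def select_target_aggregates(levels, traffic, threshold):
    """Aggregate layer by layer while the traffic sum of a group fits the threshold.

    levels:  list of layers, each a list of groups (tuples of node names)
    traffic: {target node name: traffic volume}
    Returns the aggregation nodes that were not aggregated again in a later layer.
    """
    traffic = dict(traffic)  # local copy, extended with aggregation nodes
    alive = []               # aggregation nodes not yet merged into a larger one
    for groups in levels:
        fitting = [g for g in groups
                   if all(m in traffic for m in g)
                   and sum(traffic[m] for m in g) <= threshold]
        if not fitting:
            break  # every group of this layer exceeds the threshold
        for group in fitting:
            name = "+".join(group)
            traffic[name] = sum(traffic[m] for m in group)
            # Earlier aggregation nodes absorbed by this group are no longer targets.
            alive = [a for a in alive if a not in group]
            alive.append(name)
    return alive


# Hypothetical three-layer hierarchy and traffic volumes.
example_levels = [[("b", "c")], [("a", "b+c")], [("a+b+c", "d")]]
example_traffic = {"a": 4, "b": 2, "c": 3, "d": 6}
targets = select_target_aggregates(example_levels, example_traffic, threshold=10)
```

With a threshold of 10, the first two layers fit (traffic sums 5 and 9) and the third does not (15), so the only remaining target aggregation node is `a+b+c`. Lowering the threshold to 5 stops after the first layer and leaves `b+c`.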

In another embodiment, the obtaining unit, when configured to obtain the reachability information of the target nodes in the computation graph of the target object, may be specifically configured to:

acquiring a target directed graph formed by a plurality of target nodes in a computational graph of a target object;

and acquiring the reachability information of the target nodes based on the target directed graph.

In another embodiment, the obtaining unit, when configured to obtain a target directed graph composed of a plurality of target nodes in a computation graph of a target object, may be specifically configured to:

obtaining a computational graph of a target object, the computational graph comprising the following computational nodes: a plurality of target nodes and non-target nodes;

calculating a target reachable matrix comprising the plurality of target nodes according to the topological relation of the calculation graph; the target reachable matrix is used for indicating the reachability relation among target nodes;

and according to the construction principle of the minimum number of edges, constructing the target directed graph formed by the plurality of target nodes according to the target reachable matrix.
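One way to read the minimum-number-of-edges principle is as a transitive reduction of the reachability relation: an edge is kept only when no intermediate node lies between its endpoints. The sketch below assumes the target reachable matrix describes an acyclic relation, is given as a nested list of 0/1 values, and has a zero diagonal; the function name `min_edge_graph` is hypothetical.

```python
def min_edge_graph(reach):
    """Return the edges of the smallest directed graph with the same reachability.

    reach[i][j] == 1 means node i can reach node j (assumed acyclic, zero diagonal).
    """
    n = len(reach)
    edges = []
    for i in range(n):
        for j in range(n):
            if not reach[i][j]:
                continue
            # The edge i -> j is redundant if some node k lies on a path i -> k -> j.
            if any(reach[i][k] and reach[k][j] for k in range(n)):
                continue
            edges.append((i, j))
    return edges


# Hypothetical reachable matrix of a chain 0 -> 1 -> 2 (node 0 also reaches node 2).
chain_edges = min_edge_graph([[0, 1, 1],
                              [0, 0, 1],
                              [0, 0, 0]])
```

In this example the direct connection from node 0 to node 2 is dropped because node 1 already links them, leaving two edges.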

In another embodiment, when the obtaining unit is configured to calculate the target reachable matrix including the plurality of target nodes according to the topological relation of the computation graph, the obtaining unit may be specifically configured to:

calculating an adjacency matrix containing each calculation node in the calculation graph according to the topological relation of the calculation graph; the adjacency matrix is used for indicating the connection relation between the computing nodes in the computation graph;

computing the transitive closure of the adjacency matrix to obtain an initial reachable matrix containing each computing node in the computation graph; the initial reachable matrix is used for indicating the reachability relation among the computing nodes in the computation graph;

and removing the non-target nodes in the initial reachable matrix to obtain a target reachable matrix containing the plurality of target nodes.
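The three steps above (adjacency matrix, transitive closure, removal of non-target nodes) can be sketched as follows. The source does not say which closure algorithm is used; Warshall's algorithm is chosen here as one standard option. The function names and the four-node example graph are hypothetical.

```python
def transitive_closure(adj):
    """Warshall's algorithm: result[i][j] == 1 iff a path of at least one edge leads from i to j."""
    n = len(adj)
    closure = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure


def restrict_to_targets(matrix, targets):
    """Keep only the rows and columns whose indices belong to target nodes."""
    return [[matrix[i][j] for j in targets] for i in targets]


# Hypothetical graph 0 -> 1 -> 2 -> 3, where nodes 0 and 3 are the target nodes.
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
initial_reachable = transitive_closure(adjacency)
target_reachable = restrict_to_targets(initial_reachable, [0, 3])
```

Computing the closure over all computing nodes before dropping the non-target ones matters: target node 0 reaches target node 3 only through the non-target nodes 1 and 2, and that relation would be lost if the non-target nodes were removed first.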

In another embodiment, when the processing unit is configured to update the computation graph with the target aggregation node, the processing unit may specifically be configured to:

adding the target aggregation node in the calculation graph, and connecting the target aggregation node and the aggregated target node by adopting a directed edge;

adding a matched communication node for the target node which is not aggregated in the computational graph, and adding a matched communication node for the target aggregation node in the computational graph; the communication node is configured to represent a data transfer operation.
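The graph update can be sketched as follows, under assumptions the source leaves open: the directed edge is drawn from each aggregated target node to the aggregation node, each communication node is attached as a successor of the node whose data it transmits, and the graph is a dictionary of successor sets. The function name `update_graph` and the `send:` naming are hypothetical.

```python
def update_graph(graph, targets, aggregates):
    """Return a copy of graph ({node: set of successors}) with aggregation and communication nodes.

    targets:    names of all target nodes
    aggregates: {aggregation node name: list of aggregated target nodes}
    """
    updated = {node: set(succs) for node, succs in graph.items()}
    aggregated = set()
    for agg, members in aggregates.items():
        updated[agg] = set()
        for member in members:
            updated[member].add(agg)  # assumed direction: target node -> aggregation node
            aggregated.add(member)
    # One communication node per aggregation node and per target node left unaggregated.
    for sender in list(aggregates) + [t for t in targets if t not in aggregated]:
        comm = f"send:{sender}"
        updated[comm] = set()
        updated[sender].add(comm)
    return updated


# Hypothetical graph x -> y -> z in which x and y are aggregated and z is not.
updated = update_graph(
    {"x": {"y"}, "y": {"z"}, "z": set()},
    targets=["x", "y", "z"],
    aggregates={"agg(x,y)": ["x", "y"]},
)
```

After the update only two communication nodes exist, one for the aggregation node and one for `z`, instead of one per target node.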

In another embodiment, the target object includes a neural network model to be subjected to distributed machine learning, and the execution result data of the data processing operation represented by each target node includes: gradients generated by the neural network model in the distributed machine learning.

In another aspect, an embodiment of the present invention provides a processing device, where the processing device includes an input interface and an output interface, and the processing device further includes:

a processor adapted to implement one or more instructions; and

a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:

In yet another aspect, an embodiment of the present invention provides a computer storage medium, where one or more instructions are stored, and the one or more instructions are adapted to be loaded by a processor and execute the following steps:

According to the embodiment of the invention, at least two target nodes can be aggregated into a target aggregation node according to the reachability information of the target nodes in the calculation graph of the target object; the target aggregation node is used for indicating the aggregation of the execution result data of the data processing operation represented by the aggregated target node. The computational graph may then be updated with the target aggregation node and the updated computational graph may be sent to the computing device. In the process of calculating the target object, after the data processing operation represented by the aggregated target node is executed, the computing device can perform aggregated transmission on the execution result data of the data processing operation represented by the aggregated target node according to the indication of the target aggregation node, so that the number of data transmission is reduced, network resources are saved, and the total transmission time is shortened.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1a is a schematic structural diagram of a data transmission system according to an embodiment of the present invention;

FIG. 1b is a schematic structural diagram of a data transmission system according to another embodiment of the present invention;

FIG. 2 is a schematic flowchart of a data transmission method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a computation graph according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of a data transmission method according to another embodiment of the present invention;

FIG. 5a is a schematic diagram of an adjacency matrix according to another embodiment of the present invention;

FIG. 5b is a schematic diagram of an initial reachable matrix according to another embodiment of the present invention;

FIG. 5c is a schematic diagram of a target reachable matrix according to another embodiment of the present invention;

FIG. 5d is a schematic diagram of constructing a target directed graph according to another embodiment of the present invention;

FIG. 5e is a schematic diagram of a first aggregation matrix and a first directed graph according to another embodiment of the present invention;

FIG. 5f is a schematic diagram of a second aggregation matrix and a second directed graph according to another embodiment of the present invention;

FIG. 5g is a schematic diagram of aggregation level information according to another embodiment of the present invention;

FIG. 5h is a schematic diagram of generating a target aggregation node according to another embodiment of the present invention;

FIG. 5i is a schematic diagram of adding a target aggregation node according to another embodiment of the present invention;

FIG. 5j is a schematic diagram of adding a communication node according to another embodiment of the present invention;

FIG. 6 is a schematic diagram of an application scenario of distributed machine learning according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a data transmission apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a processing device according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

In order to better transmit execution result data of data processing operations represented by each target node in a calculation process of a target object, an embodiment of the present invention first provides a data transmission system. The target object refers to any object involved in multiple data processing operations in the calculation process, for example, the target object may be a neural network model involved in multiple data processing operations such as convolution operation and pooling operation in the model training process; for another example, the target object may be an application program that involves multiple data processing operations such as a test operation on the application function 1, a test operation on the application function 2, and the like in the application test process; as another example, the target object may be a hardware device that involves multiple data processing operations such as a test operation on the module 1, a test operation on the module 2, and the like during a hardware test process.

Specifically, the data transmission system may include a processing device 11 and one or more computing devices 12, which can communicate with each other. The processing device 11 is mainly configured to generate and update a computation graph (i.e., a data flow graph) of a target object and to send the computation graph to each computing device 12; the processing device 11 may be any terminal or server having data processing capabilities. Each computing device 12 is mainly configured to execute multiple data processing operations on the target object and to transmit the execution result data of some or all of the data processing operations as indicated by the computation graph; the computing device 12 may be any terminal or server having both data computing and communication capabilities. In one specific implementation, each computing device 12 may transmit the execution result data of some or all of the data processing operations back to the processing device 11, so that the processing device 11 can perform subsequent processing on the target object according to the execution result data sent by each computing device 12, such as model updating, application test analysis, or module test analysis; in this implementation, the system architecture of the data transmission system is shown in FIG. 1a.
In another specific implementation, each computing device 12 may transmit the execution result data of some or all of the data processing operations to a management device 13, so that the management device 13 can perform subsequent processing according to the execution result data sent by each computing device 12; in this implementation, the system architecture of the data transmission system is shown in FIG. 1b. For convenience of illustration, the following description takes the system architecture shown in FIG. 1b as an example.

It should be noted that FIG. 1a and FIG. 1b only illustrate specific architectures of the data transmission system by way of example and do not limit it. For example, both FIG. 1a and FIG. 1b physically deploy a separate processing device 11 to generate and update the computation graph; in other embodiments, however, any one of the plurality of computing devices 12 may act as the processing device and generate and update the computation graph, in which case no separate processing device 11 needs to be deployed. It should also be noted that the above-mentioned terminals may include but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, and the like. The above-mentioned server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

Based on the above data transmission system, the embodiment of the present invention further provides a data transmission scheme. The general principle of the scheme is as follows: the processing device compares the reachability of the target nodes that need to transmit synchronization data (i.e., execution result data) in the computation graph of the target object, aggregates target nodes having the same or similar reachability into one aggregation node, and updates the computation graph with the aggregation node. Here, reachability refers to the ability of one target node to reach another target node along a series of edges in the computation graph: if a target node A can reach another target node B through a series of edges, target node A is considered able to reach target node B; otherwise, target node A is considered unable to reach target node B. The aggregation node indicates that the execution result data of the data processing operations represented by the aggregated target nodes is to be aggregated. The updated computation graph can then be issued to each computing device, so that each computing device, while computing the target object, transmits in aggregate the execution result data of the data processing operations represented by the aggregated target nodes as indicated by the aggregation node. The data transmission scheme can thus transmit the execution result data corresponding to at least two target nodes in aggregate, which effectively reduces the number of data transmissions, saves network resources, and shortens the total transmission time.
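The saving in total transmission time can be made concrete with a simple cost model, which is an assumption introduced here for illustration and does not come from the source. Suppose $k$ target nodes produce results of sizes $s_1, \dots, s_k$ with total size $S = \sum_i s_i$, each transmission incurs a fixed network delay $\ell$, the transmissions are sent one after another over a link of bandwidth $b$, and aggregation reduces the number of transmissions to $m \le k$:

```latex
T_{\text{separate}} = \sum_{i=1}^{k}\left(\ell + \frac{s_i}{b}\right) = k\,\ell + \frac{S}{b},
\qquad
T_{\text{aggregated}} = m\,\ell + \frac{S}{b},
\qquad
T_{\text{separate}} - T_{\text{aggregated}} = (k - m)\,\ell \;\ge\; 0 .
```

Under this model the payload term $S/b$ is unchanged and the saving comes entirely from the $(k-m)$ avoided per-transmission delays. The model ignores the time an aggregated transmission may spend waiting for all of its member operations to finish.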

Based on the above description, an embodiment of the present invention proposes a data transmission method that can be executed by the above-mentioned processing device. Referring to fig. 2, the data transmission method may include the following steps S201 to S203:

S201, obtaining reachability information of a plurality of target nodes in the computation graph of the target object.

In the embodiment of the present invention, each target node represents one data processing operation that needs to be executed during the computation of the target object, and the execution result data of the data processing operation represented by each target node needs to be transmitted. The reachability information of any target node indicates the ability of that target node to reach other target nodes along at least one edge in the computation graph, and the ability of that target node to be reached by other target nodes along at least one edge in the computation graph. Specifically, the reachability information of any target node may include at least one of the following: the target nodes that the target node reaches through at least one edge in the computation graph, and the target nodes that reach the target node through at least one edge in the computation graph.

Take the computation graph shown in FIG. 3 as an example. The computation graph may include a plurality of computation nodes, where the number on each computation node represents an operation duration, each connecting line (i.e., a directed edge) between computation nodes represents a dependency relationship, and the number on each connecting line is the number of that directed edge. The black computation nodes in the computation graph are the target nodes. Consider the target node L: since the target node L can reach the target node O through the two edges numbered 29 and 30 in the computation graph, the target nodes reached by the target node L include the target node O. Since the target node E can reach the target node L through the two edges numbered 14 and 24, the target nodes that reach the target node L include the target node E; and since the target node B can reach the target node L through the three edges numbered 6, 14 and 24, the target nodes that reach the target node L also include the target node B. The reachability information of the target node L may therefore be written as (O, BE). In the same way, the reachability information of the other target nodes in the computation graph shown in FIG. 3 can be obtained: (EGJLO, ×) for the target node B, (GJLO, B) for the target node E, (JO, BE) for the target node G, (O, BEG) for the target node J, and (×, BEGJL) for the target node O, where "×" indicates null, i.e., no corresponding target node.
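The reachability information listed above can be reproduced with a short sketch. FIG. 3 itself is not reproduced in the text, so the edge set below is an assumption: it connects only the target nodes, with the fewest edges consistent with the stated reachability, and omits the non-target nodes and edge numbers of the figure. An empty string stands in for the null marker. The names `EDGES`, `reached_by`, and `reachability_info` are hypothetical.

```python
# Assumed edges between target nodes only (not the full graph of FIG. 3).
EDGES = {"B": "E", "E": "GL", "G": "J", "J": "O", "L": "O", "O": ""}


def reached_by(node):
    """All target nodes that `node` can reach along at least one edge."""
    found, stack = set(), list(EDGES[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(EDGES[current])
    return found


def reachability_info(node):
    """Pair (nodes reached by `node`, nodes that reach `node`), each as a sorted string."""
    reached = reached_by(node)
    reaching = {other for other in EDGES if node in reached_by(other)}
    return "".join(sorted(reached)), "".join(sorted(reaching))


info_l = reachability_info("L")
```

For the target node L this yields the pair `("O", "BE")`, matching the (O, BE) given in the text, and the other target nodes match their listed values in the same way.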

In the embodiment of the present invention, the reachability information of two target nodes is considered the same if, after the two target nodes themselves are filtered out of the reachability information of both, the target nodes each of them reaches and the target nodes that reach each of them are identical. Taking the computation graph shown in FIG. 3 as an example again: the reachability information of the target node G is (JO, BE), and the reachability information of the target node J is (O, BEG). Filtering the target node J out of the reachability information of the target node G yields (O, BE); filtering the target node G out of the reachability information of the target node J also yields (O, BE). Since the two filtered results are the same, the reachability information of the target node G and the target node J is considered the same. Similarly, if the filtered reachability information of two target nodes shares many of the same reached target nodes and many of the same target nodes that reach them, the reachability information of the two target nodes may be considered similar.
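The filtering comparison for the target nodes G and J can be checked mechanically. The sketch below takes the reachability information exactly as listed for the FIG. 3 example, with an empty string in place of the null marker; the names `INFO`, `filtered`, and `same_reachability` are hypothetical.

```python
# Reachability information as listed in the text: (nodes reached, nodes that reach).
INFO = {"B": ("EGJLO", ""), "E": ("GJLO", "B"), "G": ("JO", "BE"),
        "J": ("O", "BEG"), "L": ("O", "BE"), "O": ("", "BEGJL")}


def filtered(node, other):
    """Reachability information of `node` with `node` and `other` filtered out."""
    drop = {node, other}
    return tuple("".join(c for c in part if c not in drop) for part in INFO[node])


def same_reachability(a, b):
    """True if a and b have identical reachability once both are filtered out."""
    return filtered(a, b) == filtered(b, a)


g_and_j = same_reachability("G", "J")
g_and_l = same_reachability("G", "L")
```

G and J compare as the same, as in the text. G and L do not, because G additionally reaches J; under the looser notion of similar reachability they would still share most of their entries.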

S202, according to the reachability information of each target node, aggregating at least two target nodes into a target aggregation node.

Research shows that, in the calculation process of a target object, if a certain target node (e.g., target node G) has a reachable target node (e.g., target node J) and a reaching target node, i.e., a target node that can reach it (e.g., target node E), then the data processing operation represented by the target node (G) can be executed only after the data processing operation represented by the reaching target node (E) has been executed, and the data processing operation represented by the reachable target node (J) is executed only afterwards. That is, the target node (G) has the following dependency relationships: the execution of its data processing operation depends on its reaching target node (E), and the execution of the data processing operation of its reachable target node (J) depends on it. Further research shows that target nodes having the same (or similar) reachability information generally have the same (or similar) dependency relationships; that is, the data processing operations they represent depend on the same reaching target nodes and are depended on by the same reachable target nodes.
Based on this, the embodiment of the present invention may aggregate target nodes having the same (or similar) reachability information into one target aggregation node according to the reachability information of each target node, so that the target aggregation node subsequently instructs the computing device to aggregate and transmit the execution result data of the data processing operations represented by these target nodes. That is, the target aggregation node is used to indicate that the execution result data of the data processing operations represented by the aggregated target nodes is to be aggregated.

In one specific implementation, at least one node pair with the same reachability information can be found among the plurality of target nodes directly according to the reachability information of each target node; the target nodes in each found node pair are then aggregated to obtain at least one target aggregation node. For example, still taking the computation graph shown in fig. 3 as an example, two node pairs having the same reachability information can be found among the plurality of target nodes according to the reachability information of each target node mentioned in step S201: a node pair consisting of target node B and target node E, and a node pair consisting of target node G and target node J. The target nodes in the two node pairs may then be aggregated separately to obtain two target aggregation nodes: a target aggregation node (BE) and a target aggregation node (GJ). Similarly, in another specific implementation, at least one node group with similar reachability information can be found among the plurality of target nodes directly according to the reachability information of each target node; the target nodes in each found node group are then aggregated to obtain at least one target aggregation node. It should be noted that each node group includes at least two target nodes.
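A minimal sketch of the pair search described above is given below. It assumes that "the same reachability information" means equality after the two nodes of the pair are filtered out of each other's information; `REACH` transcribes the fig. 3 values, and `find_same_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# node -> (target nodes it can reach, target nodes that can reach it), per fig. 3.
REACH = {
    "B": ({"E", "G", "J", "L", "O"}, set()),
    "E": ({"G", "J", "L", "O"}, {"B"}),
    "G": ({"J", "O"}, {"B", "E"}),
    "J": ({"O"}, {"B", "E", "G"}),
    "L": ({"O"}, {"B", "E"}),
    "O": (set(), {"B", "E", "G", "J", "L"}),
}

def find_same_pairs(reach):
    """Return every node pair whose filtered reachability info is identical."""
    pairs = []
    for a, b in combinations(sorted(reach), 2):
        pair = {a, b}
        info_a = (reach[a][0] - pair, reach[a][1] - pair)
        info_b = (reach[b][0] - pair, reach[b][1] - pair)
        if info_a == info_b:
            pairs.append((a, b))
    return pairs

PAIRS = find_same_pairs(REACH)
print(PAIRS)  # [('B', 'E'), ('G', 'J')]

# Each pair becomes one target aggregation node, named here by concatenation.
AGGREGATION_NODES = ["".join(pair) for pair in PAIRS]
print(AGGREGATION_NODES)  # ['BE', 'GJ']
```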

In another specific implementation, after target nodes with the same or similar reachability information have been aggregated once, the reachability information of the target nodes that have not been aggregated may change. As a result, there may be at least two unaggregated target nodes whose changed reachability information is the same or similar, or there may be an unaggregated target node whose changed reachability information is the same as or similar to that of an aggregated node. In this case, the at least two unaggregated target nodes, or the unaggregated target node and the aggregated node, may be aggregated a second time; and so on, until no unaggregated target nodes have the same or similar changed reachability information and no unaggregated target node has changed reachability information that is the same as or similar to that of an aggregated node, at which point the target aggregation node is obtained. That is, in this specific implementation, the processing device may perform multiple aggregation iterations on the plurality of target nodes according to the reachability information of each target node, so as to obtain the target aggregation node.

It should be noted that the iteration stop condition of the above multiple aggregation iteration process is that no unaggregated target nodes or aggregated nodes with the same or similar reachability information remain. In other embodiments, the iteration stop condition may also be that the sum of the traffic of the nodes required for the current aggregation is greater than a traffic threshold; the embodiment of the present invention does not limit the stop condition. In addition, when performing multiple aggregation iterations on the target nodes according to the reachability information of each target node, the processing device may search for the nodes to be aggregated in real time according to the reachability information and aggregate the found nodes directly, judging after each node aggregation whether the iteration stop condition is met. Alternatively, the processing device may first extract aggregation level information according to the reachability information of each target node without considering the iteration stop condition, where the aggregation level information indicates the nodes required for each layer of aggregation; it may then perform at least one layer of aggregation iteration on the plurality of target nodes according to the aggregation level information, judging whether the iteration stop condition is met each time a layer of aggregation iteration is executed.

S203, updating the computation graph by using the target aggregation node, and sending the updated computation graph to the computing device; the updated computation graph is used to instruct the computing device to aggregate, according to the indication of the target aggregation node, the execution result data of the data processing operations represented by the aggregated target nodes during the calculation process of the target object, and to transmit the aggregation result.

Fig. 4 is a schematic flow chart of another data transmission method according to an embodiment of the present invention. The data transmission method may be performed by the above-mentioned processing device. Referring to fig. 4, the data transmission method may include the following steps S401 to S405:

S401, acquiring reachability information of a plurality of target nodes in the computation graph of the target object.

In an embodiment of the present invention, the computation graph of the target object may include the following computation nodes: a plurality of target nodes and non-target nodes. Each computation node represents a data processing operation that needs to be performed during the calculation process of the target object. A target node is a computation node whose data processing operation produces an execution result that needs to be transmitted, and a non-target node is a computation node whose data processing operation produces an execution result that does not need to be transmitted. The reachability information of any target node may include at least one of: the reachable target nodes, which that target node reaches through at least one edge in the computation graph, and the reaching target nodes, which reach that target node through at least one edge in the computation graph. Step S401 may be implemented in any of the following ways:

The first embodiment: the reachability information of the target nodes may be acquired directly from the computation graph of the target object. Specifically, for any target node, all other target nodes in the computation graph may be traversed. If it is detected that the currently traversed target node reaches that target node along at least one edge in the computation graph, the currently traversed target node is added to the reachability information of that target node as one of its reaching target nodes; and if it is detected that the target node reaches the currently traversed target node along at least one edge in the computation graph, the currently traversed target node is added to the reachability information of that target node as one of its reachable target nodes. After all other target nodes in the computation graph have been traversed, the reachability information of that target node is obtained.
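The traversal described above can be sketched with a breadth-first search. The graph below is a hypothetical one (it is not the graph of fig. 3): C, F and M are assumed non-target nodes, and the edges were chosen only so that the resulting reachability information matches the values quoted for the fig. 3 example. The names `descendants` and `reachability_info` are illustrative.

```python
from collections import deque

def descendants(graph, start):
    """Return every node reachable from `start` along at least one edge."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen

def reachability_info(graph, targets):
    """Map each target node to (reachable targets, targets that reach it)."""
    reach = {t: descendants(graph, t) & targets for t in targets}
    info = {}
    for t in targets:
        reachable = reach[t] - {t}
        reaching = {u for u in targets if u != t and t in reach[u]}
        info[t] = (reachable, reaching)
    return info

# Hypothetical computation graph (adjacency lists); C, F, M are non-target nodes.
GRAPH = {
    "B": ["C"], "C": ["E"], "E": ["F", "L"], "F": ["G"],
    "G": ["J"], "J": ["O"], "L": ["M"], "M": ["O"],
}
TARGETS = {"B", "E", "G", "J", "L", "O"}

INFO = reachability_info(GRAPH, TARGETS)
print(sorted(INFO["L"][0]), sorted(INFO["L"][1]))  # ['O'] ['B', 'E']
```

Non-target nodes are traversed as intermediate hops but never appear in the result, which is why the search runs on the full graph and intersects with `targets` afterwards.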

The second embodiment: an adjacency matrix containing each computation node in the computation graph may be computed according to the topological relation of the computation graph of the target object, as shown in fig. 5a. The adjacency matrix indicates the connection relations between the computation nodes in the computation graph. Specifically, if the element in the x-th row and the y-th column of the adjacency matrix is non-zero, the node corresponding to the x-th row and the node corresponding to the y-th column are connected in the computation graph; that is, the node corresponding to the x-th row can reach the node corresponding to the y-th column through one directed edge. In that case, the node corresponding to the y-th column is a reachable node of the node corresponding to the x-th row, and the node corresponding to the x-th row is a reaching node of the node corresponding to the y-th column (i.e., a node that can reach it). Here x and y are both greater than 0 and less than or equal to the number of computation nodes. The reachability information of the plurality of target nodes in the computation graph may then be obtained from the adjacency matrix. It should be noted that fig. 5a merely illustrates a computation graph of a target object and the corresponding adjacency matrix, and is not limiting. For example, each directed edge in the computation graph shown in fig. 5a has a corresponding number; in other embodiments, however, the directed edges may not be numbered. In this case, the adjacency matrix may use only "0" and "1" to represent the connection relations between the computation nodes, where "0" indicates unconnected and "1" indicates connected. That is, if the node corresponding to the x-th row and the node corresponding to the y-th column are connected in the computation graph, the element in the x-th row and the y-th column of the adjacency matrix is "1".
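As a minimal sketch of the "0"/"1" adjacency matrix described above, the following builds the matrix for a hypothetical three-node chain A, C, F (the same node names used in the transitive-closure example, but not the full graph of fig. 5a). Rows are taken as edge sources and columns as edge destinations; the helper names are assumptions.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix; entry [x][y] is 1 when an edge runs from x to y."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix

def direct_successors(nodes, matrix, name):
    """Nodes reached from `name` through one directed edge (non-zero row entries)."""
    x = nodes.index(name)
    return [nodes[y] for y, value in enumerate(matrix[x]) if value]

def direct_predecessors(nodes, matrix, name):
    """Nodes that reach `name` through one directed edge (non-zero column entries)."""
    y = nodes.index(name)
    return [nodes[x] for x, row in enumerate(matrix) if row[y]]

NODES = ["A", "C", "F"]
EDGES = [("A", "C"), ("C", "F")]
ADJ = adjacency_matrix(NODES, EDGES)

print(ADJ)                                  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(direct_successors(NODES, ADJ, "C"))   # ['F']
print(direct_predecessors(NODES, ADJ, "C")) # ['A']
```

Note that the adjacency matrix alone only exposes one-edge connections; multi-edge reachability requires the transitive closure discussed in the third embodiment.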

The third embodiment: a target directed graph composed of the plurality of target nodes in the computation graph of the target object may be obtained first. In one specific implementation, the non-target nodes may be deleted directly from the computation graph, and the connection relations between the target nodes may be adjusted according to the deleted non-target nodes, so as to obtain the target directed graph composed of the plurality of target nodes. In another specific implementation, the computation graph of the target object may be obtained, and a target reachable matrix containing the plurality of target nodes may be computed according to the topological relation of the computation graph, where the target reachable matrix indicates the reachability relations between the target nodes. Specifically, an adjacency matrix containing each computation node in the computation graph may first be computed according to the topological relation of the computation graph. The transitive closure of the adjacency matrix is then solved to obtain an initial reachable matrix containing each computation node in the computation graph, where the initial reachable matrix indicates the reachability relations among the computation nodes. The transitive closure is the smallest transitive relation that contains the connection relation between any two nodes. Solving the transitive closure means searching for computation nodes that have a transitive relation according to the connection relations indicated by the adjacency matrix, and determining the reachability relations among the computation nodes according to the transitive relations found.

For example, the adjacency matrix shown in fig. 5a indicates that the computation node A is connected with the computation node C, and the computation node C is connected with the computation node F. The computation node C can therefore be determined to be a computation node with a transitive relation: from computation node A to computation node C, and from computation node C to computation node F. Based on this transitive relation, it can be determined that there is a reachability relation between computation node A and computation node F. In this way, the reachability relations between the computation nodes may be obtained, resulting in the initial reachable matrix shown in fig. 5b. The initial reachable matrix shows which computation nodes any computation node can reach along a path (i.e., at least one directed edge), and which computation nodes can reach it; for example, the computation node F can reach the four computation nodes KMNO, and can be reached by the six computation nodes ABCDGE. After the initial reachable matrix is obtained, the non-target nodes in the initial reachable matrix may be removed to obtain a target reachable matrix containing the plurality of target nodes, as shown in fig. 5c. A target directed graph composed of the plurality of target nodes can then be constructed according to the target reachable matrix and the minimum-edge construction principle, as shown in fig. 5d. The minimum-edge construction principle means that the constructed target directed graph contains the smallest possible number of directed edges.
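The chain of operations above (adjacency matrix, transitive closure, removal of non-target nodes, minimum-edge construction) can be sketched as follows. This is an illustration under stated assumptions: Warshall's algorithm is used for the closure and a transitive reduction is used to realize the minimum-edge principle, neither of which the source names explicitly; the graph is hypothetical (C, F, M are assumed non-target nodes) and is not the graph of fig. 5a.

```python
def transitive_closure(adj):
    """Warshall's algorithm: closure[i][j] == 1 iff j is reachable from i."""
    n = len(adj)
    closure = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure

def keep_targets(nodes, matrix, targets):
    """Drop the rows and columns that belong to non-target nodes."""
    idx = [k for k, name in enumerate(nodes) if name in targets]
    return [nodes[k] for k in idx], [[matrix[i][j] for j in idx] for i in idx]

def minimum_edges(names, reach):
    """For an acyclic graph: keep i->j only if no k lies between i and j."""
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(n)
        if reach[i][j]
        and not any(reach[i][k] and reach[k][j] for k in range(n))
    ]

# Hypothetical computation graph; C, F and M are non-target nodes.
NODES = ["B", "C", "E", "F", "G", "J", "L", "M", "O"]
EDGES = [("B", "C"), ("C", "E"), ("E", "F"), ("E", "L"), ("F", "G"),
         ("G", "J"), ("J", "O"), ("L", "M"), ("M", "O")]
INDEX = {name: k for k, name in enumerate(NODES)}
ADJ = [[0] * len(NODES) for _ in NODES]
for src, dst in EDGES:
    ADJ[INDEX[src]][INDEX[dst]] = 1

INITIAL_REACH = transitive_closure(ADJ)            # initial reachable matrix
NAMES, TARGET_REACH = keep_targets(
    NODES, INITIAL_REACH, {"B", "E", "G", "J", "L", "O"})  # target reachable matrix
TARGET_EDGES = minimum_edges(NAMES, TARGET_REACH)  # target directed graph
print(TARGET_EDGES)
# [('B', 'E'), ('E', 'G'), ('E', 'L'), ('G', 'J'), ('J', 'O'), ('L', 'O')]
```

The reduction step relies on the computation graph being acyclic, which holds when edges express execution dependencies.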

After the target directed graph is obtained, the reachability information of the plurality of target nodes can be obtained based on the target directed graph. Specifically, for any target node, all other target nodes in the target directed graph may be traversed. If it is detected that the currently traversed target node reaches that target node along at least one edge in the target directed graph, the currently traversed target node is added to the reachability information of that target node as one of its reaching target nodes; and if it is detected that the target node reaches the currently traversed target node along at least one edge in the target directed graph, the currently traversed target node is added to the reachability information of that target node as one of its reachable target nodes. After all other target nodes in the target directed graph have been traversed, the reachability information of that target node is obtained.

It should be noted that, as with the initial reachable matrix, the target reachable matrix directly shows which target nodes any target node can reach along a path (i.e., at least one directed edge), and which target nodes can reach it. Therefore, in other embodiments, after the target reachable matrix is obtained, the reachability information of the plurality of target nodes may also be obtained directly from the target reachable matrix.
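Reading the reachability information directly from a target reachable matrix can be sketched as below. The matrix is an assumption: it is written out to be consistent with the reachability information quoted for the fig. 3 example and is not copied from fig. 5c. Row x lists what node x can reach; column x lists what can reach node x.

```python
NAMES = ["B", "E", "G", "J", "L", "O"]
# Assumed target reachable matrix: MATRIX[x][y] == 1 iff NAMES[x] reaches NAMES[y].
MATRIX = [
    [0, 1, 1, 1, 1, 1],  # B
    [0, 0, 1, 1, 1, 1],  # E
    [0, 0, 0, 1, 0, 1],  # G
    [0, 0, 0, 0, 0, 1],  # J
    [0, 0, 0, 0, 0, 1],  # L
    [0, 0, 0, 0, 0, 0],  # O
]

def info_from_matrix(names, matrix):
    """Map each node to (nodes it reaches, nodes that reach it)."""
    size = len(names)
    info = {}
    for x, name in enumerate(names):
        reachable = {names[y] for y in range(size) if y != x and matrix[x][y]}
        reaching = {names[y] for y in range(size) if y != x and matrix[y][x]}
        info[name] = (reachable, reaching)
    return info

INFO = info_from_matrix(NAMES, MATRIX)
print(sorted(INFO["J"][0]), sorted(INFO["J"][1]))  # ['O'] ['B', 'E', 'G']
```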

S402, extracting the aggregation level information according to the reachability information of each target node.

The aggregation level information may include the node groups required for N layers of aggregation, where N is a positive integer; each node group includes at least one of: a target node, and an aggregation node obtained by aggregating at least two target nodes. As can be seen from the foregoing, nodes (e.g., target nodes) having the same (or similar) reachability information typically have the same (or similar) dependency relationships, so those nodes can be aggregated together. Based on this, so that the nodes in each node group required for each layer of aggregation have the same or similar reachability information and can subsequently be aggregated in accordance with the aggregation level information, the embodiment of the invention may set a reachability condition according to the characteristics of nodes having the same or similar reachability information. The reachability condition may include: the node affinity among the selected nodes, calculated according to the number of identical reachable nodes and the number of identical reaching nodes (nodes that can reach the selected node) included in the reachability information of the selected nodes, is greater than an affinity threshold.

The node affinity may be calculated in ways that include, but are not limited to, the following. First, a first reference value is calculated according to the number of reachable nodes in the reachability information of each selected node, and a second reference value is calculated according to the number of reaching nodes (nodes that can reach the selected node) in the reachability information of each selected node. The first reference value may be the sum, the mean, or a similar statistic of the numbers of reachable nodes in the reachability information of the selected nodes; similarly, the second reference value may be the sum, the mean, or a similar statistic of the numbers of reaching nodes in the reachability information of the selected nodes. Second, a first ratio between the number of identical reachable nodes and the first reference value is calculated, and a second ratio between the number of identical reaching nodes and the second reference value is calculated. The node affinity is then calculated from the first ratio and the second ratio. Specifically, the node affinity may be the sum of the first ratio and the second ratio, or their mean, or a weighted sum of the two ratios using a weight for the identical-reachable-node indicator and a weight for the identical-reaching-node indicator, and so on. It should be understood that the embodiments of the present invention merely list several specific ways of calculating the node affinity, and the list is not exhaustive.
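One of the variants above (mean count as the reference value, mean of the two ratios as the affinity) can be sketched for a pair of nodes as follows. Several choices here are assumptions not stated in the source: the two selected nodes are filtered out of each other's reachability information before counting, a ratio is taken as 1.0 when both sets are empty, and the threshold value 0.9 is arbitrary. `REACH` transcribes the fig. 3 values.

```python
# node -> (target nodes it can reach, target nodes that can reach it), per fig. 3.
REACH = {
    "B": ({"E", "G", "J", "L", "O"}, set()),
    "E": ({"G", "J", "L", "O"}, {"B"}),
    "G": ({"J", "O"}, {"B", "E"}),
    "J": ({"O"}, {"B", "E", "G"}),
    "L": ({"O"}, {"B", "E"}),
    "O": (set(), {"B", "E", "G", "J", "L"}),
}

def _ratio(first, second):
    """Number of identical nodes divided by the mean set size (the reference value)."""
    reference = (len(first) + len(second)) / 2
    if reference == 0:
        return 1.0  # assumption: two empty sets count as fully identical
    return len(first & second) / reference

def node_affinity(a, b, reach=REACH):
    """Mean of the first ratio (reachable nodes) and second ratio (reaching nodes)."""
    pair = {a, b}
    first_ratio = _ratio(reach[a][0] - pair, reach[b][0] - pair)
    second_ratio = _ratio(reach[a][1] - pair, reach[b][1] - pair)
    return (first_ratio + second_ratio) / 2

AFFINITY_THRESHOLD = 0.9  # hypothetical threshold

print(node_affinity("G", "J"))                       # 1.0
print(node_affinity("G", "L") > AFFINITY_THRESHOLD)  # False
```

With this variant, identical filtered reachability information yields an affinity of exactly 1.0, so any threshold below 1.0 admits "same" pairs while a lower threshold additionally admits "similar" ones.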

Accordingly, when step S402 is executed, the aggregation level information may be extracted from the reachability information of each target node with reference to the reachability condition; specifically, step S402 may include the following steps s11-s13:

s11, selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation, and adding the selected nodes to at least one i-th node group required for the i-th layer aggregation. Each i-th node group corresponds to one i-th aggregation node, and i ∈ [1, N]. When i is 1, the node set involved in the layer 1 aggregation includes the plurality of target nodes; the node set involved in the layer 2 aggregation includes the 1st aggregation node corresponding to each 1st node group, together with the target nodes that were not selected into any 1st node group required for the layer 1 aggregation; and so on.

s12, replacing the selected nodes in the node set with the i-th aggregation node corresponding to each i-th node group, so as to update the node set. It should be noted that, after the selected nodes in the node set are replaced with the i-th aggregation nodes, the reachability information of each node in the updated node set is updated accordingly.

s13, if the updated node set contains nodes whose reachability information satisfies the reachability condition, incrementing the current value of i by one to update i, and returning to the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation. If the updated node set contains no nodes whose reachability information satisfies the reachability condition, the extraction of the aggregation level information may be stopped.

As can be seen from the above description of steps s11-s13, the reachability information of each node in each node group satisfies the reachability condition. The nodes in the node set corresponding to the i-th layer form a directed graph corresponding to the i-th layer; the reachability information of any node in any node group of the i-th layer includes at least one of: the reachable nodes of that node and the reaching nodes of that node. The reachable nodes of a node are the nodes that the node reaches through at least one edge in the directed graph of the i-th layer; the reaching nodes of a node are the nodes that reach the node through at least one edge in the directed graph of the i-th layer. Correspondingly, the reachability information of each node in each node group satisfying the reachability condition means: the node affinity among the nodes in each node group, calculated according to the number of identical reachable nodes and the number of identical reaching nodes included in the reachability information of the nodes in the group, is greater than the affinity threshold.
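Steps s11-s13 can be sketched as a loop over a reachability relation. This is a simplified illustration under stated assumptions: the reachability condition is taken to be exact equality of the mutually filtered reachability information (rather than an affinity threshold), node groups that share a node are merged into one group, and the starting relation `TARGET_REACH` is transcribed from the fig. 3 reachability values. All function names are hypothetical.

```python
from itertools import combinations

def satisfies(succ, a, b):
    """Reachability condition used here: identical info after filtering a and b."""
    pair = {a, b}
    reaching_a = {u for u in succ if a in succ[u]}
    reaching_b = {u for u in succ if b in succ[u]}
    return (succ[a] - pair == succ[b] - pair
            and reaching_a - pair == reaching_b - pair)

def select_groups(succ):
    """Step s11: form node groups; groups sharing a node are merged."""
    groups = []
    for a, b in combinations(sorted(succ), 2):
        if satisfies(succ, a, b):
            touching = [g for g in groups if a in g or b in g]
            merged = {a, b}.union(*touching)
            groups = [g for g in groups if g not in touching] + [merged]
    return groups

def replace_groups(succ, groups):
    """Step s12: replace each group by one aggregation node, update reachability."""
    rename = {node: node for node in succ}
    for group in groups:
        label = "".join(sorted("".join(group)))
        for node in group:
            rename[node] = label
    updated = {}
    for node, reached in succ.items():
        updated.setdefault(rename[node], set()).update(rename[r] for r in reached)
    for node in updated:
        updated[node].discard(node)  # an aggregation node does not reach itself
    return updated

def extract_levels(succ):
    """Steps s11-s13 repeated until no node satisfies the reachability condition."""
    levels = []
    while True:
        groups = select_groups(succ)
        if not groups:  # step s13: stop extracting
            return levels
        levels.append(sorted(sorted(group) for group in groups))
        succ = replace_groups(succ, groups)

# node -> set of nodes it can reach, per the fig. 3 reachability information.
TARGET_REACH = {
    "B": {"E", "G", "J", "L", "O"}, "E": {"G", "J", "L", "O"},
    "G": {"J", "O"}, "J": {"O"}, "L": {"O"}, "O": set(),
}
LEVELS = extract_levels(TARGET_REACH)
print(LEVELS)
# [[['B', 'E'], ['G', 'J']], [['GJ', 'L']], [['BE', 'GJL', 'O']]]
```

Each entry of `LEVELS` lists the node groups of one layer, so the printed result corresponds to three layers of aggregation ending in a single aggregation node.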

Based on the above description of steps s11-s13, and to make the implementation of step S402 easier to understand, the implementation of step S402 is further described with reference to the following example:

(I) The value of i is 1:

First, target nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the first-layer aggregation (layer 1 aggregation), i.e., from the plurality of target nodes, and the selected target nodes may then be added to at least one first node group (layer 1 node group) required for the first-layer aggregation. For example, suppose the plurality of target nodes includes target node B, target node E, target node G, target node J, target node L and target node O shown in fig. 5d; the reachability information of target node B is the same as that of target node E, and the reachability information of target node G is the same as that of target node J. That is, the target nodes in the node set involved in the first-layer aggregation whose reachability information satisfies the reachability condition include: target node B and target node E, and target node G and target node J. Target node B and target node E may then be added to one first node group (denoted by B = E), and target node G and target node J may be added to another first node group (denoted by G = J); the first aggregation node corresponding to the first node group (B = E) is (BE), and the first aggregation node corresponding to the first node group (G = J) is (GJ).

Then, the first aggregation nodes corresponding to the two first node groups may be used to replace the selected nodes in the node set involved in the first-layer aggregation, so as to update the node set. Specifically, the first aggregation node (BE) may be used to replace the selected target node B and target node E, and the first aggregation node (GJ) may be used to replace the selected target node G and target node J, so that the updated node set includes the following nodes: the first aggregation node (BE), the first aggregation node (GJ), the target node L and the target node O. The reachability information of each node in the updated node set may then be obtained. Specifically, the target nodes involved in each first node group may be aggregated in the target reachable matrix to obtain a first aggregation matrix, and the reachability information of each node in the updated node set may be obtained according to the first aggregation matrix. Alternatively, the target nodes involved in each first node group may be virtually aggregated in the target directed graph to obtain a first directed graph, and the reachability information of each node in the updated node set may be obtained according to the first directed graph. Taking the target reachable matrix or the target directed graph shown in fig. 5d as an example, the first aggregation matrix or the first directed graph shown in fig. 5e can be obtained. According to the first aggregation matrix or the first directed graph, the reachability information of each node in the updated node set is as follows: the reachability information of the first aggregation node (BE) is ((GJ)LO, ×), that of the first aggregation node (GJ) is (O, (BE)), that of the target node L is (O, (BE)), and that of the target node O is (×, (BE)(GJ)L).
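Aggregating node groups inside a reachable matrix can be sketched by merging the rows and columns of each group with a logical OR. The input matrix below is an assumption written to match the reachability information of the fig. 3 example (it is not copied from the figures), and `aggregate_matrix` is a hypothetical helper.

```python
def aggregate_matrix(names, matrix, groups):
    """Merge the rows and columns of every group into one aggregation node."""
    label_of = {name: name for name in names}
    for group in groups:
        label = "".join(group)
        for name in group:
            label_of[name] = label
    new_names = []
    for name in names:  # keep the original ordering of first appearance
        if label_of[name] not in new_names:
            new_names.append(label_of[name])
    position = {label: k for k, label in enumerate(new_names)}
    size = len(new_names)
    merged = [[0] * size for _ in range(size)]
    for x, row in enumerate(matrix):
        for y, value in enumerate(row):
            i = position[label_of[names[x]]]
            j = position[label_of[names[y]]]
            if value and i != j:  # edges inside a group disappear
                merged[i][j] = 1
    return new_names, merged

NAMES = ["B", "E", "G", "J", "L", "O"]
# Assumed target reachable matrix: MATRIX[x][y] == 1 iff NAMES[x] reaches NAMES[y].
MATRIX = [
    [0, 1, 1, 1, 1, 1],  # B
    [0, 0, 1, 1, 1, 1],  # E
    [0, 0, 0, 1, 0, 1],  # G
    [0, 0, 0, 0, 0, 1],  # J
    [0, 0, 0, 0, 0, 1],  # L
    [0, 0, 0, 0, 0, 0],  # O
]

FIRST_NAMES, FIRST_MATRIX = aggregate_matrix(NAMES, MATRIX, [("B", "E"), ("G", "J")])
print(FIRST_NAMES)   # ['BE', 'GJ', 'L', 'O']
print(FIRST_MATRIX)  # [[0, 1, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
```

In the result, the rows of (GJ) and L are identical and so are their columns once the pair itself is ignored, which is what makes them candidates for the next layer of aggregation.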

Then, it may be detected whether the updated node set contains nodes whose reachability information satisfies the reachability condition. If so, the current value of i is incremented by one to update i, and the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation is executed; otherwise, the extraction of the aggregation level information is stopped. Continuing the above example, since the reachability information of the first aggregation node (GJ) and the target node L in the updated node set satisfies the reachability condition, the current value of i (which is 1) is incremented by one, so that the updated value of i is 2. The step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the layer 2 aggregation can then be executed, as described below.

(II) The value of i is 2:

First, nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the second-layer aggregation (layer 2 aggregation), and the selected nodes may then be added to at least one second node group (layer 2 node group) required for the second-layer aggregation. Continuing the above example, the node set involved in the second-layer aggregation includes: the first aggregation node (BE), the first aggregation node (GJ), the target node L and the target node O; the nodes in this node set whose reachability information satisfies the reachability condition include the first aggregation node (GJ) and the target node L. The first aggregation node (GJ) and the target node L may then be added to a second node group (denoted by GJ = L); the second node group (GJ = L) corresponds to the second aggregation node (GJL).

Then, the second aggregation node corresponding to the second node group may be used to replace the selected nodes in the node set involved in the second-layer aggregation, so as to update the node set. Specifically, the second aggregation node (GJL) may be used to replace the selected first aggregation node (GJ) and target node L, so that the updated node set includes the following nodes: the first aggregation node (BE), the second aggregation node (GJL), and the target node O. The reachability information of each node in the updated node set may then be obtained. Specifically, the nodes involved in each second node group may be aggregated in the first aggregation matrix to obtain a second aggregation matrix, and the reachability information of each node in the updated node set may be obtained according to the second aggregation matrix. Alternatively, the nodes involved in each second node group may be virtually aggregated in the first directed graph to obtain a second directed graph, and the reachability information of each node in the updated node set may be obtained according to the second directed graph. Taking the first aggregation matrix or the first directed graph shown in fig. 5e as an example, the second aggregation matrix or the second directed graph shown in fig. 5f can be obtained. According to the second aggregation matrix or the second directed graph, the reachability information of each node in the updated node set is as follows: the reachability information of the first aggregation node (BE) is ((GJL)O, ×), that of the second aggregation node (GJL) is (O, (BE)), and that of the target node O is (×, (BE)(GJL)).

Then, it may be detected whether the updated node set contains nodes whose reachability information satisfies the reachability condition. If so, the current value of i is incremented by one to update i, and the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation is executed; otherwise, the extraction of the aggregation level information is stopped. Continuing the above example, since the reachability information of the first aggregation node (BE) and the second aggregation node (GJL) in the updated node set satisfies the reachability condition, and the reachability information of the second aggregation node (GJL) and the target node O also satisfies the reachability condition, the current value of i (which is 2) is incremented by one, so that the updated value of i is 3. The step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the layer 3 aggregation can then be executed, as described below.

(III) The value of i is 3:

First, nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the third-layer aggregation (layer 3 aggregation), and the selected nodes may then be added to at least one third node group (layer 3 node group) required for the third-layer aggregation. Continuing the above example, the node set involved in the third-layer aggregation includes: the first aggregation node (BE), the second aggregation node (GJL), and the target node O; the nodes in this node set whose reachability information satisfies the reachability condition include: the first aggregation node (BE) and the second aggregation node (GJL), and the second aggregation node (GJL) and the target node O. The first aggregation node (BE) and the second aggregation node (GJL) may be added to one third node group (denoted by BE = GJL), and the second aggregation node (GJL) and the target node O may be added to another third node group (denoted by GJL = O). Further, since the two third node groups share a node (i.e., the second aggregation node (GJL)), they may be merged into one third node group to reduce the number of aggregation levels. In this case, the third-layer aggregation involves one third node group, which includes the following nodes: the first aggregation node (BE), the second aggregation node (GJL), and the target node O; correspondingly, the third aggregation node corresponding to the third node group is (BEGJLO).

Secondly, the third aggregation node corresponding to the third node group is used to replace the selected nodes in the node set related to the third-layer aggregation, so as to update the node set. Specifically, the third aggregation node (BEGJLO) may be used to replace the first aggregation node (BE), the second aggregation node (GJL), and the target node O selected from the node set related to the third-layer aggregation, so that the updated node set includes only the third aggregation node (BEGJLO). Since the updated node set includes only the third aggregation node (BEGJLO), it cannot contain any nodes whose reachability information satisfies the reachability condition; the extraction of the aggregation level information may therefore be stopped at this point to obtain the final aggregation level information. Optionally, the aggregation level information may be represented using a hierarchy information map as shown in fig. 5g.
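The merging of node groups that share a node can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merge_groups` is hypothetical, the input is assumed to be the list of node pairs that have already been found to satisfy the reachability condition, and a simple union-find is used as one possible way to merge groups with a common node.

```python
def merge_groups(pairs):
    """Merge node groups that share at least one node (simple union-find).

    pairs -- node pairs assumed to already satisfy the reachability condition
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a        # the two groups share a node: merge them

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return [sorted(group) for group in groups.values()]


# Third-layer aggregation of the example: BE = GJL and GJL = O share the node GJL.
print(merge_groups([("BE", "GJL"), ("GJL", "O")]))  # [['BE', 'GJL', 'O']]
# First-layer aggregation of the example: the two groups have no common node.
print(merge_groups([("B", "E"), ("G", "J")]))       # [['B', 'E'], ['G', 'J']]
```

Because the two third node groups collapse into one, a single third aggregation node (BEGJLO) results, which matches the description above.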

S403, performing at least one layer of aggregation iterative processing on the plurality of target nodes according to the aggregation level information to obtain target aggregation nodes.

In a specific implementation process, node aggregation may be performed with the traffic threshold as the granularity, starting from the innermost aggregation (i.e., the first-layer aggregation) of the aggregation level information. Specifically, step S403 may include the following steps s21-s25:

s21, determining at least one nth node group required by the nth-layer aggregation according to the aggregation level information, and determining the traffic sum of each nth node group according to the traffic of each node in that nth node group; wherein n ∈ [1, N], and N is the total number of aggregation layers indicated by the aggregation level information. It is to be noted that, as described above, the nodes in any node group may include at least one of: a target node, and an aggregation node obtained by aggregating at least two target nodes. For a target node, the traffic of the target node is determined according to the data size of the execution result data corresponding to the target node; for an aggregation node, the traffic of the aggregation node is obtained by summing the traffic of the target nodes corresponding to the aggregation node.

s22, selecting, from the at least one nth node group, each nth node group whose traffic sum is less than or equal to the traffic threshold; and aggregating the nodes in each selected nth node group to obtain an nth aggregation node.

s23, if the current value of n is less than N, and the traffic sum of each (n+1)th node group required by the (n+1)th-layer aggregation, obtained according to the aggregation level information, is greater than the traffic threshold, obtaining the target aggregation node according to the nth aggregation node.

s24, if the current value of n is less than N, and the traffic sum of at least one (n+1)th node group is less than or equal to the traffic threshold, adding one to the current value of n to update n, and executing again the step of determining at least one nth node group required by the nth-layer aggregation according to the aggregation level information.

s25, if the current value of n is equal to N, obtaining the target aggregation node according to the nth aggregation node.

The step of obtaining the target aggregation node according to the nth aggregation node, mentioned in steps s21-s25 above, may be implemented as follows. If the value of n is 1, the 1st aggregation node is taken as the target aggregation node. If the value of n is not 1, at least one historical aggregation node obtained by the first n-1 layers of aggregation is acquired, the historical aggregation nodes that have not themselves been aggregated further are selected from the at least one historical aggregation node, and the selected historical aggregation nodes together with the nth aggregation node are taken as the target aggregation nodes. The first n-1 layers of aggregation refers to all layers of aggregation from the layer 1 aggregation to the layer n-1 aggregation.
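Steps s21-s25 can be sketched as a short loop over the aggregation layers. This is a minimal sketch under stated assumptions, not the definitive implementation: the function name `aggregate_layers`, the list-of-lists encoding of the aggregation level information, and the naming of an aggregation node by concatenating its member names are all illustrative choices. The node groups and traffic values in the usage lines are those of the worked example in this description.

```python
def aggregate_layers(levels, traffic, threshold):
    """Return the target aggregation nodes.

    levels[n-1] -- node groups required for the nth-layer aggregation,
                   each group given as a list of node names
    traffic     -- traffic of every target node
    threshold   -- traffic threshold
    """
    traffic = dict(traffic)   # aggregation nodes are added as they are created
    targets = []              # aggregation nodes not absorbed by a later layer
    for groups in levels:     # n = 1, 2, ..., N
        # s21/s22: compute each group's traffic sum and keep groups within the threshold.
        # Assumption: a group is skipped if one of its members was never created.
        selected = [g for g in groups
                    if all(m in traffic for m in g)
                    and sum(traffic[m] for m in g) <= threshold]
        if not selected:
            break             # s23: no group of this layer can be aggregated, stop
        for g in selected:
            name = "".join(g)                        # e.g. ["B", "E"] -> "BE"
            traffic[name] = sum(traffic[m] for m in g)
            # drop historical aggregation nodes absorbed by this group
            targets = [t for t in targets if t not in g] + [name]
    return targets            # s25: all N layers were processed, or s23 stopped early


levels = [[["B", "E"], ["G", "J"]], [["GJ", "L"]], [["BE", "GJL", "O"]]]
traffic = {"B": 50, "E": 20, "L": 120, "G": 10, "J": 80, "O": 50}
print(aggregate_layers(levels, traffic, 100))  # ['BE', 'GJ']
print(aggregate_layers(levels, traffic, 300))  # ['BE', 'GJL']
```

The returned list keeps the historical aggregation nodes that were not aggregated further together with the nodes of the last completed layer, which is how the target aggregation nodes are described above.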

Based on the above description of steps s21-s25, and in order to make the implementation process of step S403 easier to understand, the implementation process of step S403 is further described below with a specific example. Specifically, the above example is continued, with the traffic threshold set to 100 and the traffic of each target node set as follows: the traffic of the target node B is 50 (i.e., B is 50), the traffic of the target node E is 20 (i.e., E is 20), the traffic of the target node L is 120 (i.e., L is 120), the traffic of the target node G is 10 (i.e., G is 10), the traffic of the target node J is 80 (i.e., J is 80), and the traffic of the target node O is 50 (i.e., O is 50). Accordingly, the specific implementation process of step S403 is as follows:

(I) The value of n is 1:

First, it may be determined from the aggregation level information that the at least one first node group required for the first-layer aggregation includes: a first node group (B ═ E) and a first node group (G ═ J); and the traffic sum of each first node group may be determined according to the traffic of each node in that first node group, that is, the traffic sum of the first node group (B ═ E) is 70 and the traffic sum of the first node group (G ═ J) is 90. Secondly, each first node group whose traffic sum is less than or equal to the traffic threshold can be selected from the at least one first node group, and the nodes in each selected first node group can be aggregated to obtain a first aggregation node. Since the traffic sum of the first node group (B ═ E) and the traffic sum of the first node group (G ═ J) are both smaller than the traffic threshold (100), the nodes in the two first node groups can be aggregated separately, resulting in a first aggregation node (BE) and a first aggregation node (GJ), as shown in fig. 5h.

Then, because the current value of n (which is 1) is smaller than N (which is 3), the traffic sum of each (n+1)th node group required by the (n+1)th-layer aggregation can be obtained according to the aggregation level information; that is, the traffic sum of each second node group required for the second-layer aggregation can be obtained according to the aggregation level information. Specifically, it may be determined from the aggregation level information that the number of second node groups required for the second-layer aggregation is 1, namely the second node group (GJ ═ L); then, the traffic sum of the second node group may be calculated as 210 based on the traffic (90) of the first aggregation node (GJ) in the second node group and the traffic (120) of the target node L. Because the traffic sum of the second node group required by the second-layer aggregation is larger than the traffic threshold (100), the aggregation iteration is stopped; at this time, the target aggregation nodes may be obtained according to the first aggregation nodes, that is, the target aggregation nodes include: the first aggregation node (BE) and the first aggregation node (GJ).

It should be noted that, in other embodiments, the traffic threshold may be 300; in that case, since the traffic sum (210) of the second node group is less than the traffic threshold (300) and the current value of n is less than N, the current value of n may instead be incremented by one to update n, so that the value of n becomes 2. Then, the step of determining at least one second node group required for the second-layer aggregation according to the aggregation level information may be performed, as described below.

(II) The value of n is 2:

First, it may be determined from the aggregation level information that the at least one second node group required for the second-layer aggregation includes: a second node group (GJ ═ L); and the traffic sum of each second node group may be determined from the traffic of each node in that second node group, that is, the traffic sum of the second node group (GJ ═ L) is 210. Secondly, each second node group whose traffic sum is less than or equal to the traffic threshold is selected from the at least one second node group, and the nodes in each selected second node group are aggregated to obtain a second aggregation node. Since the traffic sum of the second node group (GJ ═ L) is smaller than the traffic threshold (300), the nodes in the second node group can be aggregated to obtain a second aggregation node (GJL). Then, because the current value of n (which is 2) is smaller than N (which is 3), the traffic sum of each (n+1)th node group required by the (n+1)th-layer aggregation can be obtained according to the aggregation level information; that is, the traffic sum of each third node group required for the third-layer aggregation can be obtained according to the aggregation level information. Specifically, it may be determined from the aggregation level information that the number of third node groups required for the third-layer aggregation is 1; then, the traffic sum of the third node group may be calculated as 330 based on the traffic (70) of the first aggregation node (BE), the traffic (210) of the second aggregation node (GJL), and the traffic (50) of the target node O in the third node group. Because the traffic sum of the third node group required by the third-layer aggregation is larger than the traffic threshold (300), the aggregation iteration is stopped; the target aggregation nodes may then be obtained according to the second aggregation node.
Specifically, the at least one historical aggregation node obtained by the layer 1 aggregation can be acquired, namely the first aggregation node (BE) and the first aggregation node (GJ); the historical aggregation node that has not been aggregated further (namely, the first aggregation node (BE)) is selected from the at least one historical aggregation node and, together with the nth aggregation node (namely, the second aggregation node (GJL)), is taken as the target aggregation nodes. That is, the target aggregation nodes in this case include: the first aggregation node (BE) and the second aggregation node (GJL).
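The traffic sums used in the two cases of this example can be checked directly. In the block below, T_X is shorthand introduced here (not notation from the original text) for the traffic of node or node group X:

```latex
\begin{aligned}
T_{BE}     &= T_B + T_E = 50 + 20 = 70, \\
T_{GJ}     &= T_G + T_J = 10 + 80 = 90, \\
T_{GJL}    &= T_{GJ} + T_L = 90 + 120 = 210, \\
T_{BEGJLO} &= T_{BE} + T_{GJL} + T_O = 70 + 210 + 50 = 330.
\end{aligned}
```

With a traffic threshold of 100, both first-layer sums (70 and 90) are within the threshold while 210 is not, so the iteration stops after the first layer. With a traffic threshold of 300, the sum 210 is within the threshold while 330 is not, so the iteration stops after the second layer.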

S404, updating the computational graph by using the target aggregation node.

In a specific implementation process, the target aggregation node can be added to the computational graph, and a directed edge can be used to connect the target aggregation node and each aggregated target node; taking the target aggregation nodes shown in fig. 5h (i.e., the first aggregation node (BE) and the first aggregation node (GJ)) as an example, a schematic diagram of adding the target aggregation nodes can be seen in fig. 5i. Then, a matching communication node can be added for each target node that is not aggregated in the computational graph, and a matching communication node can be added for each target aggregation node in the computational graph; the communication node is used to represent a data transmission operation. Continuing the above example, the unaggregated target nodes include the target node L and the target node O, and the target aggregation nodes include the first aggregation node (BE) and the first aggregation node (GJ); a schematic diagram of adding the communication nodes can be seen in fig. 5j.
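The graph update described above can be sketched with a plain adjacency dictionary. This is an illustrative sketch only: the function name `update_graph`, the `comm(...)` naming of communication nodes, and the direction of the new edges (from each aggregated target node to its aggregation node, and from each transmitting node to its communication node) are assumptions, since the text does not fix them.

```python
def update_graph(edges, target_nodes, aggregation_nodes):
    """Add aggregation nodes and communication nodes to a computational graph.

    edges             -- dict: node -> list of successor nodes
    target_nodes      -- all target nodes of the graph
    aggregation_nodes -- dict: aggregation node -> list of aggregated target nodes
    """
    graph = {node: list(successors) for node, successors in edges.items()}
    aggregated = set()
    for agg, members in aggregation_nodes.items():
        graph.setdefault(agg, [])
        for member in members:
            # assumed edge direction: aggregated target node -> aggregation node
            graph.setdefault(member, []).append(agg)
            aggregated.add(member)
    # one communication node per unaggregated target node and per aggregation node
    senders = [t for t in target_nodes if t not in aggregated] + list(aggregation_nodes)
    for node in senders:
        comm = f"comm({node})"
        graph.setdefault(node, []).append(comm)
        graph[comm] = []
    return graph


# Target aggregation nodes of fig. 5h; the original edges are omitted for brevity.
updated = update_graph({}, ["B", "E", "G", "J", "L", "O"],
                       {"BE": ["B", "E"], "GJ": ["G", "J"]})
print(updated["B"], updated["BE"], updated["L"])  # ['BE'] ['comm(BE)'] ['comm(L)']
```

In this sketch the aggregated target nodes B, E, G, and J receive no communication node of their own; their execution result data is transmitted only through the communication node of the aggregation node.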

S405, sending the updated computational graph to the computing device.

The embodiment of the invention can first acquire the reachability information of a plurality of target nodes in the computational graph of the target object, and then extract aggregation level information from the reachability information of the plurality of target nodes. Next, at least one layer of aggregation iterative processing can be performed on the plurality of target nodes according to the aggregation level information, which improves the accuracy of the target aggregation nodes. The computational graph may then be updated with the target aggregation node, and the updated computational graph may be sent to the computing device. In the process of calculating the target object, after the data processing operations represented by the aggregated target nodes are executed, the computing device can aggregate and transmit the execution result data of those data processing operations according to the indication of the target aggregation node, so that the number of data transmissions is reduced, network resources are saved, and the total transmission time is shortened.

In practical applications, the above-mentioned data transmission method can be applied in different application scenarios; for example, an application scenario of distributed machine learning, an application scenario of testing an application program with one or more computing devices, an application scenario of testing a hardware device with one or more computing devices, and so forth. Distributed machine learning refers to distributing the machine learning tasks of a neural network model to a plurality of computing devices for parallel processing. Distributed machine learning can support a plurality of modes, such as a data parallelism mode and a model parallelism mode. In the data parallelism mode, different computing devices hold multiple copies of the same model; each computing device trains its own copy in parallel using different training data, and the computation results (e.g., gradients) produced by all the computing devices during model training are then merged in some manner. In the model parallelism mode, different parts of the same model are distributed to different computing devices, for example, different network layers, or different parameters of the same network layer, are distributed to different computing devices; the computing devices train the parts they are responsible for in parallel, and the training results of all the computing devices are then combined.

Machine learning is a multi-disciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects; it studies how computer devices simulate or realize human learning behaviors to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of Artificial Intelligence (AI), which refers to the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, AI is a comprehensive technique of computer science; it seeks to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence, so that such machines have functions such as perception, reasoning, and decision making.

The specific application of the data transmission method is explained below, taking the application scenario of distributed machine learning as an example. In this application scenario, the target object may be a neural network model to be subjected to distributed machine learning, and the execution result data of the data processing operation represented by each target node includes the gradients produced by the neural network model during distributed machine learning. Specifically, the general principle of the data transmission method can be seen in fig. 6:

The processing device may first obtain a computational graph of the neural network model, which may include a plurality of target nodes representing data processing operations whose execution result data (e.g., gradients) needs to be transmitted. Second, by comparing the reachability information of each target node in the computational graph that needs to transmit synchronization data (i.e., gradients), the target nodes having the same or similar reachability information can be aggregated into one target aggregation node (concatee node). Then, the target aggregation node may be added to the computational graph, and a communication node (All Reduce node) may be added for each tensor requiring communication (i.e., the gradient corresponding to each target node that is not aggregated and the aggregation result corresponding to each aggregation node) to update the computational graph. At run time, the processing device may issue the updated computational graph to each computing device; while training its copy of the neural network model, each computing device can perform gradient fusion on the gradients corresponding to the aggregated target nodes according to the indication of the aggregation node in the updated computational graph. Gradient fusion means fusing different gradients into one communication data segment so that they are transmitted together. After gradient fusion, the communication nodes can be run; when the computation reaches a communication node, each computing device can communicate synchronously with the management device so as to transmit the corresponding tensor (the gradient corresponding to a target node that is not aggregated, or the aggregation result corresponding to an aggregation node) to the management device.
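Gradient fusion as described above can be illustrated with a small sketch that packs several gradients into one communication data segment. The function name `fuse_gradients`, the use of flat Python lists for gradients, and the `(name, length)` layout record are assumptions made for illustration; a real framework would operate on tensors.

```python
def fuse_gradients(gradients):
    """Pack named gradients into one buffer so they can be sent in one transmission.

    gradients -- dict: target node name -> flat list of gradient values
    Returns the fused buffer and a layout of (name, length) entries
    recording where each gradient sits inside the buffer.
    """
    buffer, layout = [], []
    for name, values in gradients.items():
        layout.append((name, len(values)))
        buffer.extend(values)
    return buffer, layout


# Gradients of the aggregated target nodes B and E are sent as one segment.
buffer, layout = fuse_gradients({"B": [0.1, 0.2], "E": [0.3]})
print(buffer)  # [0.1, 0.2, 0.3]
print(layout)  # [('B', 2), ('E', 1)]
```

Sending one fused buffer instead of one message per gradient is what reduces the number of data transmissions; the layout is what allows the receiver to separate the gradients again.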

Correspondingly, after receiving the tensors transmitted by the computing devices, if the tensors are gradients corresponding to target nodes that are not aggregated, the management device may directly perform a combination calculation (such as a mean value calculation) on the gradients transmitted by the computing devices, and update the network parameters of the neural network model (i.e., the target object) by using the combined gradient. If the tensors transmitted by the computing devices are aggregation results corresponding to an aggregation node, the management device can separate each aggregation result to obtain the individual fused gradients. Then, the gradients of the same data processing operation transmitted by the computing devices can each be combined (such as by mean value calculation), and the network parameters of the neural network model (i.e., the target object) are updated by using the combined gradients. After updating the network parameters, the management device may issue the updated network parameters to each computing device, or may issue them after receiving a pull request from each computing device, so that each computing device executes the next round of model training with the updated network parameters; the above steps are repeated until the model training is finished.
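The separation and combination performed by the management device can be sketched as follows. This is a hedged illustration: the names `split_buffer` and `average_gradients`, the flat-list buffers, and the `(name, length)` layout are hypothetical, and only the mean value calculation mentioned above is shown (the parameter update rule is not specified in the text and is omitted).

```python
def split_buffer(buffer, layout):
    """Separate one aggregation result into its individual gradients."""
    gradients, offset = {}, 0
    for name, length in layout:
        gradients[name] = buffer[offset:offset + length]
        offset += length
    return gradients


def average_gradients(device_buffers, layout):
    """Average, per data processing operation, the gradients sent by all devices."""
    per_device = [split_buffer(buf, layout) for buf in device_buffers]
    merged = {}
    for name, _ in layout:
        # zip groups the values at the same position across all devices
        columns = zip(*(grads[name] for grads in per_device))
        merged[name] = [sum(col) / len(per_device) for col in columns]
    return merged


# Two computing devices each send one fused buffer holding the gradients of B and E.
layout = [("B", 2), ("E", 1)]
print(average_gradients([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], layout))
# {'B': [2.0, 3.0], 'E': [4.0]}
```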

Therefore, when the data transmission method provided by the embodiment of the invention is applied to the application scenario of distributed machine learning, the gradients obtained by each computing device during model training can be fused and transmitted together, which reduces the transmission delay and accelerates communication. Moreover, the gradient fusion method can adapt to more complex topological structures of the computational graph and to different traffic thresholds, can fuse communication information flexibly, and allows computation and communication to proceed in parallel. It should be understood that the data transmission method provided by the embodiment of the present invention can be applied flexibly to machine learning platforms such as distributed machine learning frameworks, and can also be extended to other distributed systems requiring parallel computing and communication; the embodiments of the present invention are not limited in this regard.

Based on the description of the above data transmission method embodiment, the embodiment of the present invention also discloses a data transmission apparatus, which may be a computer program (including program code) running in a processing device. The data transmission apparatus may perform the method shown in fig. 2 or fig. 4. Referring to fig. 7, the data transmission apparatus may run the following units:

an acquisition unit 701 configured to acquire reachability information of a plurality of target nodes in a computation graph of a target object; each target node is used for representing a data processing operation which needs to be executed in the calculation process of the target object, and the execution result data of the data processing operation represented by each target node needs to be transmitted; wherein, the reachability information of any target node is used for indicating that: the ability of the any target node to reach other target nodes along at least one edge in the computational graph, and the ability of the any target node to be reached by other target nodes along at least one edge in the computational graph;

an aggregation unit 702 configured to aggregate at least two target nodes into a target aggregation node according to reachability information of each target node, where the target aggregation node is configured to instruct to aggregate execution result data of data processing operations represented by the aggregated target nodes;

a processing unit 703, configured to update the computational graph with the target aggregation node, and send the updated computational graph to a computing device, where the updated computational graph is used to instruct the computing device, in the calculation process of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes according to the indication of the target aggregation node, and to transmit the aggregation result.

In an embodiment, when the aggregating unit 702 is configured to aggregate at least two target nodes into a target aggregation node according to the reachability information of each target node, it may specifically be configured to:

In another embodiment, the aggregation unit 702, when configured to extract the aggregation level information according to the reachability information of each target node, may specifically be configured to:

In another embodiment, when the aggregation unit 702 is configured to perform at least one layer of aggregation iterative processing on the multiple target nodes according to the aggregation level information to obtain a target aggregation node, the aggregation unit may be specifically configured to:

In yet another embodiment, the aggregation unit 702 can be further specifically configured to:

In another embodiment, when the aggregation unit 702 is configured to obtain the target aggregation node according to the nth aggregation node, it may specifically be configured to:

In another embodiment, the acquisition unit 701, when configured to acquire the reachability information of the plurality of target nodes in the computational graph of the target object, may be specifically configured to:

In another embodiment, the acquisition unit 701, when configured to acquire a target directed graph composed of a plurality of target nodes in a computational graph of a target object, may specifically be configured to:

In another embodiment, when the acquisition unit 701 is configured to calculate, according to the topological relation of the computational graph, a target reachable matrix including the plurality of target nodes, the acquisition unit may be specifically configured to:

In another embodiment, when the processing unit 703 is configured to update the computation graph with the target aggregation node, it may specifically be configured to:

According to an embodiment of the present invention, each step involved in the method shown in fig. 2 or fig. 4 may be performed by each unit in the data transmission apparatus shown in fig. 7. For example, steps S201 to S203 shown in fig. 2 may be performed by the acquisition unit 701, the aggregation unit 702, and the processing unit 703 shown in fig. 7, respectively; as another example, step S401 shown in fig. 4 may be performed by the acquisition unit 701 shown in fig. 7, steps S402-S403 may be performed by the aggregation unit 702 shown in fig. 7, and steps S404-S405 may be performed by the processing unit 703 shown in fig. 7.
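The mapping of method steps to units can be pictured with a small skeleton. The class name `DataTransmissionApparatus` and the use of plain callables for the three units are hypothetical; the sketch only shows how the units hand results to one another, not how each unit is implemented.

```python
class DataTransmissionApparatus:
    """Skeleton of the three units of fig. 7, each supplied as a callable."""

    def __init__(self, acquire, aggregate, process):
        self.acquisition_unit = acquire    # unit 701: step S201, or step S401
        self.aggregation_unit = aggregate  # unit 702: step S202, or steps S402-S403
        self.processing_unit = process     # unit 703: step S203, or steps S404-S405

    def run(self, computational_graph):
        reachability = self.acquisition_unit(computational_graph)
        target_aggregation_nodes = self.aggregation_unit(reachability)
        # the processing unit updates the graph and sends it to the computing device
        return self.processing_unit(computational_graph, target_aggregation_nodes)


# Stub units, used only to show the data flow between the units.
apparatus = DataTransmissionApparatus(
    acquire=lambda graph: {"graph": graph},
    aggregate=lambda reachability: ["BE", "GJ"],
    process=lambda graph, nodes: (graph, nodes),
)
print(apparatus.run("graph-0"))  # ('graph-0', ['BE', 'GJ'])
```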

According to another embodiment of the present invention, the units in the data transmission apparatus shown in fig. 7 may be respectively or entirely combined into one or several other units to form another unit, or some unit(s) therein may be further split into multiple units with smaller functions to form another unit, which may achieve the same operation without affecting the achievement of the technical effect of the embodiment of the present invention. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present invention, the data transmission device may also include other units, and in practical applications, these functions may also be implemented by being assisted by other units, and may be implemented by cooperation of multiple units.

According to another embodiment of the present invention, the data transmission apparatus shown in fig. 7 may be constructed, and the data transmission method of the embodiment of the present invention may be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in fig. 2 or fig. 4 on a general-purpose processing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed in the processing device described above via the computer-readable recording medium.

Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides a processing device. Referring to fig. 8, the processing device includes at least a processor 801, an input interface 802, an output interface 803, and a computer storage medium 804. The processor 801, the input interface 802, the output interface 803, and the computer storage medium 804 within the processing device may be connected by a bus or other means.

A computer storage medium 804 may be stored in the memory of the processing device, the computer storage medium 804 being for storing a computer program comprising program instructions, and the processor 801 being for executing the program instructions stored by the computer storage medium 804. The processor 801 (or CPU) is the computing core and control core of the processing device, and is adapted to implement one or more instructions, and in particular, to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function. In one embodiment, the processor 801 according to the embodiment of the present invention may be configured to perform a series of data transmission processes, including: obtaining reachability information of a plurality of target nodes in a computational graph of a target object, where each target node is used for representing a data processing operation that needs to be executed in the calculation process of the target object, the execution result data of the data processing operation represented by each target node needs to be transmitted, and the reachability information of any target node is used for indicating the ability of that target node to reach other target nodes along at least one edge in the computational graph and the ability of that target node to be reached by other target nodes along at least one edge in the computational graph; aggregating at least two target nodes into a target aggregation node according to the reachability information of each target node, where the target aggregation node is used for indicating that the execution result data of the data processing operations represented by the aggregated target nodes is to be aggregated; and updating the computational graph by using the target aggregation node and sending the updated computational graph to a computing device, where the updated computational graph is used to instruct the computing device, in the calculation process of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes according to the indication of the target aggregation node and to transmit the aggregation result; and so on.

An embodiment of the present invention further provides a computer storage medium (Memory), which is a Memory device in a processing device and is used to store programs and data. It will be appreciated that the computer storage media herein may comprise both built-in storage media within the processing device and, of course, extended storage media supported by the processing device. The computer storage medium provides a memory space that stores an operating system of the processing device. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 801. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.

In one embodiment, one or more instructions stored in the computer storage medium 804 may be loaded and executed by the processor 801 to implement the corresponding method steps described above in connection with the data transmission method embodiment shown in fig. 2 or fig. 4; in particular implementations, one or more instructions in the computer storage medium 804 are loaded by the processor 801 to perform the following steps:

updating the computational graph by using the target aggregation node, and sending the updated computational graph to a computing device, where the updated computational graph is used to instruct the computing device, in the calculation process of the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes according to the indication of the target aggregation node, and to transmit the aggregation result.

In one embodiment, when aggregating at least two target nodes into a target aggregation node according to reachability information of each target node, the one or more instructions may be loaded and specifically executed by the processor 801:

In yet another embodiment, when extracting the aggregation level information according to the reachability information of each target node, the one or more instructions may be loaded and specifically executed by the processor 801:

In another embodiment, when at least one layer of aggregation iterative processing is performed on the plurality of target nodes according to the aggregation level information to obtain a target aggregation node, the one or more instructions may be loaded and specifically executed by the processor 801:

In yet another embodiment, the one or more instructions may be further loaded and specifically executed by the processor 801:

In another embodiment, when obtaining the target aggregation node according to the nth aggregation node, the one or more instructions may be loaded and specifically executed by the processor 801:

In yet another embodiment, when obtaining reachability information of a plurality of target nodes in a computation graph of a target object, the one or more instructions may be loaded and specifically executed by the processor 801:

In yet another embodiment, when obtaining a target directed graph comprising a plurality of target nodes in a computational graph of a target object, the one or more instructions may be loaded and specifically executed by the processor 801:

In another embodiment, when the target reachable matrix including the target nodes is calculated according to the topological relation of the computation graph, the one or more instructions may be loaded and specifically executed by the processor 801:

In another embodiment, when the target aggregation node is used to update the computation graph, the one or more instructions may be loaded and specifically executed by the processor 801:

It should be noted that, according to an aspect of the present application, a computer program product or computer program is also provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the data transmission method embodiments shown in FIG. 2 or FIG. 4 above.

It should be understood, however, that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.

Claims

1. A method of data transmission, comprising:

2. The method of claim 1, wherein the aggregating at least two target nodes into a target aggregation node according to reachability information of each target node comprises:

3. The method of claim 2, wherein said extracting aggregation level information based on reachability information of each target node comprises:

4. The method of claim 2, wherein the nodes in the node set corresponding to the i-th layer form a directed graph corresponding to the i-th layer, where i ∈ [1, N]; the reachability information of any node in any node group of the i-th layer includes at least one of: a reachable node corresponding to the any node and a reachable node corresponding to the any node;

5. The method of claim 2, wherein performing at least one layer of aggregation iteration processing on the plurality of target nodes according to the aggregation level information to obtain a target aggregation node comprises:

6. The method of claim 5, wherein the method further comprises:

7. The method according to claim 5 or 6, wherein the obtaining a target aggregation node according to the nth aggregation node comprises:

8. The method of claim 1, wherein the obtaining reachability information for a plurality of target nodes in a computational graph of a target object comprises:

9. The method of claim 8, wherein obtaining a target directed graph comprised of a plurality of target nodes in a computational graph of a target object comprises:

10. The method of claim 9, wherein said computing a target reachable matrix containing the plurality of target nodes according to the topological relation of the computation graph comprises:

11. The method of claim 1, wherein said updating the computational graph with the target aggregation node comprises:

12. The method of claim 1, wherein the target object comprises a neural network model to be subjected to distributed machine learning, and the execution result data of the data processing operation represented by each target node comprises: gradients generated by the neural network model in the distributed machine learning.

13. A data transmission apparatus, comprising:

14. A processing device comprising an input interface and an output interface, further comprising:

a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor to execute the data transmission method of any of claims 1-12.

15. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor to execute the data transmission method of any of claims 1-12.