CN111935005B

CN111935005B - Data transmission method, device, processing equipment and medium

Info

Publication number: CN111935005B
Application number: CN202010793589.0A
Authority: CN
Inventors: 姜曦楠; 朱子霖; 周飞虎; 郭振宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-08-07
Filing date: 2020-08-07
Publication date: 2023-10-24
Anticipated expiration: 2040-08-07
Also published as: CN111935005A

Abstract

The embodiments of the application disclose a cloud-technology-based data transmission method, device, processing equipment and medium. The method comprises: obtaining reachability information of a plurality of target nodes in a computational graph of a target object; aggregating at least two target nodes into a target aggregation node according to the reachability information of each target node; and updating the computational graph with the target aggregation node and sending the updated computational graph to a computing device. The updated computational graph instructs the computing device, while computing the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes as indicated by the target aggregation node, and to transmit the aggregation result. In this way, the updated computational graph can instruct the computing device to aggregate and transmit the execution result data of the target nodes, which reduces the number of data transmissions, saves network resources, and shortens the total transmission time.

Description

Data transmission method, device, processing equipment and medium

Technical Field

The present application relates to the field of internet technologies, in particular to the field of computer technologies, and more particularly to a data transmission method, a data transmission device, a processing apparatus, and a computer storage medium.

Background

In graph theory, a graph is an abstraction of the relationships between objects. It consists essentially of nodes, which represent objects, and edges, which represent relationships between objects; a graph in which every edge has a direction is called a directed graph. With the development of graph technology and internet technology, the computational graph has emerged. A computational graph, also called a data flow graph, is a directed graph that characterizes the data flow computation of a target object. The nodes in the computational graph represent the data processing operations involved in computing the target object, and each data processing operation corresponds to one piece of execution result data; the edges represent dependencies between data processing operations (nodes), such as data dependencies and control dependencies. A computational graph typically contains particular target nodes, which represent data processing operations whose execution result data must be transmitted.

At present, before a computing device computes a target object, a computational graph of the target object is generally constructed and sent directly to the computing device. While computing the target object, the computing device then transmits the corresponding execution result data each time it finishes the data processing operation represented by a target node. This transmission manner leads to a large number of data transmissions and heavy consumption of network resources; moreover, each transmission typically incurs a network delay, so the total transmission time is also long.

Disclosure of Invention

The embodiment of the invention provides a data transmission method, a device, processing equipment and a medium, which can instruct computing equipment to carry out aggregation transmission on execution result data of a target node through an updated computing graph, thereby reducing the times of data transmission, saving network resources and shortening the total transmission time.

In one aspect, an embodiment of the present invention provides a data transmission method, where the method includes:

obtaining reachability information of a plurality of target nodes in a calculation graph of a target object; each target node is used for representing a data processing operation which needs to be executed in the calculation process of the target object, and execution result data of the data processing operation represented by each target node needs to be transmitted; wherein the reachability information of any target node is used for indicating: the ability of the any target node to reach other target nodes along at least one edge in the computational graph, and the ability of the any target node to be reached by other target nodes along at least one edge in the computational graph;

according to the reachability information of each target node, aggregating at least two target nodes into a target aggregation node, wherein the target aggregation node is used for indicating to aggregate the execution result data of the data processing operation represented by the aggregated target nodes;

updating the computational graph with the target aggregation node, and sending the updated computational graph to a computing device, wherein the updated computational graph is used for instructing the computing device, in the process of computing the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes according to the indication of the target aggregation node, and to transmit the aggregation result.

In another aspect, an embodiment of the present invention provides a data transmission apparatus, including:

an acquisition unit configured to acquire reachability information of a plurality of target nodes in a computation graph of a target object; each target node is used for representing a data processing operation which needs to be executed in the calculation process of the target object, and execution result data of the data processing operation represented by each target node needs to be transmitted; wherein the reachability information of any target node is used for indicating: the ability of the any target node to reach other target nodes along at least one edge in the computational graph, and the ability of the any target node to be reached by other target nodes along at least one edge in the computational graph;

an aggregation unit configured to aggregate at least two target nodes into a target aggregation node according to the reachability information of each target node, wherein the target aggregation node is used for indicating that the execution result data of the data processing operations represented by the aggregated target nodes are to be aggregated;

a processing unit configured to update the computational graph with the target aggregation node and send the updated computational graph to a computing device, wherein the updated computational graph is used for instructing the computing device, in the process of computing the target object, to aggregate the execution result data of the data processing operations represented by the aggregated target nodes according to the indication of the target aggregation node, and to transmit the aggregation result.

In one embodiment, the aggregation unit, when configured to aggregate at least two target nodes into a target aggregation node according to the reachability information of each target node, may be specifically configured to:

extracting aggregation level information according to the reachability information of each target node; the aggregation level information includes: N layers of node groups required by aggregation, wherein N is a positive integer; each node group includes at least one of the following nodes: a target node, and an aggregation node obtained by aggregating at least two target nodes; and the reachability information of each node in each node group satisfies a reachability condition;

and performing at least one layer of iterative aggregation on the plurality of target nodes according to the aggregation level information to obtain the target aggregation node.

In still another embodiment, the aggregation unit, when configured to extract the aggregation level information according to the reachability information of each target node, may be specifically configured to:

selecting nodes whose reachability information meets the reachability condition from the node set involved in the ith layer aggregation, and adding the selected nodes to at least one ith node group required by the ith layer aggregation; each ith node group corresponds to one ith aggregation node, and i ∈ [1, N]; when i is 1, the node set involved in the layer-1 aggregation comprises the plurality of target nodes;

replacing the selected nodes in the node set with the ith aggregation node corresponding to each ith node group, so as to update the node set;

if the updated node set still contains nodes whose reachability information meets the reachability condition, incrementing the current value of i by one, and returning to the step of selecting nodes whose reachability information meets the reachability condition from the node set involved in the ith layer aggregation.

In another embodiment, the nodes in the node set corresponding to the ith layer form a directed graph corresponding to the ith layer, where i ∈ [1, N]; the reachability information of any node in any node group of the ith layer includes at least one of: the reachable nodes of that node and the reached-by nodes of that node;

wherein a reachable node of a node is a node that the node reaches through at least one edge in the directed graph of the ith layer, and a reached-by node of a node is a node that reaches the node through at least one edge in the directed graph of the ith layer;

the reachability information of each node in each node group meeting the reachability condition comprises: the node affinity among the nodes in each node group, calculated according to the number of identical reachable nodes and the number of identical reached-by nodes included in the reachability information of the nodes in that node group, is greater than an affinity threshold.
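The affinity test above can be illustrated with a small sketch. This section does not state the affinity formula, so the ratio used below (shared nodes divided by all distinct nodes, counted over the nodes a node can reach and the nodes that can reach it) is an assumption made only for illustration, as are the function name `node_affinity` and the threshold value. The two compared nodes are filtered out of each other's information first, following the pairwise comparison described in the detailed description; the sample data are the reachability values the text gives for target nodes G, J and L of Fig. 3.

```python
def node_affinity(info_a, info_b, a, b):
    """info_x = (set of nodes x can reach, set of nodes that can reach x).

    ASSUMPTION: affinity = shared nodes / all distinct nodes, summed over both
    sets after nodes a and b are filtered out. The source only says that the
    affinity is derived from the two shared counts.
    """
    drop = {a, b}
    reach_a, by_a = info_a[0] - drop, info_a[1] - drop
    reach_b, by_b = info_b[0] - drop, info_b[1] - drop
    shared = len(reach_a & reach_b) + len(by_a & by_b)
    total = len(reach_a | reach_b) + len(by_a | by_b)
    return 1.0 if total == 0 else shared / total


# Reachability information quoted in the text for Fig. 3
G = ({"J", "O"}, {"B", "E"})
J = ({"O"}, {"B", "E", "G"})
L = ({"O"}, {"B", "E"})
THRESHOLD = 0.8  # hypothetical affinity threshold

affinity_gj = node_affinity(G, J, "G", "J")  # identical after filtering
affinity_gl = node_affinity(G, L, "G", "L")  # similar but not identical
print(affinity_gj > THRESHOLD, affinity_gl > THRESHOLD)
```

With this assumed formula, G and J would fall into one node group while G and L would not; a different formula or threshold could of course group them differently.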

In still another embodiment, when the aggregation unit is configured to perform at least one layer of aggregation iterative processing on the plurality of target nodes according to the aggregation level information, the aggregation unit may be specifically configured to:

determining at least one nth node group required by the nth layer aggregation according to the aggregation level information, and determining the traffic sum of each nth node group according to the traffic of each node in that nth node group, where n ∈ [1, N];

selecting, from the at least one nth node group, an nth node group whose traffic sum is less than or equal to a traffic threshold, and aggregating the nodes in the selected nth node group to obtain an nth aggregation node;

and if the current value of n is smaller than N and the traffic sum of every (n+1)th node group required by the (n+1)th layer aggregation, as obtained from the aggregation level information, is greater than the traffic threshold, obtaining a target aggregation node according to the nth aggregation node.

In yet another embodiment, the aggregation unit may further be specifically configured to:

if the current value of n is smaller than N and the traffic sum of at least one (n+1)th node group is less than or equal to the traffic threshold, incrementing the current value of n by one, and returning to the step of determining at least one nth node group required by the nth layer aggregation according to the aggregation level information;

and if the current value of n is equal to N, obtaining a target aggregation node according to the nth aggregation node.

In yet another embodiment, the aggregation unit, when configured to obtain the target aggregation node according to the nth aggregation node, may be specifically configured to:

if the value of n is 1, the 1 st aggregation node is taken as a target aggregation node;

if the value of n is not 1, obtaining at least one historical aggregation node produced by the preceding n-1 layers of aggregation, selecting from the at least one historical aggregation node those historical aggregation nodes that have not themselves been aggregated further, and taking both the selected historical aggregation nodes and the nth aggregation node as target aggregation nodes.
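The layer-by-layer selection under a traffic threshold, described in the preceding embodiments, can be sketched roughly as follows. The data layout (`levels` as a list of node groups per layer), the naming of an aggregation node by concatenating its members, the traffic figures, and the rule that a group is skipped when one of its members was never formed are all assumptions for illustration, not details taken from the source.

```python
def pick_target_aggregates(levels, traffic, threshold):
    """levels[n-1] lists the node groups of layer n; a group is a tuple of names.

    Returns the aggregation nodes that survive as target aggregation nodes:
    those of the last layer reached plus earlier ones never aggregated further.
    """
    traffic = dict(traffic)   # traffic of target nodes, extended with aggregates
    alive = set()             # aggregation nodes not absorbed by a later layer
    for groups in levels:
        # Keep only groups whose members all exist and whose traffic sum fits.
        chosen = [g for g in groups
                  if all(m in traffic for m in g)
                  and sum(traffic[m] for m in g) <= threshold]
        if not chosen:
            break             # every group of this layer exceeds the threshold
        for g in chosen:
            name = "".join(g)                      # hypothetical naming scheme
            traffic[name] = sum(traffic[m] for m in g)
            alive -= set(g)   # earlier aggregates in g are now absorbed
            alive.add(name)
    return sorted(alive)


# Hypothetical traffic per target node and a two-layer aggregation hierarchy
traffic = {"B": 4, "E": 3, "G": 2, "J": 2, "L": 5}
levels = [[("B", "E"), ("G", "J")], [("GJ", "L")]]

print(pick_target_aggregates(levels, traffic, threshold=8))
print(pick_target_aggregates(levels, traffic, threshold=10))
```

With the lower threshold the second layer is rejected and both first-layer aggregation nodes are kept; with the higher one, the first-layer node that was not aggregated further is kept alongside the second-layer node.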

In still another embodiment, the obtaining unit, when configured to obtain reachability information of a plurality of target nodes in the computation graph of the target object, may be specifically configured to:

acquiring a target directed graph formed by a plurality of target nodes in a calculation graph of a target object;

and acquiring reachability information of the target nodes based on the target directed graph.

In still another embodiment, the obtaining unit, when used for obtaining a target directed graph formed by a plurality of target nodes in a calculation graph of a target object, may be specifically configured to:

obtaining a computational graph of a target object, the computational graph comprising the following computational nodes: a plurality of target nodes and non-target nodes;

calculating a target reachability matrix containing the plurality of target nodes according to the topological relation of the computational graph; the target reachability matrix is used for indicating the reachability relation among the target nodes;

and constructing, from the target reachability matrix and following the principle of using the minimum number of edges, a target directed graph formed by the plurality of target nodes.
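One plausible reading of the minimum-edge construction is a transitive reduction of the reachability relation: an edge from one target node to another is kept only when no third target node lies between them. The sketch below takes that reading as an assumption and further assumes the relation is acyclic; the function name and the three-node example are hypothetical.

```python
def minimal_edges(reach, names):
    """Build the fewest edges that preserve a reachability relation.

    reach[i][j] is True when names[j] is reachable from names[i].
    ASSUMPTION: the relation is acyclic, so an edge i -> j is kept only if
    no intermediate node k satisfies i -> k and k -> j.
    """
    n = len(names)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j or not reach[i][j]:
                continue
            has_intermediate = any(
                reach[i][k] and reach[k][j]
                for k in range(n) if k not in (i, j)
            )
            if not has_intermediate:
                edges.append((names[i], names[j]))
    return edges


# Hypothetical reachability matrix: B reaches E and G, E reaches G
names = ["B", "E", "G"]
reach = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
]
edges = minimal_edges(reach, names)
print(edges)
```

The direct relation from B to G is dropped because it is already implied by the path through E.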

In still another embodiment, the obtaining unit, when configured to calculate, according to the topology relationship of the calculation graph, a target reachability matrix including the plurality of target nodes, may be specifically configured to:

calculating an adjacency matrix containing each computing node in the computational graph according to the topological relation of the computational graph; the adjacency matrix is used for indicating the connection relation between the computing nodes in the computational graph;

computing the transitive closure of the adjacency matrix to obtain an initial reachability matrix containing each computing node in the computational graph; the initial reachability matrix is used for indicating the reachability relation among the computing nodes in the computational graph;

and removing the non-target nodes from the initial reachability matrix to obtain the target reachability matrix containing the plurality of target nodes.
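The three steps above (adjacency matrix, transitive closure, removal of non-target nodes) can be sketched as follows. Warshall's algorithm is used here as one common way to compute a transitive closure; the source does not name a particular algorithm. The function names and the small four-node graph are hypothetical and are not taken from the figures.

```python
def transitive_closure(adj):
    """Warshall's algorithm: result[i][j] is True if j is reachable from i."""
    n = len(adj)
    reach = [[bool(x) for x in row] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def target_reachability(nodes, edges, targets):
    """Adjacency matrix -> initial reachability matrix -> target rows/columns."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[False] * len(nodes) for _ in nodes]
    for src, dst in edges:
        adj[index[src]][index[dst]] = True
    reach = transitive_closure(adj)
    keep = [index[t] for t in targets]       # drop the non-target nodes
    return [[reach[i][j] for j in keep] for i in keep]


# Hypothetical graph: targets B and E are linked only through non-target node x
nodes = ["B", "x", "E", "y"]
edges = [("B", "x"), ("x", "E"), ("E", "y")]
matrix = target_reachability(nodes, edges, ["B", "E"])
print(matrix)
```

Computing the closure before dropping the non-target nodes matters: B reaches E only through x, so removing x first would lose that relation.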

In yet another embodiment, the processing unit, when configured to update the computational graph with the target aggregation node, may be specifically configured to:

adding the target aggregation node in the calculation graph, and connecting the target aggregation node and the aggregated target node by adopting a directed edge;

adding a matched communication node to the target node which is not aggregated in the calculation graph, and adding a matched communication node to the target aggregation node in the calculation graph; the communication node is configured to represent a data transfer operation.
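The graph update described above can be sketched with a plain edge list. The edge direction from each aggregated target node to its aggregation node, the `send(...)` naming of communication nodes, and the example graph are assumptions for illustration; the text only states that directed edges and matching communication nodes are added.

```python
def update_graph(edges, targets, aggregates):
    """Return a new edge list with aggregation and communication nodes added.

    aggregates maps an aggregation node name to the target nodes it aggregates.
    ASSUMPTION: edges run from each aggregated target to its aggregation node.
    """
    new_edges = list(edges)
    aggregated = set()
    for agg, members in aggregates.items():
        for member in members:
            new_edges.append((member, agg))        # target -> aggregation node
        new_edges.append((agg, f"send({agg})"))    # one transfer per aggregate
        aggregated.update(members)
    for target in targets:
        if target not in aggregated:
            new_edges.append((target, f"send({target})"))  # unaggregated target
    return new_edges


# Hypothetical graph with three target nodes, two of which are aggregated
edges = [("B", "E"), ("E", "L")]
updated = update_graph(edges, ["B", "E", "L"], {"BE": ["B", "E"]})
sends = [dst for _, dst in updated if dst.startswith("send(")]
print(sends)
```

In this example the updated graph contains two communication nodes instead of the three that per-target transmission would need.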

In yet another embodiment, the target object includes a neural network model to be subjected to distributed machine learning, and the execution result data of the data processing operation represented by each target node includes: gradients generated by the neural network model in the distributed machine learning.

In still another aspect, an embodiment of the present invention provides a processing apparatus, including an input interface and an output interface, the processing apparatus further including:

a processor adapted to implement one or more instructions; and,

a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:

In yet another aspect, embodiments of the present invention provide a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of:

According to the embodiment of the invention, at least two target nodes can be aggregated into a target aggregation node according to the reachability information of a plurality of target nodes in the calculation graph of the target object; the target aggregation node is configured to instruct aggregation of execution result data of the data processing operation represented by the aggregated target node. The computational graph may then be updated with the target aggregation node and the updated computational graph sent to the computing device. In the process of calculating the target object, after the data processing operation represented by the aggregated target node is executed, the computing device can aggregate and transmit the execution result data of the data processing operation represented by the aggregated target node according to the indication of the target aggregation node, so that the number of times of data transmission is reduced, network resources are saved, and the total transmission time is shortened.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1a is a schematic diagram of a data transmission system according to an embodiment of the present invention;

Fig. 1b is a schematic diagram of a data transmission system according to another embodiment of the present invention;

Fig. 2 is a schematic flow chart of a data transmission method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a computational graph provided by an embodiment of the present invention;

Fig. 4 is a schematic flow chart of a data transmission method according to another embodiment of the present invention;

Fig. 5a is a schematic diagram of an adjacency matrix according to another embodiment of the present invention;

Fig. 5b is a schematic diagram of an initial reachability matrix provided by another embodiment of the present invention;

Fig. 5c is a schematic diagram of a target reachability matrix provided by another embodiment of the present invention;

Fig. 5d is a schematic diagram of the construction of a target directed graph according to another embodiment of the present invention;

Fig. 5e is a schematic diagram of a first aggregation matrix and a first directed graph according to another embodiment of the present invention;

Fig. 5f is a schematic diagram of a second aggregation matrix and a second directed graph provided in accordance with another embodiment of the present invention;

Fig. 5g is a schematic diagram of aggregation level information according to another embodiment of the present invention;

Fig. 5h is a schematic diagram of generating a target aggregation node according to another embodiment of the present invention;

Fig. 5i is a schematic diagram of adding a target aggregation node according to another embodiment of the present invention;

Fig. 5j is a schematic diagram of adding communication nodes according to another embodiment of the present invention;

Fig. 6 is a schematic diagram of an application scenario of distributed machine learning according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a data transmission device according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a processing apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

In the calculation process of the target object, in order to better transmit the execution result data of the data processing operation represented by each target node, the embodiment of the invention firstly provides a data transmission system. The target object herein refers to any object involved in multiple data processing operations in the calculation process, for example, the target object may be a neural network model involved in multiple data processing operations such as convolution operation and pooling operation in the model training process; as another example, the target object may be an application program that involves multiple data processing operations such as a test operation on the application function 1, a test operation on the application function 2, and the like in the application test process; for another example, the target object may be a hardware device involved in a plurality of data processing operations such as a test operation on the module 1, a test operation on the module 2, and the like in a hardware test process.

Specifically, the data transmission system may include one processing device 11 and one or more computing devices 12, and the processing device 11 and the computing devices 12 can communicate with each other. The processing device 11 is mainly used to generate and update the computational graph (i.e., the data flow graph) of a target object and to send the computational graph to each computing device 12; it may be any terminal or server with a data processing function. Each computing device 12 is mainly used to execute the data processing operations on the target object and to transmit the execution result data of some or all of the data processing operations as instructed by the computational graph; it may be any terminal or server with a data computation function and a communication function. In one specific implementation, each computing device 12 returns the execution result data of some or all of the data processing operations to the processing device 11, so that the processing device 11 can perform subsequent processing on the target object according to the execution result data sent by each computing device 12, such as model updating, application test analysis, or module test analysis; the system architecture for this implementation is shown in Fig. 1a. In another specific implementation, each computing device 12 transmits the execution result data of some or all of the data processing operations to a separate management device 13, so that the management device 13 can perform subsequent processing according to the execution result data sent by each computing device 12; the system architecture for this implementation is shown in Fig. 1b. For ease of illustration, the following description is based on the system architecture shown in Fig. 1b.

It should be noted that Fig. 1a and Fig. 1b merely illustrate possible architectures of the data transmission system and do not limit it. For example, in both Fig. 1a and Fig. 1b a separate processing device 11 is physically deployed to generate and update the computational graph; in other embodiments, any one of the computing devices 12 may act as the processing device and perform the generation and updating of the computational graph, in which case a separate processing device 11 need not be deployed. It should be further noted that the above-mentioned terminals may include, but are not limited to, smart phones, tablet computers, notebook computers, desktop computers, and the like. The above-mentioned servers may be independent physical servers, server clusters or distributed systems formed by a plurality of physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

Based on the above data transmission system, the embodiment of the invention also provides a data transmission scheme. The general principle of the scheme is as follows: the processing device compares the reachability of the target nodes in the computational graph of the target object, that is, the nodes whose synchronization data (i.e., execution result data) must be transmitted, aggregates target nodes having the same or similar reachability into one aggregation node, and updates the computational graph with the aggregation node. Reachability refers to the ability of one target node to reach another target node along a series of edges in the computational graph: if target node A can reach target node B through a series of edges, target node B is considered reachable from target node A; otherwise, target node B is considered unreachable from target node A. The aggregation node indicates that the execution result data of the data processing operations represented by the aggregated target nodes are to be aggregated. The updated computational graph is then issued to each computing device; while computing the target object, each computing device aggregates and transmits the execution result data of the data processing operations represented by the aggregated target nodes as instructed by the aggregation node. The data transmission scheme provided by the embodiment of the invention thus enables aggregated transmission of the execution result data of at least two target nodes, which effectively reduces the number of data transmissions, saves network resources, and shortens the total transmission time.

Based on the above description, the embodiments of the present invention propose a data transmission method, which can be performed by the processing device mentioned above. Referring to fig. 2, the data transmission method may include the following steps S201 to S203:

S201, obtaining reachability information of a plurality of target nodes in a computational graph of a target object.

In the embodiment of the invention, each target node represents one data processing operation that needs to be executed in the process of computing the target object, and the execution result data of the data processing operation represented by each target node needs to be transmitted. The reachability information of any target node indicates the ability of that target node to reach other target nodes along at least one edge in the computational graph, and the ability of that target node to be reached by other target nodes along at least one edge in the computational graph. Specifically, the reachability information of any target node may include at least one of: the reachable target nodes, which that target node reaches through at least one edge in the computational graph, and the reached-by target nodes, which reach that target node through at least one edge in the computational graph.

Take the computational graph shown in Fig. 3 as an example. The computational graph may include a plurality of computing nodes; the number on each computing node represents its operation duration, each connection (i.e., directed edge) between computing nodes represents a dependency, and the number on a connection is the number of that directed edge. The black computing nodes in the computational graph are the target nodes. Suppose the target node under consideration is target node L. Since target node L can reach target node O through the two edges numbered 29 and 30, the reachable target nodes of target node L include target node O. Since target node E can reach target node L through the two edges numbered 14 and 24, the reached-by target nodes of target node L (the target nodes that can reach it) include target node E; likewise, since target node B can reach target node L through the three edges numbered 6, 14 and 24, the reached-by target nodes of target node L also include target node B. The reachability information of target node L may therefore be written as (O, BE), where the first element lists the reachable target nodes and the second lists the reached-by target nodes. The reachability information of the other target nodes in the computational graph shown in Fig. 3 can be obtained in the same way: (EGJLO, ×) for target node B, (GJLO, B) for target node E, (JO, BE) for target node G, (O, BEG) for target node J, and (×, BEGJL) for target node O, where "×" indicates null, i.e., there is no corresponding target node.
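As a rough illustration, reachability information of this form can be derived from a directed graph by collecting, for each node, the nodes it can reach and the nodes that can reach it. The edge list below is not the graph of Fig. 3 (which also contains non-target nodes); it is a hypothetical set of edges among the target nodes, chosen only because it reproduces the reachability information listed above. The function name is likewise an assumption.

```python
def reachability_info(nodes, edges):
    """Return {node: (nodes it can reach, nodes that can reach it)} as strings."""
    succ = {n: set() for n in nodes}
    for src, dst in edges:
        succ[src].add(dst)

    def descendants(n, seen):
        # Depth-first walk collecting every node reachable from n.
        for m in succ[n]:
            if m not in seen:
                seen.add(m)
                descendants(m, seen)
        return seen

    reach = {n: descendants(n, set()) for n in nodes}
    return {
        n: ("".join(sorted(reach[n])),
            "".join(sorted(m for m in nodes if n in reach[m])))
        for n in nodes
    }


# Hypothetical edges among target nodes, consistent with the listed information
nodes = ["B", "E", "G", "J", "L", "O"]
edges = [("B", "E"), ("E", "G"), ("E", "L"), ("G", "J"), ("J", "O"), ("L", "O")]
info = reachability_info(nodes, edges)
print(info["L"])
```

An empty string in the result plays the role of the "×" marker used in the text.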

In the embodiment of the invention, the reachability information of any two target nodes is regarded as the same if, after the two target nodes themselves are filtered out of each other's reachability information, the remaining reachable target nodes are identical and the remaining reached-by target nodes (the target nodes that can reach the node) are identical. For example, still taking the computational graph shown in Fig. 3: the reachability information of target node G is (JO, BE), and the reachability information of target node J is (O, BEG). Filtering target node J out of the reachability information of target node G gives (O, BE); filtering target node G out of the reachability information of target node J also gives (O, BE). Since the filtered reachability information of target node G and target node J is the same, the reachability information of the two target nodes is regarded as the same. Similarly, for any two target nodes, if after filtering out the two target nodes their reachability information shares a large number of identical reachable target nodes and identical reached-by target nodes, the reachability information of the two target nodes can be regarded as similar.

S202, according to the reachability information of each target node, at least two target nodes are aggregated into a target aggregation node.

Research shows that, in the process of computing the target object, if a target node (e.g., target node G) has a reachable target node (e.g., target node J) and a reached-by target node, i.e., a target node that can reach it (e.g., target node E), then the data processing operation represented by the target node (target node G) can be executed only after the data processing operation represented by the reached-by target node (target node E) has been executed, and the data processing operation represented by the reachable target node (target node J) is executed after that. The target node (target node G) therefore has the following dependencies: the execution of the data processing operation it represents depends on its reached-by target node (target node E), and is depended on by its reachable target node (target node J). Research further shows that target nodes with the same (or similar) reachability information generally have the same (or similar) dependencies; that is, the data processing operations represented by such target nodes depend on the same reached-by target nodes and are depended on by the same reachable target nodes. Based on this, the embodiment of the invention can aggregate target nodes having the same (or similar) reachability information into one target aggregation node according to the reachability information of each target node, so that the target aggregation node instructs the computing device to aggregate and transmit the execution result data of the data processing operations represented by those target nodes. In other words, the target aggregation node indicates that the execution result data of the data processing operations represented by the aggregated target nodes are to be aggregated.

In a specific implementation, at least one node pair whose two target nodes have the same reachability information can be found directly among the plurality of target nodes according to the reachability information of each target node; the target nodes in each node pair found are then aggregated, so as to obtain at least one target aggregation node. For example, taking the computational graph shown in fig. 3 as an example, according to the reachability information of each target node mentioned in step S201, two node pairs having the same reachability information can be found among the plurality of target nodes: a node pair consisting of target node B and target node E, and a node pair consisting of target node G and target node J. The target nodes in each of the two node pairs can then be aggregated, so that two target aggregation nodes are obtained: a target aggregation node (BE) and a target aggregation node (GJ). Similarly, in another specific implementation, at least one node group whose target nodes have similar reachability information can be found directly among the plurality of target nodes according to the reachability information of each target node; the target nodes in each node group found are then aggregated, so as to obtain at least one target aggregation node. It should be noted that each node group includes at least two target nodes.
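By way of illustration only, the search for target nodes with the same reachability information may be sketched as follows. The function name `find_identical_groups`, the pair-of-sets representation of the reachability information and the sample data are assumptions made for this sketch; the data is hypothetical and is not taken from fig. 3.

```python
from collections import defaultdict


def find_identical_groups(reach_info):
    """Group target nodes whose reachability information is identical.

    reach_info maps a node name to a pair of frozensets:
    (target nodes it can reach, target nodes that can reach it).
    """
    buckets = defaultdict(list)
    for node, info in reach_info.items():
        buckets[info].append(node)
    # Only groups with at least two members can be aggregated.
    return [sorted(group) for group in buckets.values() if len(group) >= 2]


# Hypothetical reachability information (assumed for illustration).
reach_info = {
    "B": (frozenset("GJO"), frozenset()),
    "E": (frozenset("GJO"), frozenset()),
    "G": (frozenset("O"), frozenset("BE")),
    "J": (frozenset("O"), frozenset("BE")),
    "O": (frozenset(), frozenset("BEGJ")),
}
groups = find_identical_groups(reach_info)
aggregation_nodes = ["(" + "".join(group) + ")" for group in groups]
print(aggregation_nodes)
```

Grouping by exact equality only covers the "same reachability information" case; the "similar reachability information" case would need a similarity measure instead of a dictionary key.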

In another specific implementation, after the target nodes with the same or similar reachability information have been aggregated once, the reachability information of the target nodes that were not aggregated changes. In this case, there may be at least two non-aggregated target nodes whose changed reachability information is the same or similar, or there may be a non-aggregated target node whose changed reachability information is the same as or similar to the reachability information of a node obtained by aggregation. The at least two non-aggregated target nodes may then be aggregated, or the non-aggregated target node and the node obtained by aggregation may be aggregated a second time; and so on, until no two non-aggregated target nodes have the same or similar changed reachability information, and no non-aggregated target node has changed reachability information that is the same as or similar to that of a node obtained by aggregation, whereby the target aggregation node is obtained. That is, in this specific implementation, the processing device may perform multiple rounds of aggregation iteration processing on the plurality of target nodes according to the reachability information of each target node to obtain the target aggregation node.

It should be noted that the iteration stop condition of the above multiple rounds of aggregation iteration processing is that no non-aggregated target nodes or aggregated nodes with the same or similar reachability information remain. In other embodiments, the iteration stop condition may also be another condition, for example that the traffic sum of the nodes required for the current aggregation is greater than a traffic threshold; this is not limited in the embodiment of the present invention. In addition, when performing multiple rounds of aggregation iteration processing on the target nodes according to the reachability information of each target node, the processing device may search for the nodes to be aggregated in real time according to the reachability information of each target node and aggregate the nodes found in real time, judging whether the iteration stop condition is met each time a node aggregation is executed. Alternatively, the processing device may first extract aggregation level information according to the reachability information of each target node without considering the iteration stop condition, where the aggregation level information is used to indicate the nodes required for the aggregation of each layer; the processing device then performs at least one layer of aggregation iteration processing on the plurality of target nodes according to the aggregation level information, and judges whether the iteration stop condition is met each time a layer of aggregation iteration processing is executed.

S203, updating the calculation graph by adopting the target aggregation node, and sending the updated calculation graph to the computing equipment; the updated computational graph is used for indicating: according to the indication of the target aggregation node, the computing device aggregates the execution result data of the data processing operation represented by the aggregated target node in the computing process of the target object, and transmits an aggregation result.

Fig. 4 is a flowchart of another data transmission method according to an embodiment of the present invention. The data transmission method may be performed by the processing device mentioned above. Referring to fig. 4, the data transmission method may include the following steps S401 to S405:

S401, obtaining reachability information of a plurality of target nodes in a computational graph of a target object.

In an embodiment of the present invention, the computational graph of the target object may include the following computing nodes: a plurality of target nodes and non-target nodes; each computing node may be used to represent a data processing operation that needs to be performed during the calculation of the target object. A target node is a computing node for which the execution result of the represented data processing operation needs to be transmitted, and a non-target node is a computing node for which the execution result of the represented data processing operation does not need to be transmitted. The reachability information of any target node may include at least one of: the reachable target nodes that the any target node reaches through at least one edge in the computational graph, and the target nodes that reach the any target node through at least one edge in the computational graph. In a specific implementation, step S401 may be implemented in any of the following ways:

Embodiment one: the reachability information of the plurality of target nodes in the computational graph is obtained directly according to the computational graph of the target object. Specifically, for any target node, each target node in the computational graph other than the any target node may be traversed. If it is detected in the computational graph that the currently traversed target node reaches the any target node along at least one edge, the currently traversed target node is added to the reachability information of the any target node as a target node that reaches the any target node; if it is detected in the computational graph that the any target node reaches the currently traversed target node along at least one edge, the currently traversed target node is added to the reachability information of the any target node as a reachable target node of the any target node. After every target node other than the any target node in the computational graph has been traversed, the reachability information of the any target node is obtained.
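As a minimal sketch of the traversal described in embodiment one, the reachability information may be collected with a depth-first search from each target node. The adjacency-list representation, the function names and the small graph below are assumptions made for illustration (nodes A and C play the role of non-target nodes).

```python
def reachable_from(graph, start):
    """Return every node reachable from start along at least one directed edge."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reachability_info(graph, targets):
    """Map each target to (targets it reaches, targets that reach it)."""
    forward = {t: (reachable_from(graph, t) & targets) - {t} for t in targets}
    info = {}
    for t in targets:
        reached_by = {u for u in targets if t in forward[u]}
        info[t] = (frozenset(forward[t]), frozenset(reached_by))
    return info


# Hypothetical computational graph (assumed): A -> B -> C -> G -> O and E -> C.
graph = {"A": ["B"], "B": ["C"], "C": ["G"], "E": ["C"], "G": ["O"]}
targets = {"B", "E", "G", "O"}
info = reachability_info(graph, targets)
print(info["G"])
```

In this hypothetical graph, target nodes B and E end up with identical reachability information even though they are connected to G only through the non-target node C.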

Embodiment two: an adjacency matrix containing each computing node in the computational graph can be calculated from the topology of the computational graph of the target object, as shown in fig. 5a. The adjacency matrix can be used to indicate the connection relationships between the computing nodes in the computational graph. Specifically, if the element in the x-th row and y-th column of the adjacency matrix is a non-zero element, the node corresponding to the x-th row and the node corresponding to the y-th column are connected in the computational graph, that is, the node corresponding to the x-th row can reach the node corresponding to the y-th column through a directed edge in the computational graph. It can be seen that, in this case, the node corresponding to the y-th column is a reachable node of the node corresponding to the x-th row, and the node corresponding to the x-th row is a node that reaches the node corresponding to the y-th column. Here x and y are both greater than 0 and less than or equal to the number of computing nodes. The reachability information of the plurality of target nodes in the computational graph can then be obtained directly according to the adjacency matrix. It should be noted that fig. 5a only shows the computational graph of a target object and the corresponding adjacency matrix by way of example, and is not limiting. For example, each directed edge in the computational graph shown in fig. 5a has a corresponding number; in other embodiments, however, the directed edges may not be numbered. In that case, the adjacency matrix may use only "0" and "1" to represent the connection relationships between the computing nodes, where "0" means unconnected and "1" means connected. That is, if the node corresponding to the x-th row and the node corresponding to the y-th column are connected in the computational graph, the element in the x-th row and y-th column of the adjacency matrix is "1".
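For illustration, a 0/1 adjacency matrix of the kind described in embodiment two may be built from a list of directed edges as sketched below. The helper names and the three-node example are assumptions; only the row-to-column reading of a non-zero element follows the description above.

```python
def adjacency_matrix(names, edges):
    """Element [x][y] is 1 when a directed edge runs from names[x] to names[y]."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def direct_neighbours(matrix, names, node):
    """Read one node's row (nodes it reaches) and column (nodes reaching it)."""
    i = names.index(node)
    successors = [names[j] for j, value in enumerate(matrix[i]) if value]
    predecessors = [names[r] for r, row in enumerate(matrix) if row[i]]
    return successors, predecessors


# Hypothetical fragment: A -> C and C -> F.
names = ["A", "C", "F"]
matrix = adjacency_matrix(names, [("A", "C"), ("C", "F")])
print(direct_neighbours(matrix, names, "C"))
```

Note that an adjacency matrix only records single directed edges, so reading it directly yields the nodes connected by one edge.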

Embodiment three: a target directed graph composed of the plurality of target nodes in the computational graph of the target object may first be obtained. In one specific implementation, the non-target nodes can be deleted directly from the computational graph, and the connection relationships between the target nodes adjusted according to the deleted non-target nodes, so as to obtain the target directed graph formed by the plurality of target nodes. In another specific implementation, the computational graph of the target object may be obtained, and a target reachability matrix containing the plurality of target nodes may be calculated based on the topology of the computational graph, the target reachability matrix being used to indicate the reachability relationships between the target nodes. In the implementation process, the adjacency matrix containing each computing node in the computational graph can first be calculated according to the topology of the computational graph. The transitive closure of the adjacency matrix is then computed to obtain an initial reachability matrix containing each computing node in the computational graph, which may be used to indicate the reachability relationships between the computing nodes in the computational graph. The transitive closure of a relation is the smallest transitive relation that contains it; computing the transitive closure here means: searching for computing nodes that have a transfer relationship according to the connection relationships indicated by the adjacency matrix, and determining the reachability relationships between the computing nodes according to the transfer relationships found.
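One well-known way to compute such a transitive closure is Warshall's algorithm; the embodiment does not name a particular algorithm, so the following is only a sketch under that assumption, with a hypothetical three-node adjacency matrix.

```python
def transitive_closure(adjacency):
    """Warshall's algorithm: reach[i][j] is True when node i reaches node j
    along at least one directed edge."""
    n = len(adjacency)
    reach = [[bool(value) for value in row] for row in adjacency]
    for k in range(n):                 # allow node k as an intermediate node
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical adjacency matrix for A -> C and C -> F (rows/columns: A, C, F).
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
closure = transitive_closure(adjacency)
print(closure[0][2])  # A reaches F through C
```

The cost is cubic in the number of computing nodes, which is acceptable for a sketch; a per-node graph search is an alternative for large sparse graphs.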

For example, from the adjacency matrix shown in fig. 5a it can be seen that computing node A is connected to computing node C, and computing node C is connected to computing node F; computing node C can then be determined to be a computing node having the following transfer relationship: from computing node A to computing node C, and from computing node C to computing node F. From this transfer relationship, it can be determined that there is a reachability relationship between computing node A and computing node F. In this way, the reachability relationships between the computing nodes can be obtained, resulting in the initial reachability matrix shown in fig. 5b. Through the initial reachability matrix, it can be known which computing nodes any computing node can reach along a path (i.e., at least one directed edge), and also by which computing nodes it can be reached; for example, it can be known that computing node F can reach the four computing nodes KMNO, and that computing node F can be reached by the six computing nodes ABCMEG. After the initial reachability matrix is obtained, the non-target nodes in it may be removed to obtain the target reachability matrix containing the plurality of target nodes, as shown in fig. 5c. A target directed graph formed by the plurality of target nodes can then be constructed from the target reachability matrix according to the minimum-edge-number construction principle, as shown in fig. 5d. The minimum-edge-number construction principle means that the constructed target directed graph contains the smallest possible number of directed edges.
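The removal of non-target nodes and the minimum-edge-number construction may be sketched as follows, assuming the computational graph is acyclic so that the minimum-edge graph is its transitive reduction. The dictionary-of-sets representation, the function names and the sample reachability data are assumptions and do not reproduce figs. 5b to 5d.

```python
def restrict_to_targets(reach, targets):
    """Drop non-target nodes from a reachability relation."""
    return {u: reach[u] & targets for u in reach if u in targets}


def minimum_edge_graph(reach):
    """Keep edge u -> v only when no intermediate node w gives u -> w -> v.

    Assumes the relation is already transitively closed and acyclic.
    """
    edges = []
    for u in sorted(reach):
        for v in sorted(reach[u]):
            if not any(v in reach[w] for w in reach[u] if w != v):
                edges.append((u, v))
    return edges


# Hypothetical initial reachability relation; A and C are non-target nodes.
initial_reach = {
    "A": {"B", "C", "G", "O"},
    "B": {"C", "G", "O"},
    "C": {"G", "O"},
    "E": {"C", "G", "O"},
    "G": {"O"},
    "O": set(),
}
target_reach = restrict_to_targets(initial_reach, {"B", "E", "G", "O"})
target_edges = minimum_edge_graph(target_reach)
print(target_edges)
```

Here the edges B -> O and E -> O are omitted because O is already reached through G, which is what keeps the number of directed edges minimal.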

After the target directed graph is obtained, the reachability information of the plurality of target nodes may be obtained based on the target directed graph. Specifically, for any target node, each target node in the target directed graph other than the any target node may be traversed. If it is detected in the target directed graph that the currently traversed target node reaches the any target node along at least one edge, the currently traversed target node is added to the reachability information of the any target node as a target node that reaches the any target node; if it is detected in the target directed graph that the any target node reaches the currently traversed target node along at least one edge, the currently traversed target node is added to the reachability information of the any target node as a reachable target node of the any target node. After every target node other than the any target node in the target directed graph has been traversed, the reachability information of the any target node is obtained.

It should be noted that, as with the initial reachability matrix, the target reachability matrix also makes it possible to know directly which target nodes any target node can reach along a path (i.e., at least one directed edge), and by which target nodes the any target node can be reached. Therefore, in other embodiments, after the target reachability matrix is obtained, the reachability information of the plurality of target nodes may also be obtained directly according to the target reachability matrix.
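Reading the reachability information directly from a target reachability matrix amounts to reading each node's row and column, as the following sketch shows. The matrix values and the function name are assumptions made for illustration and are not those of fig. 5c.

```python
def info_from_matrix(matrix, names):
    """Row i lists the nodes names[i] reaches; column i lists the nodes reaching it."""
    info = {}
    for i, name in enumerate(names):
        reaches = frozenset(names[j] for j, value in enumerate(matrix[i]) if value)
        reached_by = frozenset(names[r] for r, row in enumerate(matrix) if row[i])
        info[name] = (reaches, reached_by)
    return info


# Hypothetical target reachability matrix (rows/columns: B, E, G, O).
names = ["B", "E", "G", "O"]
matrix = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
info = info_from_matrix(matrix, names)
print(info["G"])
```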

S402, extracting aggregation level information according to the reachability information of each target node.

The aggregation level information may include the node groups required for N layers of aggregation, where N is a positive integer; each node group includes at least one of the following nodes: a target node, and an aggregation node obtained by aggregating at least two target nodes. As described above, nodes (e.g., target nodes) having the same (or similar) reachability information typically have the same (or similar) dependencies, and such nodes may be aggregated together. Based on this, so that the nodes in each node group required for each layer of aggregation have the same or similar reachability information, which facilitates the subsequent aggregation of these nodes according to the aggregation level information, the embodiment of the invention can set a reachability condition according to the information characteristics shared by nodes with the same or similar reachability information. The reachability condition may include: the node affinity between the selected nodes, calculated according to the number of identical reachable nodes and the number of identical reaching nodes (nodes that reach the selected node) included in the reachability information of each selected node, is greater than an affinity threshold.

The node affinity may be calculated in ways that include, but are not limited to, the following. First, a first reference value is calculated according to the number of reachable nodes in the reachability information of each selected node, and a second reference value is calculated according to the number of reaching nodes (nodes that reach the selected node) in the reachability information of each selected node. The first reference value may include the sum, or the average, of the numbers of reachable nodes in the reachability information of the selected nodes; similarly, the second reference value may include the sum, or the average, of the numbers of reaching nodes in the reachability information of the selected nodes. Second, a first ratio between the number of identical reachable nodes and the first reference value, and a second ratio between the number of identical reaching nodes and the second reference value, may be calculated. The node affinity can then be calculated according to the first ratio and the second ratio. Specifically, the first ratio and the second ratio can be summed to obtain the node affinity; alternatively, the average of the first ratio and the second ratio can be calculated to obtain the node affinity; alternatively, the first ratio and the second ratio can be weighted and summed to obtain the node affinity, based on a weight value for the identical-reachable-node indicator and a weight value for the identical-reaching-node indicator. It should be understood that these are only examples and do not exhaust the specific ways in which the node affinity may be calculated.
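One of the options described above (average counts as reference values, weighted sum of the two ratios) may be sketched as follows. The function name, the default weights of 0.5 and the treatment of a zero reference value as a ratio of 1.0 are assumptions of this sketch; the embodiment does not specify how an empty case is handled.

```python
def node_affinity(infos, weight_reach=0.5, weight_reached_by=0.5):
    """infos: one (nodes reached, nodes reaching) pair of sets per selected node."""
    ratios = []
    for side in (0, 1):                       # 0: reachable nodes, 1: reaching nodes
        sets = [set(info[side]) for info in infos]
        common = len(sets[0].intersection(*sets[1:]))
        reference = sum(len(s) for s in sets) / len(sets)   # average count
        # Assumption: when no selected node has any such node, treat the ratio as 1.0.
        ratios.append(1.0 if reference == 0 else common / reference)
    return weight_reach * ratios[0] + weight_reached_by * ratios[1]


# Hypothetical reachability information for three nodes.
info_g = ({"O"}, {"B", "E"})
info_j = ({"O"}, {"B", "E"})
info_l = ({"O"}, {"E"})
same = node_affinity([info_g, info_j])
similar = node_affinity([info_g, info_l])
print(same, similar)
```

With equal weights the weighted sum coincides with the average of the two ratios, so nodes with identical reachability information obtain an affinity of 1.0 and less similar nodes obtain a lower value.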

Then, when step S402 is performed, the aggregation level information may be extracted according to the reachability information of each target node with reference to the reachability condition; specifically, step S402 may include the following steps s11-s13:

s11, selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation, and adding the selected nodes to at least one i-th node group required for the i-th layer aggregation. One i-th node group corresponds to one i-th aggregation node, and i ∈ [1, N]. When i is 1, the node set involved in the layer 1 aggregation includes the plurality of target nodes; the node set involved in the layer 2 aggregation includes the 1st aggregation node corresponding to each 1st node group and the target nodes that were not selected into the at least one 1st node group required for the layer 1 aggregation; and so on.

s12, replacing the selected nodes in the node set with the i-th aggregation node corresponding to each i-th node group, so as to update the node set. It should be noted that, after the selected nodes in the node set are replaced with the i-th aggregation node corresponding to each i-th node group, the reachability information of each node in the updated node set is updated accordingly.

s13, if nodes whose reachability information satisfies the reachability condition exist in the updated node set, adding one to the current value of i to update i, and returning to the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation. If no nodes whose reachability information satisfies the reachability condition exist in the updated node set, the extraction of the aggregation level information can be stopped.
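Steps s11-s13 may be sketched as the loop below. This is only an illustration under several assumptions: each node is modelled as a set of target-node names, the reachability condition is a pairwise affinity (average of the two ratios, with an empty case counted as 1.0) compared against a threshold, node groups that share a node are merged, and the graph and the threshold 0.9 are hypothetical.

```python
from itertools import combinations


def affinity(info_a, info_b):
    """Average of the shared-reachable ratio and the shared-reaching ratio."""
    total = 0.0
    for set_a, set_b in zip(info_a, info_b):
        reference = (len(set_a) + len(set_b)) / 2
        total += 1.0 if reference == 0 else len(set_a & set_b) / reference
    return total / 2


def extract_levels(targets, closure, threshold):
    """closure: set of (u, v) pairs meaning target u reaches target v."""
    def reaches(x, y):
        return any((u, v) in closure for u in x for v in y)

    nodes = [frozenset([t]) for t in targets]
    levels = []
    while True:
        info = {
            n: (frozenset(o for o in nodes if o != n and reaches(n, o)),
                frozenset(o for o in nodes if o != n and reaches(o, n)))
            for n in nodes
        }
        groups = []                               # s11: build the i-th node groups
        for a, b in combinations(nodes, 2):
            if affinity(info[a], info[b]) > threshold:
                hit = [g for g in groups if a in g or b in g]
                groups = [g for g in groups if g not in hit] + [{a, b}.union(*hit)]
        if not groups:                            # s13: nothing left to aggregate
            return levels
        levels.append(groups)
        grouped = set().union(*groups)            # s12: replace members by aggregates
        nodes = [n for n in nodes if n not in grouped]
        nodes += [frozenset().union(*g) for g in groups]


# Hypothetical target reachability relation (assumed for illustration).
closure = {("B", "G"), ("B", "J"), ("B", "O"), ("E", "G"), ("E", "J"),
           ("E", "L"), ("E", "O"), ("G", "O"), ("J", "O"), ("L", "O")}
levels = extract_levels(["B", "E", "G", "J", "L", "O"], closure, threshold=0.9)
readable = [[sorted("".join(sorted(n)) for n in g) for g in layer] for layer in levels]
print(readable)
```

For this hypothetical input the sketch yields two layers: B with E and G with J in the first layer, then the aggregate of G and J with L in the second. A lower threshold would allow further layers, at the cost of grouping less similar nodes.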

As is clear from the above description of steps s11-s13, the reachability information of the nodes in each node group satisfies the reachability condition. The nodes in the node set corresponding to the i-th layer form a directed graph corresponding to the i-th layer; the reachability information of any node in any node group of the i-th layer includes at least one of: the reachable nodes corresponding to the any node, and the reaching nodes corresponding to the any node. A reachable node corresponding to the any node is a node that the any node reaches through at least one edge in the directed graph of the i-th layer; a reaching node corresponding to the any node is a node that reaches the any node through at least one edge in the directed graph of the i-th layer. Accordingly, the reachability information of the nodes in each node group satisfying the reachability condition means: the node affinity between the nodes in each node group, calculated according to the number of identical reachable nodes and the number of identical reaching nodes included in the reachability information of the nodes in that node group, is greater than the affinity threshold.

Based on the description of steps s11-s13, and for a clearer understanding, the implementation of step S402 is further described below with reference to the following example:

(I) The value of i is 1:

First, target nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the first layer aggregation (layer 1 aggregation), i.e., the plurality of target nodes, and the selected target nodes are then added to at least one first node group (layer 1 node group) required for the first layer aggregation. For example, suppose the plurality of target nodes includes target node B, target node E, target node G, target node J, target node L and target node O shown in fig. 5d, where the reachability information of target node B is the same as that of target node E, and the reachability information of target node G is the same as that of target node J. The target nodes in the node set involved in the first layer aggregation whose reachability information satisfies the reachability condition are then: target node B and target node E, and target node G and target node J. Target node B and target node E may be added to one first node group (denoted B=E), and target node G and target node J may be added to another first node group (denoted G=J); the first aggregation node corresponding to the first node group (B=E) is (BE), and the first aggregation node corresponding to the first node group (G=J) is (GJ).

Second, the first aggregation nodes corresponding to the two first node groups are used to replace the selected nodes in the node set involved in the first layer aggregation, so as to update the node set. Specifically, the first aggregation node (BE) may be used to replace the selected target node B and target node E in the node set involved in the first layer aggregation, and the first aggregation node (GJ) may be used to replace the selected target node G and target node J, whereby an updated node set comprising the following nodes is obtained: the first aggregation node (BE), the first aggregation node (GJ), target node L and target node O. The reachability information of each node in the updated node set may then be obtained. Specifically, the target nodes involved in each first node group can be aggregated in the target reachability matrix to obtain a first aggregation matrix, and the reachability information of each node in the updated node set can be obtained according to the first aggregation matrix. Alternatively, the target nodes involved in each first node group can be virtually aggregated in the target directed graph to obtain a first directed graph, and the reachability information of each node in the updated node set can be obtained according to the first directed graph. Taking the target reachability matrix or target directed graph shown in fig. 5d as an example, the first aggregation matrix or first directed graph shown in fig. 5e can be obtained. According to the first aggregation matrix or the first directed graph, the reachability information of each node in the updated node set is then as follows: the reachability information of the first aggregation node (BE) is ((GJ) LO, ×), the reachability information of the first aggregation node (GJ) is (O, (BE)), the reachability information of target node L is (O, (BE)), and the reachability information of target node O is (×, (BE) (GJ) L).
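Aggregating the nodes of one node group inside a reachability matrix may be sketched as merging their rows and columns with a logical OR. The function name, the placement of the aggregation node in the last row and column, and the sample matrix are assumptions of this sketch and do not reproduce fig. 5e.

```python
def aggregate_matrix(matrix, names, group):
    """Merge the rows and columns of the nodes in `group` into one aggregation node."""
    keep = [i for i, name in enumerate(names) if name not in group]
    members = [i for i, name in enumerate(names) if name in group]
    new_names = [names[i] for i in keep]
    new_names.append("(" + "".join(names[i] for i in members) + ")")
    size = len(new_names)
    new = [[False] * size for _ in range(size)]
    for a, i in enumerate(keep):
        for b, j in enumerate(keep):
            new[a][b] = bool(matrix[i][j])
        # A kept node reaches the aggregate if it reaches any member, and vice versa.
        new[a][-1] = any(matrix[i][m] for m in members)
        new[-1][a] = any(matrix[m][i] for m in members)
    return new, new_names


# Hypothetical target reachability matrix (rows/columns: B, E, G, O).
names = ["B", "E", "G", "O"]
matrix = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
first_matrix, first_names = aggregate_matrix(matrix, names, {"B", "E"})
print(first_names, first_matrix)
```

The reachability information of each node in the updated node set can then be read from the rows and columns of the resulting matrix in the same way as from the original one.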

Then, it can be detected whether nodes whose reachability information satisfies the reachability condition exist in the updated node set; if so, one is added to the current value of i to update i, and the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation is executed; otherwise, the extraction of the aggregation level information is stopped. Continuing the above example, since the reachability information of the first aggregation node (GJ) and target node L in the updated node set satisfies the reachability condition, one may be added to the current value of i (which is 1), so that the updated value of i is 2. The step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the layer 2 aggregation may then be performed; see the description below.

(II) The value of i is 2:

First, nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the second layer aggregation (layer 2 aggregation), and the selected nodes are then added to at least one second node group (layer 2 node group) required for the second layer aggregation. Continuing the above example, the node set involved in the second layer aggregation includes: the first aggregation node (BE), the first aggregation node (GJ), target node L and target node O. The nodes in this node set whose reachability information satisfies the reachability condition are the first aggregation node (GJ) and target node L. The first aggregation node (GJ) and target node L may therefore be added to a second node group (denoted GJ=L); the second aggregation node corresponding to the second node group (GJ=L) is (GJL).

Second, the second aggregation node corresponding to the second node group is used to replace the selected nodes in the node set involved in the second layer aggregation, so as to update the node set. Specifically, the second aggregation node (GJL) may be used to replace the selected first aggregation node (GJ) and target node L in the node set involved in the second layer aggregation, so that the updated node set includes the following nodes: the first aggregation node (BE), the second aggregation node (GJL) and target node O. The reachability information of each node in the updated node set may then be obtained. Specifically, the nodes involved in each second node group can be aggregated in the first aggregation matrix to obtain a second aggregation matrix, and the reachability information of each node in the updated node set can be obtained according to the second aggregation matrix. Alternatively, the nodes involved in each second node group can be virtually aggregated in the first directed graph to obtain a second directed graph, and the reachability information of each node in the updated node set can be obtained according to the second directed graph. Taking the first aggregation matrix or first directed graph shown in fig. 5e as an example, the second aggregation matrix or second directed graph shown in fig. 5f can be obtained. According to the second aggregation matrix or the second directed graph, the reachability information of each node in the updated node set is then as follows: the reachability information of the first aggregation node (BE) is ((GJL) O, ×), the reachability information of the second aggregation node (GJL) is (O, (BE)), and the reachability information of target node O is (×, (BE) (GJL)).

Then, it can be detected whether nodes whose reachability information satisfies the reachability condition exist in the updated node set; if so, one is added to the current value of i to update i, and the step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the i-th layer aggregation is executed; otherwise, the extraction of the aggregation level information is stopped. Continuing the above example, since the reachability information of the first aggregation node (BE) and the second aggregation node (GJL) in the updated node set satisfies the reachability condition, and the reachability information of the second aggregation node (GJL) and target node O also satisfies the reachability condition, one may be added to the current value of i (which is 2), so that the updated value of i is 3. The step of selecting nodes whose reachability information satisfies the reachability condition from the node set involved in the layer 3 aggregation may then be performed; see the description below.

(III) The value of i is 3:

First, nodes whose reachability information satisfies the reachability condition may be selected from the node set involved in the third layer aggregation (layer 3 aggregation), and the selected nodes are then added to at least one third node group (layer 3 node group) required for the third layer aggregation. Continuing the above example, the node set involved in the third layer aggregation includes: the first aggregation node (BE), the second aggregation node (GJL) and target node O. The nodes in this node set whose reachability information satisfies the reachability condition are: the first aggregation node (BE) and the second aggregation node (GJL), and the second aggregation node (GJL) and target node O. The first aggregation node (BE) and the second aggregation node (GJL) may be added to one third node group (denoted BE=GJL), and the second aggregation node (GJL) and target node O may be added to another third node group (denoted GJL=O). Further, since the two third node groups share a node, namely the second aggregation node (GJL), they may be merged into one third node group to reduce the number of aggregation levels. In this case, the number of third node groups involved in the third layer aggregation is 1, and the group includes the following nodes: the first aggregation node (BE), the second aggregation node (GJL) and target node O; correspondingly, the third aggregation node corresponding to the third node group is (BEGJLO).

Second, the third aggregation node corresponding to the third node group is used to replace the selected nodes in the node set involved in the third layer aggregation, so as to update the node set. Specifically, the third aggregation node (BEGJLO) may be used to replace the selected first aggregation node (BE), second aggregation node (GJL) and target node O in the node set involved in the third layer aggregation, so that the updated node set includes only the third aggregation node (BEGJLO). Since the updated node set includes only the third aggregation node (BEGJLO), it necessarily contains no nodes whose reachability information satisfies the reachability condition; the extraction of the aggregation level information may therefore be stopped at this point to obtain the final aggregation level information. Optionally, the aggregation level information may be represented using the level information diagram shown in fig. 5g.

S403, performing at least one layer of aggregation iteration processing on the plurality of target nodes according to the aggregation level information to obtain the target aggregation nodes.

In a specific implementation, node aggregation may be performed with the traffic threshold as granularity, starting from the innermost aggregation (i.e., the first layer aggregation) of the aggregation level information. Specifically, a specific embodiment of step S403 may include the following steps S21-S25:

s21, determining at least one nth node group required by the nth layer aggregation according to the aggregation level information, and determining the traffic sum of each nth node group according to the traffic of each node in that nth node group; wherein n ∈ [1, N]. As noted above, the nodes in any node group may include at least one of: a target node, and an aggregation node obtained by aggregating at least two target nodes. For a target node, the traffic is determined according to the data size of the execution result data corresponding to the target node; for an aggregation node, the traffic is obtained by summing the traffic of the target nodes corresponding to the aggregation node.

s22, selecting an nth node group with the sum of the traffic less than or equal to the traffic threshold from at least one nth node group; and performing aggregation processing on each node in the selected nth node group to obtain an nth aggregation node.

s23, if the current value of n is smaller than N, and the traffic sum of each (n+1)th node group required by the (n+1)th layer aggregation, acquired according to the aggregation level information, is greater than the traffic threshold, the target aggregation node is obtained according to the nth aggregation node.

s24, if the current value of n is smaller than N and the traffic sum of at least one (n+1)th node group is less than or equal to the traffic threshold, an add-one operation is performed on the current value of n to update n, and the step of determining at least one nth node group required by the nth layer aggregation according to the aggregation level information is performed again.

s25, if the current value of n is equal to N, the target aggregation node is obtained according to the nth aggregation node.

In steps s21-s25, obtaining the target aggregation node according to the nth aggregation node may be implemented as follows: if the value of n is 1, the 1st aggregation node is taken as the target aggregation node. If the value of n is not 1, at least one history aggregation node obtained by the first n-1 layers of aggregation is obtained; the history aggregation nodes that have not themselves been aggregated are selected from the at least one history aggregation node and, together with the nth aggregation node, are taken as the target aggregation nodes. The first n-1 layers of aggregation refers to all layers from the layer 1 aggregation to the layer n-1 aggregation.

Based on the description of steps s21 to s25, the implementation of step S403 is further described below with a specific example. Continuing the above example, the traffic threshold is set to 100 and the traffic of each target node is set as follows: the traffic of target node B is 50 (i.e., B=50), the traffic of target node E is 20 (i.e., E=20), the traffic of target node L is 120 (i.e., L=120), the traffic of target node G is 10 (i.e., G=10), the traffic of target node J is 80 (i.e., J=80), and the traffic of target node O is 50 (i.e., O=50). The specific implementation procedure of step S403 is then as follows:

(I) n has a value of 1:

First, determining at least one first node group required for the first-layer aggregation according to the aggregation level information yields: a first node group (B=E) and a first node group (G=J). The traffic sum of each first node group is then determined from the traffic of each node in that group; that is, the traffic sum of the first node group (B=E) is 70 and the traffic sum of the first node group (G=J) is 90. Second, a first node group whose traffic sum is less than or equal to the traffic threshold may be selected from the at least one first node group, and the nodes in the selected first node group are aggregated to obtain a first aggregation node. Since the traffic sums of the first node group (B=E) and the first node group (G=J) are both smaller than the traffic threshold (100), the nodes in the two first node groups can be aggregated respectively to obtain a first aggregation node (BE) and a first aggregation node (GJ), as shown in fig. 5h.

Then, because the current value of n (1) is smaller than N (3), the traffic sum of each (n+1)th node group required by the (n+1)th layer aggregation can be obtained according to the aggregation level information; that is, the traffic sum of each second node group required by the second-layer aggregation is obtained. Specifically, according to the aggregation level information, the number of second node groups required by the second-layer aggregation is 1, namely the second node group (GJ=L); the traffic sum of this second node group is then calculated as 210 from the traffic (90) of the first aggregation node (GJ) and the traffic (120) of the target node L. Since the traffic sum of the second node group required for the second-layer aggregation is larger than the traffic threshold (100), the aggregation iteration stops; at this time, the target aggregation node may be obtained according to the first aggregation node, that is, the target aggregation nodes include: the first aggregation node (BE) and the first aggregation node (GJ).

It should be noted that, in other embodiments, the traffic threshold may be 300. In that case, since the traffic sum (210) of the second node group is less than the traffic threshold (300) and the current value of n is less than N, an add-one operation may be performed on the current value of n so that n is updated to 2. The step of determining at least one second node group required for the second-layer aggregation according to the aggregation level information may then be performed, as described below.

(II) n has a value of 2:

First, determining at least one second node group required for the second-layer aggregation according to the aggregation level information yields: a second node group (GJ=L). The traffic sum of each second node group is determined from the traffic of each node in that group; the traffic sum of the second node group (GJ=L) is thus 210. Second, a second node group whose traffic sum is less than or equal to the traffic threshold is selected from the at least one second node group, and the nodes in the selected second node group are aggregated to obtain a second aggregation node. Since the traffic sum of the second node group (GJ=L) is smaller than the traffic threshold (300), the nodes in the second node group may be aggregated to obtain a second aggregation node (GJL). Then, because the current value of n (2) is smaller than N (3), the traffic sum of each (n+1)th node group required by the (n+1)th layer aggregation can be obtained according to the aggregation level information; that is, the traffic sum of each third node group required by the third-layer aggregation is obtained. Specifically, according to the aggregation level information, the number of third node groups required by the third-layer aggregation is 1; the traffic sum of this third node group is then calculated as 330 from the traffic (70) of the first aggregation node (BE), the traffic (210) of the second aggregation node (GJL), and the traffic (50) of the target node O. Since the traffic sum of the third node group required for the third-layer aggregation is larger than the traffic threshold (300), the aggregation iteration stops; at this time, the target aggregation node may be obtained according to the second aggregation node. Specifically, the history aggregation nodes obtained by the first 1 layer of aggregation, namely the first aggregation node (BE) and the first aggregation node (GJ), can be obtained; the history aggregation node that has not itself been aggregated (i.e., the first aggregation node (BE)) is selected from them and, together with the nth aggregation node (i.e., the second aggregation node (GJL)), is taken as the target aggregation nodes. That is, the target aggregation nodes in this case include: the first aggregation node (BE) and the second aggregation node (GJL).
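The layer-by-layer iteration of steps s21-s25 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the aggregation level information is assumed to be a list of layers, each layer a list of node groups, each group a list of node names; aggregation nodes are named by concatenating member names; and the function name `aggregate_by_traffic` is hypothetical. A group that refers to an aggregation node that was never formed is treated as exceeding the threshold.

```python
def aggregate_by_traffic(levels, traffic, threshold):
    """Return the names of the target aggregation nodes.

    levels[k] holds the node groups of layer k+1; traffic maps each target
    node to its traffic. Iteration stops at the first layer in which no node
    group has a traffic sum within the threshold.
    """
    traffic = dict(traffic)  # copy; traffic of formed aggregation nodes is added
    unmerged = []            # aggregation nodes not absorbed by a later layer
    for groups in levels:
        def group_sum(group):
            # An aggregation node that was never formed counts as infinite traffic.
            return sum(traffic.get(node, float("inf")) for node in group)

        selected = [g for g in groups if group_sum(g) <= threshold]
        if not selected:
            break  # every group of this layer exceeds the threshold
        for group in selected:
            name = "".join(group)
            traffic[name] = group_sum(group)
            # Drop history aggregation nodes that this group has absorbed.
            unmerged = [a for a in unmerged if a not in group] + [name]
    return unmerged


# Aggregation level information and traffic values from the worked example.
levels = [
    [["B", "E"], ["G", "J"]],  # layer 1: (B=E), (G=J)
    [["GJ", "L"]],             # layer 2: (GJ=L)
    [["BE", "GJL", "O"]],      # layer 3: (BE=GJL=O)
]
traffic = {"B": 50, "E": 20, "L": 120, "G": 10, "J": 80, "O": 50}

result_100 = aggregate_by_traffic(levels, traffic, 100)
result_300 = aggregate_by_traffic(levels, traffic, 300)
```

With a threshold of 100 the sketch stops after layer 1 and yields BE and GJ; with a threshold of 300 it stops after layer 2 and yields BE and GJL, matching the two cases described above.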

S404, updating the calculation graph by using the target aggregation node.

In a specific implementation, the target aggregation node can be added to the computational graph, and a directed edge is used to connect the target aggregation node and each aggregated target node. Taking the target aggregation nodes shown in fig. 5h (i.e., the first aggregation node (BE) and the first aggregation node (GJ)) as an example, a schematic diagram of adding the target aggregation nodes can be seen in fig. 5i. Then, a matching communication node can be added for each target node in the computational graph that is not aggregated, and a matching communication node can be added for each target aggregation node in the computational graph; the communication node is used to represent a data transmission operation. Continuing the above example, the target nodes that are not aggregated include: the target node L and the target node O; the target aggregation nodes include the first aggregation node (BE) and the first aggregation node (GJ); a schematic diagram of adding the communication nodes can be seen in fig. 5j.
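The graph update can be sketched in Python as follows. Several points are assumptions made for illustration only: the computational graph is represented as a dictionary mapping each node to its successor list, the directed edges run from each aggregated target node to its aggregation node, communication nodes are named with a `comm_` prefix, and the function name `update_graph` is hypothetical. The example graph contains only the six target nodes and omits the other edges of the real computational graph.

```python
def update_graph(graph, target_nodes, aggregates):
    """Return a copy of graph with aggregation and communication nodes added.

    graph: dict mapping node name -> list of successor node names.
    aggregates: dict mapping aggregation node name -> aggregated target nodes.
    """
    updated = {node: list(successors) for node, successors in graph.items()}
    aggregated = set()
    for agg, members in aggregates.items():
        updated.setdefault(agg, [])
        for member in members:
            # Assumed edge direction: aggregated target node -> aggregation node.
            updated.setdefault(member, []).append(agg)
            aggregated.add(member)
    # Every aggregation node and every target node that was not aggregated
    # gets its own communication node (a data transmission operation).
    senders = [t for t in target_nodes if t not in aggregated] + list(aggregates)
    for sender in senders:
        comm = "comm_" + sender
        updated.setdefault(sender, []).append(comm)
        updated[comm] = []
    return updated


targets = ["B", "E", "G", "J", "L", "O"]
graph = {node: [] for node in targets}  # simplified graph, assumption only
updated = update_graph(graph, targets, {"BE": ["B", "E"], "GJ": ["G", "J"]})
comm_nodes = sorted(node for node in updated if node.startswith("comm_"))
```

In this simplified example, four communication nodes are created: one each for L, O, BE and GJ, so four transmissions replace the six that would be needed without aggregation.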

And S405, sending the updated calculation graph to the computing device.

The embodiment of the invention can first acquire the reachability information of a plurality of target nodes in the computational graph of the target object. Second, the aggregation level information can be extracted according to the reachability information of the plurality of target nodes. Third, at least one layer of aggregation iteration processing is performed on the plurality of target nodes according to the aggregation level information, so as to improve the accuracy of the target aggregation nodes. The computational graph may then be updated with the target aggregation node and the updated computational graph sent to the computing device. In the process of calculating the target object, after the data processing operations represented by the aggregated target nodes are executed, the computing device can aggregate and transmit the execution result data of those data processing operations according to the indication of the target aggregation node, so that the number of data transmissions is reduced, network resources are saved, and the total transmission time is shortened.

In practical applications, the above-mentioned data transmission method can be applied in different application scenarios; for example, a distributed machine learning scenario, a scenario that uses one or more computing devices to test applications, a scenario that uses one or more computing devices to test hardware devices, and so on. Distributed machine learning refers to a machine learning mode in which the machine learning task of a neural network model is distributed to multiple computing devices for parallel processing. Distributed machine learning may support multiple modes, such as a data parallel (Data Parallelism) mode, a model parallel (Model Parallelism) mode, and so on. In the data parallel mode, different computing devices hold copies of the same model; each computing device trains its own copy in parallel using different training data, and the results (e.g., gradients) computed by all computing devices during model training are then merged in some manner. In the model parallel mode, different parts of the same model are assigned to different computing devices (for example, different network layers, or different parameters of the same network layer); each computing device trains the part it is responsible for in parallel, and the training results of all computing devices are then merged.

Machine learning is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how computer equipment can simulate or implement human learning behavior so as to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of AI (Artificial Intelligence), which refers to the theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, AI is a comprehensive technology of computer science; it seeks to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence, so that the machine has functions such as perception, reasoning and decision making.

The specific application of the data transmission method is described below, taking its application to a distributed machine learning scenario as an example. In this scenario, the target object may be a neural network model on which distributed machine learning is to be performed, and the execution result data of the data processing operation represented by each target node includes: the gradients generated by the neural network model in distributed machine learning. The general principle of the data transmission method can be seen in fig. 6:

The processing device may first obtain a computational graph of the neural network model, which may include a plurality of target nodes representing data processing operations whose execution result data (e.g., gradients) needs to be transmitted. Second, by comparing the reachability information of each target node in the computational graph that needs to transmit synchronization data (i.e., gradients), target nodes with the same or similar reachability information may be aggregated into one target aggregation node (Concat node). The target aggregation node may then be added to the computational graph, and a communication node (All Reduce node) may be added for each tensor that needs to be communicated (i.e., the gradient corresponding to a target node that is not aggregated, and the aggregation result corresponding to an aggregation node), so as to update the computational graph. At run time, the processing device may issue the updated computational graph to each computing device. While training its copy of the neural network model, each computing device can perform gradient fusion on the gradients corresponding to the aggregated target nodes according to the indication of the aggregation nodes in the updated computational graph; gradient fusion means fusing different gradients into one communication data segment for transmission. After gradient fusion, the communication node can be run; when a computing device runs to the communication node, it can communicate synchronously with the management device so as to transmit the corresponding tensor (the gradient corresponding to a target node that is not aggregated, or the aggregation result corresponding to an aggregation node) to the management device.

Correspondingly, after receiving the tensor transmitted by each computing device, if the tensor transmitted by each computing device is a gradient corresponding to the target node which is not aggregated, the management device may directly perform merging calculation (such as mean calculation) on the gradient transmitted by each computing device, and update the network parameters of the neural network model (i.e. the target object) by using the merged gradient. If the tensor transmitted by each computing device is an aggregation result corresponding to the aggregation node, the management device can perform separation processing on the aggregation result to obtain each gradient to be fused. Then, the gradients of the same data processing operation transmitted by the computing devices can be respectively combined and calculated (such as mean value calculation), and the combined gradients are used for updating network parameters of the neural network model (i.e. the target object) respectively. After updating the network parameters, the management device may issue the updated network parameters to each computing device; or after receiving the pulling request of each computing device, issuing the updated network parameters to each computing device, so that each computing device executes the next round of model training by adopting the updated network parameters, and repeatedly executing the steps until the model training is completed.
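The fusion, separation and merging described above can be sketched in Python as follows. This is a minimal illustration under assumptions: gradients are flat lists of floats, the fused communication segment is a simple concatenation accompanied by a layout of (name, length) pairs, the merge is a per-element mean, and the function names `fuse`, `split` and `merge_mean` are hypothetical. No real communication library is used.

```python
def fuse(gradients, order):
    """Concatenate the gradients named in order into one flat segment.

    Returns the segment and the layout needed to separate it again.
    """
    segment, layout = [], []
    for name in order:
        segment.extend(gradients[name])
        layout.append((name, len(gradients[name])))
    return segment, layout


def split(segment, layout):
    """Separate a fused segment back into per-operation gradients."""
    separated, start = {}, 0
    for name, length in layout:
        separated[name] = segment[start:start + length]
        start += length
    return separated


def merge_mean(per_device_gradients):
    """Element-wise mean of the same gradient reported by several devices."""
    return [sum(values) / len(values) for values in zip(*per_device_gradients)]


# Two computing devices, each holding gradients for the aggregated nodes B and E.
device_gradients = [
    {"B": [1.0, 2.0], "E": [3.0]},
    {"B": [3.0, 4.0], "E": [5.0]},
]

# Each device fuses its gradients into one segment and sends one message.
messages = [fuse(grads, ["B", "E"]) for grads in device_gradients]

# The management side separates each segment, then merges per operation.
separated = [split(segment, layout) for segment, layout in messages]
merged = {name: merge_mean([s[name] for s in separated]) for name in ["B", "E"]}
```

Each device sends a single three-element segment instead of two separate gradients, and the receiving side recovers and averages the gradients of B and E independently.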

Therefore, when the data transmission method provided by the embodiment of the invention is applied to distributed machine learning, the gradients obtained by each computing device during model training can be effectively fused and transmitted, so that transmission delay is reduced and communication is accelerated. Moreover, this gradient fusion method can adapt to complex computational graph topologies and to different traffic thresholds, and allows communication data to be fused flexibly so that computation and communication can proceed in parallel. It should be understood that the data transmission method provided by the embodiment of the invention can be flexibly applied to machine learning platforms such as distributed machine learning frameworks, and can be further extended to other distributed systems that need computation and communication to proceed in parallel; the embodiments of the present invention are not limited in this regard.

Based on the above description of the embodiments of the data transmission method, the embodiments of the present invention also disclose a data transmission device, which may be a computer program (including program code) running in a processing device. The data transmission device may perform the method shown in fig. 2 or fig. 4. Referring to fig. 7, the data transmission apparatus may operate as follows:

An obtaining unit 701, configured to obtain reachability information of a plurality of target nodes in a computation graph of a target object; each target node is used for representing a data processing operation which needs to be executed in the calculation process of the target object, and execution result data of the data processing operation represented by each target node needs to be transmitted; wherein the reachability information of any target node is used for indicating: the ability of the any target node to reach other target nodes along at least one edge in the computational graph, and the ability of the any target node to be reached by other target nodes along at least one edge in the computational graph;

an aggregation unit 702, configured to aggregate at least two target nodes into a target aggregate node according to reachability information of each target node, where the target aggregate node is configured to instruct aggregation of execution result data of a data processing operation represented by the aggregated target nodes;

a processing unit 703, configured to update the computational graph with the target aggregation node, and send the updated computational graph to a computing device, where the updated computational graph is used to indicate: and the computing equipment aggregates the execution result data of the data processing operation represented by the aggregated target node in the computing process of the object according to the indication of the target aggregation node, and transmits an aggregation result.

In one embodiment, the aggregation unit 702, when configured to aggregate at least two target nodes into a target aggregate node according to the reachability information of each target node, may be specifically configured to:

In yet another embodiment, the aggregation unit 702, when configured to extract aggregation level information according to reachability information of each target node, may be specifically configured to:

In another embodiment, the aggregation unit 702, when configured to perform at least one layer of aggregation iterative processing on the plurality of target nodes according to the aggregation level information, may be specifically configured to:

In yet another embodiment, the aggregation unit 702 may be further specifically configured to:

In yet another embodiment, the aggregation unit 702, when configured to obtain the target aggregation node according to the nth aggregation node, may be specifically configured to:

In still another embodiment, the obtaining unit 701, when used for obtaining reachability information of a plurality of target nodes in a computational graph of a target object, may be specifically configured to:

In still another embodiment, the obtaining unit 701, when used for obtaining a target directed graph formed by a plurality of target nodes in a computation graph of a target object, may be specifically configured to:

In still another embodiment, the obtaining unit 701, when configured to calculate, according to the topology relationship of the computation graph, a target reachability matrix including the plurality of target nodes, may be specifically configured to:

In yet another embodiment, the processing unit 703, when configured to update the computational graph with the target aggregation node, may be specifically configured to:

According to one embodiment of the invention, the steps involved in the method of fig. 2 or fig. 4 may be performed by the units of the data transmission device of fig. 7. For example, steps S201 to S203 shown in fig. 2 may be performed by the acquisition unit 701, the aggregation unit 702, and the processing unit 703 shown in fig. 7, respectively; as another example, step S401 shown in fig. 4 may be performed by the acquisition unit 701 shown in fig. 7, steps S402 to S403 may be performed by the aggregation unit 702 shown in fig. 7, and steps S404 to S405 may be performed by the processing unit 703 shown in fig. 7.

According to another embodiment of the present invention, the units in the data transmission device shown in fig. 7 may be separately or entirely combined into one or several other units, or one or more of the units may be further split into a plurality of functionally smaller units; either arrangement achieves the same operation without affecting the technical effects of the embodiments of the present invention. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present invention, the data transmission device may also include other units; in practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by a plurality of units in cooperation.

According to another embodiment of the present invention, the data transmission device shown in fig. 7 may be constructed, and the data transmission method of the embodiment of the present invention implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 or fig. 4 on a general-purpose processing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read only storage medium (ROM), and the like. The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed in the processing device described above via the computer-readable recording medium.

Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides a processing device. Referring to fig. 8, the processing device includes at least a processor 801, an input interface 802, an output interface 803, and a computer storage medium 804. Wherein the processor 801, input interface 802, output interface 803, and computer storage medium 804 within the processing device may be connected by bus or other means.

The computer storage medium 804 may be stored in a memory of a processing device, the computer storage medium 804 being adapted to store a computer program comprising program instructions, the processor 801 being adapted to execute the program instructions stored by the computer storage medium 804. The processor 801, or CPU (Central Processing Unit), is a computing core and a control core of the processing device, adapted to implement one or more instructions, in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 801 according to the embodiments of the present invention may be configured to perform a series of data transmission processes, including: obtaining reachability information of a plurality of target nodes in a calculation graph of a target object; each target node is used for representing a data processing operation which needs to be executed in the calculation process of the target object, and execution result data of the data processing operation represented by each target node needs to be transmitted; wherein the reachability information of any target node is used for indicating: the ability of the any target node to reach other target nodes along at least one edge in the computational graph, and the ability of the any target node to be reached by other target nodes along at least one edge in the computational graph; according to the reachability information of each target node, aggregating at least two target nodes into a target aggregation node, wherein the target aggregation node is used for indicating to aggregate the execution result data of the data processing operation represented by the aggregated target nodes; updating the computational graph by adopting the target aggregation node, and sending the updated computational graph to computing equipment, wherein the updated computational graph is used for indicating: the computing device aggregates execution result data of the data processing operation represented by the aggregated target node in the computing process of the target object according to the instruction of the target aggregation node, transmits an aggregation result, and the like.

The embodiment of the invention also provides a computer storage medium (Memory), which is a Memory device in the processing device and is used for storing programs and data. It is understood that the computer storage media herein may include both built-in storage media in the processing device and extended storage media supported by the processing device. The computer storage media provides storage space that stores the operating system of the processing device. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor 801. The computer storage medium herein may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory; optionally, at least one computer storage medium remote from the processor may be present.

In one embodiment, one or more instructions stored in computer storage medium 804 may be loaded and executed by processor 801 to implement the corresponding method steps described above in connection with the data transmission method embodiments shown in fig. 2 or fig. 4; in particular implementations, one or more instructions in computer storage media 804 are loaded by processor 801 and perform the steps of:

updating the computational graph by adopting the target aggregation node, and sending the updated computational graph to computing equipment, wherein the updated computational graph is used for indicating: and the computing equipment aggregates the execution result data of the data processing operation represented by the aggregated target node in the computing process of the object according to the indication of the target aggregation node, and transmits an aggregation result.

In one embodiment, the one or more instructions may be loaded and executed in particular by the processor 801 when aggregating at least two target nodes into a target aggregate node according to reachability information for each target node:

In yet another embodiment, the one or more instructions may be loaded and executed by the processor 801 to perform in particular the steps of extracting the aggregation level information based on the reachability information of each target node:

In yet another embodiment, when at least one layer of aggregation iteration processing is performed on the plurality of target nodes according to the aggregation level information to obtain a target aggregation node, the one or more instructions may be loaded and specifically executed by the processor 801:

In yet another embodiment, the one or more instructions may also be loaded and executed in particular by the processor 801:

In yet another embodiment, the one or more instructions may be loaded and executed by the processor 801 to:

In yet another embodiment, the one or more instructions may be loaded and executed in particular by the processor 801 when obtaining reachability information for a plurality of target nodes in a computational graph of a target object:

In yet another embodiment, the one or more instructions may be loaded and executed in particular by the processor 801 when obtaining a target directed graph comprised of a plurality of target nodes in a computational graph of a target object:

In yet another embodiment, the one or more instructions may be loaded and executed in particular by the processor 801 in calculating a target reachability matrix comprising the plurality of target nodes based on the topology of the computational graph:

In yet another embodiment, the one or more instructions may be loaded and executed in particular by the processor 801 when updating the computational graph with the target aggregation node:

It should be noted that, according to an aspect of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes them, causing the computer device to perform the methods provided in the various alternatives of the data transmission method embodiments shown in fig. 2 or fig. 4 described above.

It is also to be understood that the foregoing is merely illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims

1. A data transmission method, comprising:

2. The method of claim 1, wherein aggregating at least two target nodes into a target aggregation node according to the reachability information of each target node comprises:

3. The method of claim 2, wherein extracting aggregation level information based on reachability information for each target node comprises:

selecting, from a node set related to the i-th layer aggregation, nodes whose reachability information meets the reachability condition, and adding the selected nodes to at least one i-th node group required by the i-th layer aggregation; an i-th node group corresponds to an i-th aggregation node, and i ∈ [1, N]; when the value of i is 1, the node set related to the layer 1 aggregation comprises the plurality of target nodes;

4. The method of claim 2, wherein the nodes in the node set corresponding to the i-th layer form a directed graph corresponding to the i-th layer, and i ∈ [1, N]; the reachability information for any node in any node group of the i-th layer includes at least one of: the reachable node corresponding to any node and the reachable node corresponding to any node;

5. The method of claim 2, wherein performing at least one layer of aggregation iteration processing on the plurality of target nodes according to the aggregation level information to obtain target aggregation nodes comprises:

6. The method of claim 5, wherein the method further comprises:

7. The method according to claim 5 or 6, wherein obtaining the target aggregation node according to the N-th aggregation node comprises:

8. The method of claim 1, wherein the obtaining reachability information for a plurality of target nodes in a computational graph of a target object comprises:

9. The method of claim 8, wherein the obtaining a target directed graph comprised of a plurality of target nodes in a computational graph of a target object comprises:

10. The method of claim 9, wherein the computing a target reachability matrix containing the plurality of target nodes based on the topology of the computational graph comprises:

11. The method of claim 1, wherein the updating the computational graph with the target aggregation node comprises:

12. The method of claim 1, wherein the target object comprises a neural network model to be subjected to distributed machine learning, and the execution result data of the data processing operation represented by each target node comprises: gradients generated by the neural network model in the distributed machine learning.

13. A data transmission apparatus, comprising:

14. A processing device comprising an input interface and an output interface, further comprising:

a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the data transmission method according to any one of claims 1-12.

15. A computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the data transmission method according to any one of claims 1-12.