WO2023088288A1 - Bipartite graph construction method, display method, and device - Google Patents

Bipartite graph construction method, display method, and device

Info

Publication number
WO2023088288A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
communication
node
cross
aggregation
Prior art date
Application number
PCT/CN2022/132189
Other languages
English (en)
French (fr)
Inventor
王中伟
朱融晨
高寒
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023088288A1 publication Critical patent/WO2023088288A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a bipartite graph construction method, display method and device.
  • a communication node indicates a data interaction task, that is, data interaction between two or more devices (such as graphics processing units (GPUs)).
  • researchers design a reasonable parallel strategy to achieve the largest possible computation-to-communication ratio, that is, to minimize the time spent purely on communication. If the parallel strategy is poorly designed, redundant communication nodes may be introduced, causing performance bottlenecks at the communication nodes.
  • the embodiments of the present application provide a bipartite graph construction method, display method and device, which extract the communication nodes in the calculation graph to the top layer of the bipartite graph to clearly display the model structure, thereby allowing the location and function of the communication nodes to be determined quickly and intuitively and providing a basis for the design of subsequent parallel strategies.
  • the present application provides a bipartite graph construction method, including: searching the calculation graph for at least one cross-communication edge corresponding to a first communication node, wherein the first communication node is one of M communication nodes included in the calculation graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, each cross-communication edge in the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes, with M, P and Q being positive integers; cutting the cross-communication edges corresponding to each of the M communication nodes and performing aggregation to obtain the bipartite graph, in which no edge directly connects any two of the M communication nodes.
  • the predecessor nodes of each communication node are the non-communication nodes that the data flow passes through before flowing into that communication node.
  • the successor nodes of each communication node are the non-communication nodes that the data flow passes through after flowing out of that communication node.
  • the cross-communication edges of each communication node are the communication paths between any one of its predecessor nodes and any one of its successor nodes, where each cross-communication edge does not pass through any communication node.
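For illustration only (not part of the claimed method), the search for cross-communication edges described above can be sketched in Python, assuming the calculation graph is stored as an adjacency list; the function, variable, and node names are all hypothetical:

```python
def find_cross_communication_edges(graph, comm_nodes, predecessors, successors):
    """Return all paths from a predecessor to a successor of a communication
    node that bypass every communication node. `graph` maps node -> list of
    downstream nodes (a DAG adjacency list)."""
    edges = []

    def dfs(node, target, path):
        if node == target:
            edges.append(path + [node])
            return
        for nxt in graph.get(node, []):
            if nxt in comm_nodes:   # a cross-communication edge must not
                continue            # pass through any communication node
            if nxt in path:         # computation graphs are DAGs, but this
                continue            # keeps the sketch safe against cycles
            dfs(nxt, target, path + [node])

    for p in predecessors:
        for s in successors:
            dfs(p, s, [])
    return edges

# Toy graph: A -> comm -> C, plus a bypass path A -> B -> C.
graph = {"A": ["comm", "B"], "B": ["C"], "comm": ["C"], "C": []}
print(find_cross_communication_edges(graph, {"comm"}, ["A"], ["C"]))
# → [['A', 'B', 'C']]
```

Here the path A→B→C is the single cross-communication edge of the communication node `comm`: it links a predecessor of `comm` to a successor of `comm` without passing through it.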
  • the present application cuts the cross-communication edges corresponding to each communication node in the calculation graph, so that only the paths that pass through communication nodes are retained.
  • the communication nodes in the calculation graph are thus extracted to the top layer of the bipartite graph, which clearly displays the model structure and allows the location and function of the communication nodes to be determined quickly. This in turn provides a basis for the fusion/splitting strategy of communication nodes during subsequent model training, so that the best parallel strategy can be designed to increase the overlap between computing time and communication time, that is, to reduce the training time of the parallel training process.
  • each cross-communication edge includes at least one sub-edge, each of which directly connects two computing nodes; each sub-edge corresponds to a weight coefficient, determined by the types of the two computing nodes that the sub-edge directly connects.
  • the calculation node is also called the calculation operator, which is a non-expandable node in the calculation graph.
  • the type of a calculation operator is determined by its specific function. For example, the logarithm operator (Log Operator) performs logarithmic operations, so its type may be Log; that is, the type of each calculation operator is represented by its identifier.
  • this application defines a corresponding weight coefficient for each sub-edge in the cross-communication edge, and defines the importance of each sub-edge through the weight coefficient, thereby providing a corresponding basis for the subsequent cutting of the cross-communication edge.
  • cutting the cross-communication edges corresponding to each of the M communication nodes in turn includes: if any sub-edge of the i-th cross-communication edge has already been cut, do not cut the i-th cross-communication edge; otherwise, cut the sub-edge with the smallest (or largest) weight coefficient among the sub-edges of the i-th cross-communication edge, where the i-th cross-communication edge is a cross-communication edge corresponding to any one of the M communication nodes.
  • in other words, one sub-edge is chosen for cutting based on the weight coefficients of the sub-edges, so the sub-edge of least importance can be cut.
  • at most one sub-edge is cut per cross-communication edge, so the semantic information in the calculation graph is preserved to the greatest extent.
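The greedy cutting rule above can be sketched as follows (hypothetical names, not the claimed implementation): a cross-communication edge is skipped if one of its sub-edges has already been cut, otherwise its sub-edge with the smallest weight coefficient is cut.

```python
def cut_cross_communication_edges(cross_edges, weights):
    """`cross_edges` is a list of cross-communication edges, each a list of
    sub-edges (node pairs); `weights` maps sub-edge -> weight coefficient.
    Returns the set of sub-edges that were cut."""
    cut = set()
    for edge in cross_edges:
        # if a sub-edge of this cross-communication edge was already cut,
        # the whole path is already broken: cut nothing more
        if any(sub in cut for sub in edge):
            continue
        # otherwise cut the least-important sub-edge (smallest weight)
        cut.add(min(edge, key=lambda sub: weights[sub]))
    return cut

# Two cross-communication edges sharing the sub-edge ("B", "C").
cross_edges = [[("A", "B"), ("B", "C")], [("B", "C"), ("C", "D")]]
weights = {("A", "B"): 2, ("B", "C"): 1, ("C", "D"): 3}
print(cut_cross_communication_edges(cross_edges, weights))
# → {('B', 'C')}
```

Because the shared sub-edge ("B", "C") has the smallest weight in the first edge, cutting it breaks both cross-communication edges with a single cut.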
  • a connected block (also called a connected component) is defined as follows: after the cross-communication edges are cut, the calculation graph is partitioned by the barrier formed by the communication nodes; each subgraph composed of the remaining computing nodes is called a connected block or connected component.
  • each connected block includes at least one computing node.
  • each of the K connected blocks is aggregated separately to obtain K first-level aggregation nodes, yielding a bipartite graph composed of first-level aggregation nodes and communication nodes; based on this bipartite graph, the location and function of the communication nodes can be determined quickly, providing a basis for the subsequent fusion/splitting of communication nodes.
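As a minimal sketch of the partitioning step (illustrative names, not the claimed implementation), the connected blocks can be found by removing the communication nodes and the cut sub-edges, then computing connected components over the remaining computing nodes, treating edges as undirected:

```python
def connected_blocks(nodes, edges, comm_nodes, cut):
    """Connected components of the computing nodes after the communication
    nodes act as barriers and the cut sub-edges are removed. `edges` is a
    list of (u, v) pairs, treated as undirected for component finding."""
    adj = {n: set() for n in nodes if n not in comm_nodes}
    for u, v in edges:
        if u in comm_nodes or v in comm_nodes or (u, v) in cut:
            continue  # edges into/out of communication nodes, and cut
        adj[u].add(v)  # sub-edges, do not join computing nodes
        adj[v].add(u)
    seen, blocks = set(), []
    for n in adj:
        if n in seen:
            continue
        stack, block = [n], set()
        while stack:              # iterative DFS over one component
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            block.add(cur)
            stack.extend(adj[cur] - seen)
        blocks.append(block)
    return blocks

nodes = ["A", "B", "comm", "C", "D"]
edges = [("A", "B"), ("B", "comm"), ("comm", "C"), ("C", "D"), ("B", "C")]
blocks = connected_blocks(nodes, edges, {"comm"}, {("B", "C")})
print(blocks)  # → [{'A', 'B'}, {'C', 'D'}]
```

Each resulting block would then be aggregated into one first-level aggregation node of the bipartite graph.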
  • each of the K first-level aggregation nodes has a hierarchical structure, in which the nodes of the j-th layer are obtained by expanding the nodes of the (j-1)-th layer; the first layer of the hierarchical structure is the first-level aggregation node itself, and the nodes of the j-th layer belong to different namespaces. The nodes of the j-th layer include aggregation nodes and/or computing nodes, where computing nodes are non-expandable nodes.
  • expansion is the inverse process of aggregation.
  • Aggregation refers to representing a graph structure represented by at least one node and an edge between the at least one node with one node.
  • Expanding refers to representing a node through a graph structure composed of at least one node and edges between the at least one node.
  • the first namespace, which is a namespace in the calculation graph, is updated, and a namespace including a first computing node is constructed, where the first computing node does not belong to the updated first namespace.
  • the method further includes: calculating the hash value of each node in the bipartite graph; when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it.
  • when a node is a computing node, its hash value is determined by at least one of: the type of the computing node, its in-degree, its out-degree, the types of its affiliated nodes, and the number of its affiliated nodes.
  • the data flowing out of a computing node's affiliated nodes flows only into that computing node, and no data flows into the affiliated nodes; affiliated nodes are usually constants or variables.
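The recursive hashing scheme above can be sketched in Python (the dictionary representation and the exact attribute tuple are illustrative assumptions): an aggregation node hashes to the sum of its children's hashes, so its hash is independent of child order, while a computing node hashes its attributes.

```python
def node_hash(node):
    """An aggregation node's hash is the sum of its children's hashes; a
    computing node's hash is derived from its attributes (type, in-degree,
    out-degree, affiliated-node types). The attribute set is illustrative."""
    if node.get("children"):  # aggregation node: sum over expanded children
        return sum(node_hash(c) for c in node["children"])
    # computing node: hash a tuple of its attributes
    return hash((node["type"], node["in_degree"], node["out_degree"],
                 tuple(sorted(node.get("affiliated_types", [])))))

# Two structurally identical Conv2D nodes and their parent aggregation node.
conv_a = {"type": "Conv2D", "in_degree": 1, "out_degree": 1,
          "affiliated_types": ["Const"]}
conv_b = dict(conv_a)
agg = {"children": [conv_a, conv_b]}
```

Because `conv_a` and `conv_b` share the same attributes, they hash identically, which is exactly the condition later used to stack them in the display.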
  • the method further includes: stacking and displaying multiple nodes in the bipartite graph, where the multiple nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
  • stacked display refers to displaying the nodes satisfying the above conditions with a stack structure composed of a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship between the nodes, such as parallel or serial connection, and the quantity identifier represents the number of nodes satisfying the above conditions.
  • the above conditions are: the nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
  • nodes with the same hash value have the same internal structure, so when they are connected in series or in parallel they can be stacked in the corresponding level of the hierarchy, displaying the internal structure of the aggregation node clearly and concisely.
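For the serial case, the stacking step can be sketched as follows (hypothetical names; the connection-relationship identifier is implicitly "serial" here): consecutive nodes in a chain that share a hash value are collapsed into a single (hash, count) stack entry.

```python
def stack_serial_nodes(chain, hashes):
    """Collapse consecutive nodes of a serial chain that share a hash value
    into (hash, count) stack entries, mirroring the stacked display."""
    stacks = []
    for node in chain:
        h = hashes[node]
        if stacks and stacks[-1][0] == h:
            # same internal structure as the previous run: grow its count
            stacks[-1] = (h, stacks[-1][1] + 1)
        else:
            stacks.append((h, 1))
    return stacks

# Four serially connected nodes, the first three structurally identical.
chain = ["n1", "n2", "n3", "n4"]
hashes = {"n1": 7, "n2": 7, "n3": 7, "n4": 9}
print(stack_serial_nodes(chain, hashes))  # → [(7, 3), (9, 1)]
```

The display would then render one representative node with the quantity identifier 3 instead of three identical nodes.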
  • the present application provides a method for displaying a bipartite graph, comprising: inputting a calculation graph and outputting the bipartite graph based on it; the calculation graph includes M communication nodes, a first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes, the first communication node corresponds to at least one cross-communication edge, each cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes and does not pass through the M communication nodes, P, Q and M are positive integers, the cross-communication edges corresponding to the M communication nodes are not connected in the bipartite graph, and no edge in the bipartite graph directly connects any two of the M communication nodes.
  • this application removes the cross-communication edges corresponding to each communication node in the calculation graph, so that only the paths of the data flow that pass through communication nodes are retained.
  • the communication nodes are thus extracted to the top layer of the bipartite graph, which clearly displays the model structure and allows the location and function of the communication nodes to be determined quickly. This provides a basis for the fusion/splitting strategy of communication nodes during subsequent model training, so that the best parallel strategy can be designed to increase the overlap between computing time and communication time, that is, to reduce the training time of the parallel training process.
  • the calculation graph includes C computing nodes and the bipartite graph includes K first-level aggregation nodes, where the K first-level aggregation nodes are obtained by aggregating the C computing nodes; each first-level aggregation node has a hierarchical structure, in which the nodes of the j-th layer are obtained by expanding the aggregation nodes of the (j-1)-th layer, the first layer is the first-level aggregation node itself, the nodes of the j-th layer belong to different namespaces, and C, K and j are positive integers. The nodes of the j-th layer include aggregation nodes and/or computing nodes, where computing nodes are non-expandable.
  • the bipartite graph includes a stacking structure comprising a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship between multiple nodes, the quantity identifier represents the number of those nodes, the nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
  • when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it.
  • the hash value of a computing node is determined by its attributes, which include the type of the computing node, its in-degree, its out-degree, the types of its affiliated nodes, and the number of its affiliated nodes.
  • the present application provides a device for constructing a bipartite graph, including: a search unit configured to search the calculation graph for at least one cross-communication edge corresponding to a first communication node, wherein the first communication node is one of M communication nodes included in the calculation graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, each cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes and does not pass through the M communication nodes, with M, P and Q being positive integers; and a cutting unit configured to cut the cross-communication edges corresponding to the M communication nodes and perform an aggregation operation to obtain the bipartite graph, in which no edge directly connects any two of the M communication nodes.
  • each cross-communication edge includes at least one sub-edge, each of which directly connects two computing nodes; each sub-edge corresponds to a weight coefficient, determined by the types of the two computing nodes that the sub-edge directly connects.
  • the M communication nodes correspond to N cross-communication edges, N being a positive integer; in cutting the cross-communication edges corresponding to the M communication nodes, the cutting unit is specifically configured to cut one sub-edge in each of the N cross-communication edges, such that: when E of the N cross-communication edges contain a common sub-edge, the sum of the weight coefficients of all sub-edges cut in the E cross-communication edges is the largest or the smallest, E being a positive integer less than or equal to N; and when the i-th cross-communication edge shares no sub-edge with any other cross-communication edge, the sub-edge with the smallest or largest weight coefficient among its sub-edges is cut, i being a positive integer.
  • the cut calculation graph includes K connected blocks, K being a positive integer; in performing the aggregation operation to obtain the bipartite graph, the cutting unit is specifically configured to aggregate each of the K connected blocks in the cut calculation graph, where the K connected blocks are obtained by dividing the computing nodes of the calculation graph according to the M communication nodes; the bipartite graph includes K first-level aggregation nodes and the M communication nodes, no edge directly connects any two of the K first-level aggregation nodes, and the K first-level aggregation nodes belong to K different namespaces.
  • each of the K first-level aggregation nodes has a hierarchical structure, in which the nodes of the j-th layer are obtained by expanding the aggregation nodes of the (j-1)-th layer; the first layer is the first-level aggregation node itself, the nodes of the j-th layer belong to different namespaces, and j is a positive integer; the nodes of the j-th layer include aggregation nodes and/or computing nodes, where computing nodes are non-expandable.
  • the device further includes: an updating unit configured to update the first namespace, the first namespace being a namespace in the calculation graph; and a construction unit configured to construct a namespace including a first computing node, where the first computing node does not belong to the updated first namespace.
  • the device further includes: a calculation unit configured to calculate the hash value of each node in the bipartite graph; when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it; when a node is a computing node, its hash value is determined by the attributes of the computing node, including its type, in-degree, out-degree, and the types and number of its affiliated nodes.
  • the device further includes: a stacking unit configured to stack and display multiple nodes in the bipartite graph, where the multiple nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
  • the present application provides a bipartite graph display device comprising: an input unit for inputting a calculation graph; and a display unit for displaying the bipartite graph based on the calculation graph; the calculation graph includes M communication nodes, a first communication node among them corresponds to P predecessor nodes and Q successor nodes and to at least one cross-communication edge, each cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes and does not pass through the M communication nodes, P, Q and M are positive integers, the cross-communication edges corresponding to the M communication nodes are not connected in the bipartite graph, and no edge in the bipartite graph directly connects any two of the M communication nodes.
  • the calculation graph includes C computing nodes and the bipartite graph includes K first-level aggregation nodes, where the K first-level aggregation nodes are obtained by aggregating the C computing nodes; each first-level aggregation node has a hierarchical structure, in which the nodes of the j-th layer are obtained by expanding the aggregation nodes of the (j-1)-th layer, the first layer is the first-level aggregation node itself, the nodes of the j-th layer belong to different namespaces, and C, K and j are positive integers. The nodes of the j-th layer include aggregation nodes and/or computing nodes, where computing nodes are non-expandable.
  • the bipartite graph includes a stacking structure comprising a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship between multiple nodes, the quantity identifier represents the number of those nodes, the nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
  • when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it.
  • the hash value of a computing node is determined by its attributes, which include the type of the computing node, its in-degree, its out-degree, the types of its affiliated nodes, and the number of its affiliated nodes.
  • the specific process by which the bipartite graph display device of the fourth aspect obtains the bipartite graph from the calculation graph is the same as the construction process in the bipartite graph display method of the second aspect, and is not repeated here.
  • the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed, the method described in any one of the above first aspects is implemented.
  • the present application provides a computer program comprising instructions; when the computer program is executed, the method described in any one of the above first aspects is implemented.
  • FIG. 1 is a schematic diagram of a system architecture provided in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a bipartite graph construction method in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a calculation graph in an embodiment of the present application;
  • FIG. 5(a)-5(b) are schematic diagrams of a cross-communication edge cutting method in an embodiment of the present application;
  • FIG. 6(a)-6(c) are schematic diagrams of another cross-communication edge cutting method in an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a bipartite graph in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of the hierarchical structure of a namespace in an embodiment of the present application;
  • FIG. 9 is a schematic diagram of the aggregation process of computing nodes according to the hierarchical structure of the namespace in an embodiment of the present application;
  • FIG. 10(a)-10(d) are example diagrams of a connected block aggregation process in an embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of serial and parallel connection of nodes in an embodiment of the present application;
  • FIG. 12(a)-12(b) are schematic diagrams of the expansion process of a stacked aggregation node in an embodiment of the present application;
  • FIG. 13(a)-13(b) are timeline examples of a model training process in an embodiment of the present application;
  • FIG. 14 is a schematic flowchart of a bipartite graph display method in an embodiment of the present application;
  • FIG. 15 is a schematic structural diagram of a bipartite graph construction device in an embodiment of the present application;
  • FIG. 16 is a schematic structural diagram of a bipartite graph display device in an embodiment of the present application;
  • FIG. 17 is a schematic diagram of the hardware structure of a bipartite graph construction device in an embodiment of the present application;
  • FIG. 18 is a schematic diagram of the hardware structure of a bipartite graph display device in an embodiment of the present application.
  • Parallel training: multiple graphics processing units (GPUs) participate in the training process of a neural network model. Parallel training methods include data parallelism, model parallelism, and pipeline parallelism.
  • the nodes in the calculation graph include expandable nodes and non-expandable nodes.
  • An expandable node is called an aggregation node, where the expandable node means that the node can be represented by a graph structure composed of at least one node and edges between the at least one node.
  • the non-expandable nodes in the calculation graph can be divided into two types: calculation nodes and communication nodes, which can also be called calculation operators and communication operators.
  • the communication operator is used to indicate a data interaction task, that is, data interaction between two or more devices (such as GPUs); examples include the gather operator (AllGather Operator) and the broadcast operator (Broadcast Operator).
  • calculation operators are the operators in the calculation graph other than communication operators, such as the convolution operator (Conv2D Operator), the max pooling operator (MaxPool Operator), the addition operator (Add Operator), the logarithm operator (Log Operator), the sorting operator (Sort Operator), and the transpose operator (Transpose Operator).
  • Aggregation node: a node in the bipartite graph that can be expanded. An aggregation node is obtained based on a namespace and stores node information, including its child node list, whether the node is expanded, and other attributes that support the subsequent interactive exploration module.
  • Predecessor node: the predecessor nodes of a communication node are the non-communication nodes that the data flow passes through before flowing into that communication node.
  • Successor node: the successor nodes of a communication node are the non-communication nodes that the data flow passes through after flowing out of that communication node.
  • Cross-communication edge: the cross-communication edges of a communication node are the communication paths between any one of its predecessor nodes and any one of its successor nodes, where each cross-communication edge does not pass through any communication node on the way.
  • Connected block: each connected block includes at least one computing node.
  • Namespace (Name Scope): when a deep learning framework generates a neural network calculation graph, it groups nodes according to their calculation logic and generates a namespace for each node. After parsing the namespaces in the calculation graph data, a data flow graph with hierarchical information can be obtained.
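As an illustration of the namespace parsing step (the slash-separated scope format is an assumption, common in frameworks such as TensorFlow; the function name is hypothetical), the hierarchical information can be recovered by nesting each scope segment into a tree:

```python
def build_hierarchy(node_names):
    """Parse slash-separated name scopes (e.g. 'block1/conv/w') into a
    nested dict, giving the hierarchical structure used for aggregation."""
    root = {}
    for name in node_names:
        level = root
        for part in name.split("/"):
            # descend into (or create) the namespace for this segment
            level = level.setdefault(part, {})
    return root

names = ["block1/conv/w", "block1/conv/b", "block1/relu", "block2/conv/w"]
print(build_hierarchy(names))
```

Each inner dict here corresponds to a namespace; collapsing one subtree into a single node is the aggregation operation, and restoring its children is the expansion operation.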
  • Deep learning framework: a structure that performs learning at multiple levels of abstraction through machine learning algorithms. Deep learning frameworks include PaddlePaddle, TensorFlow, Caffe, Theano, MXNet, Torch, and PyTorch.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application, describing the system architecture of the computer device 100.
  • the system architecture of the computer device 100 may include a front end 110 , a back end 120 and a device layer 130 .
  • the computer device 100 may be a mobile phone, a computer, a tablet, or a server, which is not limited in this application.
  • the front end 110 may include a web page or an App page 111 and a bipartite graph construction unit 112 .
  • the bipartite graph construction unit 112 can send a request to the backend 120, for example, read the calculation graph data in a specific format (for example, json format, etc.) from the server or host directory, then parse the read calculation graph data, and construct the corresponding Bipartite graph (this process is also the main process in this application, will be expanded in the following specific embodiments); after building the bipartite graph, the user can continue to interact and render on the Web page or App page, to Adjust and display the shape of the bipartite graph, and analyze the corresponding model structure and function.
  • the backend 120 stores a deep learning framework/model 121 for performing various deep learning tasks, such as parallel-training tasks in image processing, natural language processing, or other fields (such as scientific computing or physical modeling), which is not limited in this application.
  • the backend 120 can convert the stored deep learning model into calculation graph data in a specific format for the frontend 110 to read.
  • the solution in this application can be used to visualize the calculation graph, that is, to obtain the bipartite graph corresponding to the deep learning model.
  • the device layer 130 includes a processor 131 .
  • the processor 131 can be multiple graphics processing units (GPU) and/or central processing units (CPU), used for parallel training of the deep learning framework/model 121 and for execution after training.
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application. It should be understood that the bipartite graph construction method in the embodiment of the present application can be applied to fields that require deep learning models for data processing, including artificial intelligence (such as image processing or natural language processing) and scientific computing, and to scenarios that require parallel training of deep learning models.
  • the user builds a corresponding deep learning model 220 based on a specific deep learning task 210, where the deep learning task 210 can be image processing tasks such as image recognition, target detection, and image segmentation, or natural language processing tasks such as speech semantic recognition, etc.
  • the deep learning model 220 may be a convolutional neural network (CNN) model, a deep belief network (DBN) model, a stacked auto-encoder network model, etc.
  • model structure visualization 240: that is, construct the bipartite graph corresponding to the calculation graph data and display it in a graphical user interface (GUI).
  • the user can quickly locate the position of the communication node and the corresponding function, and then adjust the parallel strategy 250 based on it.
  • the communication ratio is the smallest, that is, the communication time during training is reduced as much as possible.
  • the model deployment 260 is performed, that is, the model is deployed to various feasible computer devices, such as mobile phones, computers, servers, etc., which is not limited in this application.
  • FIG. 3 is a schematic flowchart of a method for constructing a bipartite graph in an embodiment of the present application. As shown in Fig. 3, the method 300 includes step S310 and step S320.
  • Step S310: search for at least one cross-communication edge corresponding to the first communication node from the calculation graph, where the first communication node is one of the M communication nodes included in the calculation graph, and the first communication node corresponds to P predecessor nodes and Q successor nodes. Each cross-communication edge in the at least one cross-communication edge indicates a communication path between a predecessor node in the P predecessor nodes and a successor node in the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes; M, P and Q are positive integers.
  • the above-mentioned calculation graph may be an expanded calculation graph, that is, the nodes in the calculation graph are computing nodes or communication nodes, and cannot be further expanded.
  • the above calculation graph may be a directed graph, that is, the edge (or data flow) between any two directly connected nodes in the calculation graph is directional, indicating the data flow direction between the two nodes.
  • the predecessor nodes corresponding to each communication node may be all non-communication nodes that the data flow passes through before flowing into the communication node.
  • the successor nodes corresponding to each communication node may be all non-communication nodes that the data flow passes through after flowing out of the communication node.
  • the predecessor nodes corresponding to the above-mentioned first communication node are all computing nodes that logically precede the first communication node in the calculation graph (that is, nodes the data flow passes through before flowing into the first communication node), not including the first communication node itself, P in total. The successor nodes corresponding to the first communication node are all computing nodes that logically follow the first communication node in the calculation graph (that is, nodes the data flow passes through after the first communication node), not including the first communication node itself, Q in total.
  • searching for at least one cross-communication edge corresponding to the first communication node from the calculation graph is specifically: searching for every communication path from each of the P predecessor nodes to each of the 1st, 2nd, ..., Qth successor nodes. Each communication path found is a cross-communication edge, and a total of N cross-communication edges are found, where N is a positive integer.
  • the number of cross-communication edges between any one of the P predecessor nodes and any one of the Q successor nodes is an integer greater than or equal to zero.
  • the search process for the cross-communication edges corresponding to any communication node in the calculation graph is the same as the search process for the cross-communication edges corresponding to the first communication node above, and will not be repeated here.
  • FIG. 4 is a schematic structural diagram of a calculation graph provided in an embodiment of the present application.
  • the computation graph 400 includes communication nodes and computing nodes. Communication nodes are denoted by T, including T1, T2, and T3; computing nodes are denoted by J, including J1, J2, . . . , J10.
  • for communication node T1, there is one corresponding predecessor node (J1) and four corresponding successor nodes (J5, J6, J9 and J10). Therefore, there are 6 cross-communication edges corresponding to communication node T1, namely: J1-J2-J5, J1-J2-J5-J6, J1-J2-J5-J6-J9, J1-J2-J5-J6-J9-J10, J1-J2-J4-J8-J10, J1-J2-J3-J7-J10.
  • for communication node T2, there are three corresponding predecessor nodes (J1, J2 and J3) and two corresponding successor nodes (J7 and J10). Therefore, there are 10 cross-communication edges corresponding to communication node T2, namely: J3-J7, J3-J7-J10, J2-J3-J7, J2-J3-J7-J10, J2-J4-J8-J10, J2-J5-J6-J9-J10, J1-J2-J3-J7, J1-J2-J5-J6-J9-J10, J1-J2-J4-J8-J10, J1-J2-J3-J7-J10.
  • the three communication nodes in the calculation graph 400 correspond to 17 cross-communication edges in total.
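The path search of step S310 can be sketched as a depth-first enumeration over the example graph 400. The edge list below is inferred from the paths listed above; everything else (function and variable names) is illustrative. A path is recorded each time it reaches a successor node and is then extended further, since a longer path may reach another successor:

```python
from collections import defaultdict

# Directed edges between the computing nodes of graph 400
# (inferred from the cross-communication edges listed in the text).
EDGES = [("J1", "J2"), ("J2", "J3"), ("J2", "J4"), ("J2", "J5"),
         ("J3", "J7"), ("J4", "J8"), ("J5", "J6"), ("J6", "J9"),
         ("J7", "J10"), ("J8", "J10"), ("J9", "J10")]

adj = defaultdict(list)
for u, v in EDGES:
    adj[u].append(v)

def cross_communication_edges(predecessors, successors):
    """All directed paths from a predecessor to a successor that pass
    through computing nodes only (the adjacency above already excludes
    the communication nodes T1-T3)."""
    found = []
    def dfs(node, path):
        if node in successors:
            found.append(tuple(path))   # record, then keep extending:
        for nxt in adj[node]:           # a longer path may hit another successor
            dfs(nxt, path + [nxt])
    for p in predecessors:
        dfs(p, [p])
    return found

t1 = cross_communication_edges({"J1"}, {"J5", "J6", "J9", "J10"})
t2 = cross_communication_edges({"J1", "J2", "J3"}, {"J7", "J10"})
print(len(t1), len(t2))  # 6 10, matching the counts for T1 and T2 above
```

Running this reproduces the 6 cross-communication edges of T1 and the 10 of T2 enumerated in the text.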
  • each cross-communication edge includes at least one sub-edge, and each sub-edge in the at least one sub-edge directly connects two computing nodes; each sub-edge in the at least one sub-edge Corresponding to a weight coefficient, the weight coefficient corresponding to each sub-edge is determined by the types of the two computing nodes directly connected to each sub-edge.
  • a sub-edge is an edge between two directly connected computing nodes, and each cross-communication edge includes at least one sub-edge.
  • the type of calculation operator is determined by the specific function of the calculation operator.
  • Log operator: its function is to perform logarithmic operations.
  • its type may be Log, that is, the type of each calculation operator is represented by its identifier.
  • the user may determine the weight coefficient of the sub-edge connecting the two computing nodes according to the type of the two computing nodes directly connected by each sub-edge, which is not limited in this application.
  • the calculation graph contains Reshape nodes, Tile nodes, and Mul nodes that are directly connected in sequence
  • the Reshape node and the Tile node are both nodes for tensor operations, so their logic is similar.
  • the Mul node is a node for mathematical operations. Therefore, when cutting sub-edges, users prefer to retain the sub-edges connected between the Reshape node and the Tile node. At this time, a larger weight coefficient may be assigned to the sub-edge between the Reshape node and the Tile node, and a smaller weight coefficient may be assigned to the sub-edge between the Tile node and the Mul node.
  • the weight coefficient may be used to characterize the importance of the corresponding sub-edge. For example, the larger the weight coefficient, the higher the importance of the sub-edge; or the larger the weight coefficient, the lower the importance of the sub-edge.
  • the values of weight coefficients corresponding to all sub-edges may be between 0 and 1. It should be understood that the value interval of the weight coefficient corresponding to the sub-edge may also be other value ranges, which is not limited in this application.
  • the determination method of the weight coefficient of the sub-edge included in the cross-communication edge corresponding to each communication node in the above calculation diagram is the same as the determination method of the weight coefficient of the sub-edge included in the cross-communication edge corresponding to the first communication node. I won't repeat them here.
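As a minimal illustration of the type-based weighting just described, a weight can be looked up from the operator types of the two directly connected computing nodes. The operator-type set and the 0.9/0.2 values below are purely illustrative assumptions, not values prescribed by the method:

```python
# Hypothetical operator-type category: tensor-shape operations.
TENSOR_OPS = {"Reshape", "Tile", "ExpandDims", "Transpose"}

def sub_edge_weight(type_u, type_v):
    """Weight of the sub-edge between two directly connected computing
    nodes, determined only by their operator types. Here a larger weight
    means higher importance (the edge should be kept rather than cut)."""
    if type_u in TENSOR_OPS and type_v in TENSOR_OPS:
        return 0.9   # two tensor-operation nodes: prefer to keep this sub-edge
    return 0.2       # mixed pair (e.g. tensor op -> math op): preferred cut point

w_keep = sub_edge_weight("Reshape", "Tile")  # Reshape-Tile: kept if possible
w_cut = sub_edge_weight("Tile", "Mul")       # Tile-Mul: cut in preference
print(w_keep, w_cut)  # 0.9 0.2
```

This matches the Reshape/Tile/Mul preference described above: the Reshape-Tile sub-edge gets the larger coefficient and the Tile-Mul sub-edge the smaller one.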
  • Step S320: cut the cross-communication edges corresponding to the M communication nodes and perform an aggregation operation to obtain the bipartite graph, in which no edge directly connects any two of the M communication nodes.
  • the cross-communication edges corresponding to the M communication nodes in the computation graph are cut to obtain the cut computation graph.
  • the cut calculation graph includes K connected blocks and M communication nodes, and K is a positive integer.
  • all the nodes in the bipartite graph can be divided into two sets; no edge directly connects any two nodes within either of the two sets, and one of the two sets is the set composed of the above M communication nodes.
  • the M communication nodes correspond to N cross-communication edges in total, and N is a positive integer;
  • cutting the cross-communication edges corresponding to the M communication nodes includes: cutting a sub-edge in each of the N cross-communication edges. When E cross-communication edges of the N cross-communication edges include a common sub-edge, the sum of the weight coefficients corresponding to all the sub-edges cut from the E cross-communication edges is the largest or smallest, where E is a positive integer less than or equal to N. When the i-th cross-communication edge of the N cross-communication edges does not contain a common sub-edge, the sub-edge with the smallest or largest weight coefficient among the sub-edges included in the i-th cross-communication edge is cut, where i is a positive integer.
  • any one of the following two ways may be used for cutting.
  • the cutting method is the same as that of the above-mentioned E cross-communication edges;
  • the cutting method of the cross-communication edge is the same as that of the i-th cross-communication edge.
  • the above N cross-communication edges are grouped by using L kinds of grouping methods, and each grouping method corresponds to a weight coefficient and a cutting method of the cross-communication edges.
  • at least one group of cross-communication edges is obtained.
  • Each of the at least one group of cross-communication edges includes at least one common sub-edge, and each group includes at least one cross-communication edge.
  • the difference between the L types of grouping methods lies in that the common sub-edges based on different grouping methods are different.
  • the a-th grouping method divides the N cross-communication edges into A groups.
  • cut one sub-edge from each cross-communication edge in the group so that the sum of the weight coefficients corresponding to all the cut sub-edges in the group is maximum or minimum, and use the resulting sum as the weight coefficient of this group of cross-communication edges.
  • the A weight coefficients corresponding to the A groups of cross-communication edges can be calculated, and then the A weight coefficients are added to obtain the weight coefficient corresponding to the a-th grouping method.
  • L weight coefficients corresponding to the L grouping methods can be obtained, and the cross-communication edge cutting method corresponding to the largest or smallest weight coefficient among the L weight coefficients can be used as the cutting method of the calculation graph, that is, the global optimal cutting method .
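The global search over grouping methods described above is equivalent to choosing a minimum-weight set of sub-edges such that every cross-communication edge contains at least one chosen sub-edge (a weighted hitting set). The brute-force sketch below applies this to the 16 cross-communication edges listed for T1 and T2; the weights not quoted in the text (marked assumed) are set high so they never win, which is an assumption of this sketch:

```python
from itertools import combinations

# Sub-edge weights. Quoted by the text: J1-J2: 0.8, J2-J3: 0.5, J2-J5: 0.2,
# J3-J7: 0.3, J4-J8: 0.2, J5-J6: 0.4. The 0.9 entries are assumed.
W = {("J1", "J2"): 0.8, ("J2", "J3"): 0.5, ("J2", "J4"): 0.9,
     ("J2", "J5"): 0.2, ("J3", "J7"): 0.3, ("J4", "J8"): 0.2,
     ("J5", "J6"): 0.4, ("J6", "J9"): 0.9, ("J7", "J10"): 0.9,
     ("J8", "J10"): 0.9, ("J9", "J10"): 0.9}

# The cross-communication edges of T1 (first 6) and T2 (last 10).
PATHS = ["J1 J2 J5", "J1 J2 J5 J6", "J1 J2 J5 J6 J9", "J1 J2 J5 J6 J9 J10",
         "J1 J2 J4 J8 J10", "J1 J2 J3 J7 J10",
         "J3 J7", "J3 J7 J10", "J2 J3 J7", "J2 J3 J7 J10",
         "J2 J4 J8 J10", "J2 J5 J6 J9 J10", "J1 J2 J3 J7",
         "J1 J2 J5 J6 J9 J10", "J1 J2 J4 J8 J10", "J1 J2 J3 J7 J10"]

def sub_edges(path):
    ns = path.split()
    return {(ns[i], ns[i + 1]) for i in range(len(ns) - 1)}

CROSS = [sub_edges(p) for p in PATHS]
ALL = sorted(W)

best_w, best_cut = float("inf"), None
for r in range(1, len(ALL) + 1):
    for cut in combinations(ALL, r):
        cut_set = set(cut)
        if all(ce & cut_set for ce in CROSS):   # every cross edge is severed
            w = sum(W[s] for s in cut_set)
            if w < best_w:
                best_w, best_cut = w, cut_set
print(round(best_w, 2), sorted(best_cut))
# 0.7 [('J2', 'J5'), ('J3', 'J7'), ('J4', 'J8')]
```

Under these weights the optimum reproduces the Fig. 5(a) result: cut J3-J7, J4-J8 and J2-J5 for a total weight of 0.7.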
  • the cross-communication edges corresponding to each of the M communication nodes are sequentially cut, and the sequence of the communication nodes is not limited when cutting.
  • FIG. 5( a )- FIG. 5( b ) are schematic diagrams of a cross-communication edge cutting method provided by an embodiment of the present application.
  • Figures 5(a) and 5(b) respectively enumerate two cross-communication edge cutting methods corresponding to two different grouping methods when performing global optimal cutting. It should be understood that when searching for the global optimal cutting method in the example calculation graph, other grouping methods corresponding to the cutting methods will also be searched, which will not be listed here.
  • the 17 cross-communication edges in the computation graph are divided into 3 groups.
  • the first group contains common sub-edges J3-J7, including cross-communication edges: J3-J7, J2-J3-J7, J2-J3-J7-J10, J3-J7-J10, J1-J2-J3-J7 -J10.
  • the second group contains common sub-edges J4-J8, including cross-communication edges: J4-J8, J2-J4-J8, J1-J2-J4-J8, J4-J8-J10, J2-J4-J8-J10 , J1-J2-J4-J8-J10.
  • the third group contains common sub-edges J2-J5, including cross-communication edges: J1-J2-J5, J1-J2-J5-J6, J1-J2-J5-J6-J9, J1-J2-J5-J6 -J9-J10, J2-J5-J6-J9-J10, J1-J2-J5-J6-J9-J10.
  • the above three groups of cross-communication edges are cut separately, so that the sum of weight coefficients corresponding to all the cut sub-edges in each group of cross-communication edges is the smallest.
  • when the sub-edge cut from the first group of cross-communication edges is J3-J7, the sum of the weight coefficients of all cut sub-edges in the first group is minimized, and the corresponding weight coefficient is 0.3; when the sub-edge cut from the second group is J4-J8, the sum for the second group is minimized, and the corresponding weight coefficient is 0.2; when the sub-edge cut from the third group is J2-J5, the sum for the third group is minimized, and the corresponding weight coefficient is 0.2. To sum up, in the cross-communication edge cutting method corresponding to the grouping shown in Fig. 5(a), the sum of the weight coefficients of all cut sub-edges is 0.7.
  • the 17 cross-communication edges in the computation graph are divided into 5 groups.
  • the first group includes common sub-edges J3-J7, including cross-communication edges: J3-J7, J3-J7-J10.
  • the second group contains common sub-edges J4-J8, including cross-communication edges: J4-J8, J2-J4-J8, J1-J2-J4-J8, J4-J8-J10, J2-J4-J8-J10 , J1-J2-J4-J8-J10.
  • the third group contains common sub-edges J1-J2, including cross-communication edges: J1-J2-J5, J1-J2-J5-J6, J1-J2-J5-J6-J9, J1-J2-J5-J6 -J9-J10.
  • the fourth group includes common sub-edges J5-J6, including cross-communication edges: J2-J5-J6-J9-J10, J1-J2-J5-J6-J9-J10.
  • the fifth group includes common sub-edges J2-J3, including cross-communication edges: J2-J3-J7, J2-J3-J7-J10, and J1-J2-J3-J7-J10.
  • when the sub-edge cut from the first group of cross-communication edges is J3-J7, the sum of the weight coefficients of all cut sub-edges in the first group is minimized, and the corresponding weight coefficient is 0.3; when the sub-edge cut from the second group is J4-J8, the sum for the second group is minimized, and the corresponding weight coefficient is 0.2; when the sub-edge cut from the third group is J1-J2, the sum for the third group is minimized, and the corresponding weight coefficient is 0.8; when the sub-edge cut from the fourth group is J5-J6, the sum for the fourth group is minimized, and the corresponding weight coefficient is 0.4; when the sub-edge cut from the fifth group is J2-J3, the sum for the fifth group is minimized, and the corresponding weight coefficient is 0.5.
  • in this grouping, the sum of the weight coefficients of all cut sub-edges is 2.2. Since the sum of the weight coefficients of all cut sub-edges in Figure 5(a) is smaller than that in Figure 5(b), the cutting method across communication edges in Figure 5(a) is better.
  • although Figure 5(a) and Figure 5(b) only enumerate two ways of cutting across communication edges in pursuit of the global optimum, it can be seen that the number of cut sub-edges in Figure 5(a) is the smallest, and the sum of the weight coefficients corresponding to the cut sub-edges is also the smallest.
  • the cross-communication edge cutting method in Figure 5(a) can be used as the global optimal cutting method of the calculation graph.
  • FIG. 6( a )- FIG. 6( c ) are schematic diagrams of another cutting method across communication edges provided by the embodiment of the present application.
  • Figure 6(a)-Figure 6(c) enumerate the process of sequentially cutting the cross-communication edges corresponding to each communication node in the order communication node T1, communication node T2, communication node T3.
  • the cutting order in Figure 6(a)- Figure 6(c) is just a specific example, and those skilled in the art can also use other orders to sequentially cut the cross-communication edges corresponding to each communication node. This is not limited.
  • Figure 6(a) shows the process of cutting the cross-communication edge corresponding to the communication node T1.
  • the six cross-communication edges are cut so that the sum of the weight coefficients corresponding to all the cut sub-edges is the smallest. It can be seen that the six cross-communication edges contain a common sub-edge J1-J2, so when the sub-edge J1-J2 is cut, the sum of the weight coefficients is the smallest, which is 0.8.
  • Fig. 6(b) shows the process of cutting the cross-communication edge corresponding to the communication node T2 after the cross-communication edge corresponding to the communication node T1 is cut.
  • There are 6 cross-communication edges corresponding to communication node T2 at this time including: J3-J7, J3-J7-J10, J2-J3-J7, J2-J3-J7-J10, J2-J4-J8-J10, J2- J5-J6-J9-J10.
  • the optimal cutting method for the six cross-communication edges is searched, so that the sum of the weight coefficients of all cut sub-edges in the six cross-communication edges is the smallest.
  • the first group contains four cross-communication edges (J3-J7, J3-J7-J10, J2-J3-J7, J2-J3-J7-J10) and contains the common sub-edge J3-J7; the second group is the cross-communication edge J2-J4-J8-J10; the third group is the cross-communication edge J2-J5-J6-J9-J10. When the sub-edge cut from the first group is the common sub-edge J3-J7, the sum of the weight coefficients of all cut sub-edges in the first group is minimized, and the corresponding weight coefficient is 0.3; when the sub-edge cut from the second group is J4-J8, the sum for the second group is minimized, and the corresponding weight coefficient is 0.2; when the sub-edge cut from the third group is J2-J5, the sum for the third group is minimized, and the corresponding weight coefficient is 0.2. For this step, the sum of the weight coefficients corresponding to all cut sub-edges is 0.7.
  • Figure 6(c) shows the calculation graph obtained after cutting the cross-communication edges corresponding to the communication node T2. It can be seen that the cross-communication edges corresponding to the communication node T3 have all been cut in the foregoing process; therefore, Fig. 6(c) is the calculation graph obtained using the local optimal cutting method. When the local optimal cutting method is used, the sum of the weights corresponding to all cut sub-edges is 1.5.
  • the calculation nodes in the cut calculation graph are separated by barriers formed by M communication nodes, forming K connected blocks, and each connected block in the K connected blocks includes at least one calculation node.
  • FIG. 7 is a schematic structural diagram of a bipartite graph provided in an embodiment of the present application.
  • the bipartite graph constructed based on the calculation graph contains K first-level aggregation nodes and M communication nodes. All nodes in the bipartite graph can be divided into two sets: one set includes all the communication nodes, the other set includes all the first-level aggregation nodes, and no edge directly connects any two nodes within the same set. That is, no edge directly connects any two of the K first-level aggregation nodes, and no edge directly connects any two of the M communication nodes.
  • O first-level aggregation nodes communicate with K-O first-level aggregation nodes through communication nodes.
  • aggregating the K connected blocks in the cut calculation graph to obtain the bipartite graph specifically includes: aggregating each connected block based on the hierarchical structure of the namespaces to which the computing nodes in that connected block belong, to obtain one first-level aggregation node corresponding to each connected block, K first-level aggregation nodes in total.
  • corresponding K namespaces are respectively constructed for the K first-level aggregation nodes, that is, the K first-level aggregation nodes belong to K namespaces respectively.
  • the following takes the first connected block among the K connected blocks as an example to describe the process of aggregating the first connected block based on the hierarchical structure of the namespace to which the computing nodes belong.
  • the nodes in the first connected block belong to Z groups of hierarchically structured namespaces, where Z is a positive integer.
  • FIG. 8 is a schematic diagram of a hierarchical structure of a namespace in an embodiment of the present application.
  • the e-th group of namespaces contains n namespaces, namely: X1, X2, ..., Xn.
  • the n namespaces are a hierarchical structure that can be expanded level by level: namespace X1 includes namespace X2, . . . , and namespace Xn-1 includes namespace Xn.
  • computing node J11 belongs to namespace X1; computing nodes J12 and J13 belong to namespace X2; computing nodes J14, . . . , Jd belong to namespace Xn.
  • Fig. 9 is a schematic diagram of an aggregation process of computing nodes according to the hierarchical structure of the namespace provided by the embodiment of the present application.
  • the nodes in the layer-3 namespace are aggregated to obtain the aggregation node G25; the aggregation node G25 is then aggregated with the computing nodes belonging to the layer-2 namespace to obtain the aggregation node G26; after the computing nodes included in the e-th group of namespaces are aggregated, the aggregation node G26 and the computing node J11 are obtained. It should be noted that during the node aggregation process, corresponding namespaces are also created for the aggregation nodes.
  • the namespace created for the aggregation node G26 is Xg26
  • the namespace created for the aggregation node G25 is Xg25.
  • the namespace Xg26 and the namespace X1 are both the first-level namespaces
  • the namespace Xg25 and the namespace X2 are both the second-level namespaces.
  • the computing nodes contained in each group of namespaces are aggregated to obtain Z groups of aggregation results, and the nodes contained in the Z groups of aggregation results belong to the same layer; finally, the Z groups of aggregation results are aggregated to obtain the first-level aggregation node corresponding to the first connected block, and at the same time a corresponding namespace is created for the first-level aggregation node.
  • each connected block in the K connected blocks can be aggregated by referring to the aggregation manner of the first connected block to obtain K first-level aggregation nodes.
  • each first-level aggregation node among the K first-level aggregation nodes has a hierarchical structure, in which the nodes in the j-th layer are obtained by expanding the aggregation nodes in the (j-1)-th layer; the first layer in the hierarchical structure is the first-level aggregation node itself, and the nodes in the j-th layer belong to different namespaces. The nodes in the j-th layer include aggregation nodes and/or computing nodes, where computing nodes are non-expandable nodes.
  • each first-level aggregation node has a hierarchical structure, and the nodes in each layer of the hierarchical structure are obtained by expanding the aggregation nodes in the previous layer. The first-level aggregation node is the first layer in its hierarchical structure, that is, the uppermost layer.
  • expansion is the inverse process of aggregation.
  • Aggregation refers to representing a graph structure represented by at least one node and an edge between the at least one node with one node.
  • Expanding refers to representing a node through a graph structure composed of at least one node and edges between the at least one node.
  • the nodes in each layer of the hierarchical structure belong to different namespaces, and the different namespaces belong to the same layer in the hierarchical structure of the namespace.
  • Nodes in each layer can include only aggregation nodes, or only computing nodes, or both aggregation nodes and computing nodes.
  • the aggregation node is a node that can be expanded
  • the calculation node is the smallest unit that cannot be expanded in the calculation graph.
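The bottom-up aggregation by namespace depth described above (Fig. 8/Fig. 9) can be sketched as follows. The node names, the tuple representation of namespace paths, and the string encoding of aggregation nodes are illustrative assumptions:

```python
def aggregate_block(namespaces):
    """namespaces: computing-node name -> namespace path, outermost first
    (e.g. J12 -> ("X1", "X2") in the Fig. 8 layout). Nodes in the deepest
    namespace are merged first; each new aggregation node is placed one
    level up, until a single first-level aggregation node remains."""
    nodes = {n: tuple(p) for n, p in namespaces.items()}
    while len(nodes) > 1:
        depth = max(len(p) for p in nodes.values())
        # group every node sitting at the current deepest level by namespace
        groups = {}
        for n, p in list(nodes.items()):
            if len(p) == depth:
                groups.setdefault(p, []).append(n)
        for ns, members in groups.items():
            for m in members:
                del nodes[m]
            agg = "(" + "+".join(sorted(members)) + ")"  # new aggregation node
            nodes[agg] = ns[:-1]                         # it lives one level up
    return next(iter(nodes))

# Fig. 8 with n = 2: J11 belongs to X1; J12 and J13 belong to X1/X2.
top = aggregate_block({"J11": ("X1",), "J12": ("X1", "X2"), "J13": ("X1", "X2")})
print(top)  # ((J12+J13)+J11)
```

The inner pair is aggregated first (the analogue of G25), then merged with J11 into the first-level aggregation node, mirroring the G25→G26 sequence of Fig. 9.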
  • the method further includes: when the sub-edge between the first computing node and the second computing node in the first namespace is cut, updating the first namespace, where the first namespace is a namespace in the calculation graph, and constructing a namespace including the first computing node, where the first computing node does not belong to the updated first namespace.
  • the first namespace is any namespace in the computation graph.
  • the first computing node and the second computing node are any two computing nodes in the first namespace.
  • the sub-edge between the first computing node and the second computing node is cut.
  • the first computing node and the second computing node belong to different connected blocks.
  • the first computing node does not belong to the first namespace, and the second computing node still belongs to the first namespace.
  • the attributes corresponding to the first namespace may be updated, including the identifier of the first namespace and the number of included computing nodes.
  • a corresponding namespace may be constructed for the first computing node, and the hierarchical structure of the namespace constructed for the first computing node is the same as the hierarchical structure of the namespace to which the second computing node belongs. For example, if the second computing node belongs to the third layer of a three-layer namespace, a three-layer namespace can be constructed for the first computing node, with the first computing node belonging to its third layer. In this case, the identifiers of the namespaces to which the first computing node and the second computing node belong are different.
  • the identifier of the namespace may be characterized by letters, numbers or a combination of letters and numbers, or other characters, which is not limited in the present application.
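A minimal sketch of this namespace split, following the renaming pattern of Fig. 10 (H1→H1_1 for the kept side, H1_2-H2_2 for the moved node). That only the top-level identifier of the kept chain is renamed is an assumption drawn from that example:

```python
def split_namespace(ns_path, suffix_keep="_1", suffix_move="_2"):
    """ns_path: identifiers of the namespace chain the cut node belonged to,
    outermost first. Returns (renamed chain for the remaining nodes,
    new parallel chain for the node that moves out)."""
    kept = [ns_path[0] + suffix_keep] + list(ns_path[1:])
    moved = [ns + suffix_move for ns in ns_path]
    return kept, moved

# After cutting J2-J5 in Fig. 10: J5 leaves H1-H2 and gets H1_2-H2_2,
# while the remaining nodes stay under H1_1.
kept, moved = split_namespace(["H1", "H2"])
print(kept, moved)  # ['H1_1', 'H2'] ['H1_2', 'H2_2']
```

The moved node's chain keeps the same depth as the original, as required by the statement above that both namespaces share the same hierarchical structure.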
  • FIG. 10(a)-FIG. 10(d) is an example diagram of a connected block aggregation process provided by the embodiment of the present application.
  • FIG. 10(a) is the calculation graph obtained after cross-communication edge cutting.
  • Figure 10(a) is the result obtained after cutting based on the global optimal cutting method in Figure 5(a).
  • the cut calculation graph includes two connected blocks (V1 and V2) and three communication nodes (T1, T2 and T3).
  • Figure 10(b) shows the hierarchical structure of namespaces in the computation graph and the namespaces each node belongs to.
  • the hierarchical structure of the namespace corresponding to the calculation graph is two layers.
  • the first-level namespace includes: namespace D1, namespace H1, and namespace R1.
  • Namespace D1 has no sub-namespace
  • namespace H1 contains namespace H2
  • namespace R1 contains namespace R2; that is, namespace H2 and namespace R2 are sub-namespaces of namespace H1 and namespace R1, respectively.
  • computing node J1 and communication node T1 belong to namespace D1; communication node T2, communication node T3 and computing node J2 belong to namespace H1; computing node J3, computing node J4 and computing node J5 belong to namespace H2; computing node J6 and Computing node J7 belongs to namespace R1; computing node J8, computing node J9, and computing node J10 belong to namespace R2.
  • Figure 10(c) shows the aggregation process of connected blocks V1 and V2 in Figure 10(a).
  • the dashed-box tree diagram represents the hierarchical structure of the namespace, and the solid-box tree diagram represents the hierarchical structure of the first-level aggregation node.
  • the connected block V1 corresponds to two sets of hierarchical namespaces, namely: D1, H1-H2. Aggregation is performed according to the hierarchical structure of the namespace: first, the computing node J3 and computing node J4 are aggregated into the aggregation node G2, and the aggregation node G2 belongs to the namespace H1.
• The identifier of namespace H1 can be updated to H1_1. Finally, the three nodes belonging to the first-level namespace (computing node J1, computing node J2 and aggregation node G2) are aggregated to obtain the first-level aggregation node G1 corresponding to connected block V1, and a corresponding namespace U is constructed for it.
• For computing node J5, a new hierarchical namespace can be constructed based on the namespace it corresponded to before cutting: there are two layers of namespaces, H1_2-H2_2, and computing node J5 belongs to namespace H2_2.
  • the connected block V2 corresponds to two sets of hierarchical namespaces, namely: H1_2-H2_2, R1-R2.
• Aggregation is performed according to the hierarchical structure of the namespaces: first, computing nodes J8, J9 and J10 are aggregated into aggregation node G4; aggregation node G4 and computing nodes J6 and J7 belong to namespace R1.
• The identifier of namespace R1 is updated to R1_1. At the same time, computing node J5 is aggregated to obtain aggregation node G5, which belongs to namespace H1_2. Finally, the four nodes (aggregation node G5, aggregation node G4, computing node J6 and computing node J7) are aggregated to obtain the first-level aggregation node G6 corresponding to connected block V2, and a corresponding namespace S is constructed for it.
  • Figure 10(d) shows the bipartite graph obtained after connected block aggregation.
  • the first-level aggregation node G1 is obtained after the connected block V1 is aggregated
  • the first-level aggregation node G6 is obtained after the connected block V2 is aggregated.
• G1 and G6 communicate through three communication nodes (T1, T2 and T3). There is no edge directly connecting any two of the three communication nodes, and no edge directly connects the two first-level aggregation nodes.
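The namespace-guided aggregation of a connected block, as walked through for FIG. 10(c), can be sketched in a few lines; the dictionary representation of namespace paths, the node names, and the label format here are illustrative assumptions, not part of the claimed method:

```python
from collections import defaultdict

def aggregate_connected_block(nodes):
    """Aggregate one connected block bottom-up along its namespace
    hierarchy.  `nodes` maps a node name to its list of namespaces from
    outermost to innermost, e.g. {'J3': ['H1', 'H2']}; an empty list
    means the node sits directly in the first-level namespace."""
    step = 0
    while True:
        depth = max(len(path) for path in nodes.values())
        if depth == 0:
            break
        # collect all nodes sitting in a deepest namespace
        groups = defaultdict(list)
        for name, path in nodes.items():
            if len(path) == depth:
                groups[tuple(path)].append(name)
        # each deepest namespace collapses into one aggregation node,
        # which is placed in its parent namespace
        for path, members in groups.items():
            step += 1
            agg = "G%d(%s)" % (step, "+".join(sorted(members)))
            for m in members:
                del nodes[m]
            nodes[agg] = list(path[:-1])
    # the remaining top-level nodes form the first-level aggregation node
    return "G_top(%s)" % "+".join(sorted(nodes))
```

Run on a V1-like block, the innermost namespace H2 is aggregated first, then H1, and finally the first-level aggregation node wraps everything that remains at the top level.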
• The method further includes calculating the hash values of the aggregation nodes in the bipartite graph and the hash values of the other nodes; when a node is an aggregation node, the hash value of the node is equal to the sum of the hash values of the nodes obtained by expanding the aggregation node.
  • the hash value of the node is determined by the attribute of the computing node.
  • the attributes of the calculation node include the type of the calculation node, in-degree, out-degree, type of affiliated nodes, and number of affiliated nodes.
• The above calculation of the hash values of the aggregation nodes in the hierarchical structure of a first-level aggregation node, and of the other nodes, proceeds as follows: starting from the bottom layer of the hierarchical structure and going up layer by layer, the hash values corresponding to all nodes in each layer are calculated in turn, until finally the hash value of the first-level aggregation node itself is calculated.
• For an aggregation node, its hash value is equal to the sum of the hash values of all nodes obtained after expanding it once; for a computing node, its hash value is determined by the attributes of the computing node.
  • the attributes of the computing node include computing node type, in-degree, out-degree, type of affiliated node, number of affiliated nodes, etc., which are not limited in this application.
• The type of a computing node is represented by the identifier of the node; for example, an Add node is represented by add and a Reduce node is represented by reduce. The in-degree of a computing node is the number of nodes whose data flows directly into the computing node; the out-degree is the number of nodes into which data flowing out of the computing node directly flows. The affiliated nodes of a computing node are nodes whose data is input only to that computing node and which themselves have no data input; the type of an affiliated node can be constant or variable, represented by the character strings Const and Para respectively, which is not limited in this application.
• The character strings representing the attributes of a computing node can be spliced into one string, and that string is mapped to the corresponding hash value.
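As a hedged sketch of the attribute splicing and the layer-by-layer hash rule above (the md5 mapping, the "|" separator, and the dictionary node representation are assumptions made here for illustration, not specified by this embodiment):

```python
import hashlib

def computing_node_hash(node_type, in_degree, out_degree, affiliated=()):
    """Splice the character strings representing a computing node's
    attributes (type such as 'add'/'reduce', in-degree, out-degree, and
    affiliated-node types such as 'Const'/'Para') into one string and
    map it to a hash value."""
    spliced = "|".join([node_type, str(in_degree), str(out_degree),
                        *sorted(affiliated)])
    return int(hashlib.md5(spliced.encode()).hexdigest(), 16)

def node_hash(node):
    """Bottom-up rule: an aggregation node's hash equals the sum of the
    hashes of the nodes obtained by expanding it once; a computing
    node's hash is determined by its attributes."""
    if "children" in node:                       # aggregation node
        return sum(node_hash(child) for child in node["children"])
    return computing_node_hash(node["type"], node["in"], node["out"],
                               node.get("affiliated", ()))
```

Because an aggregation node's hash is a sum, two aggregation nodes with the same multiset of descendant computing nodes get the same hash regardless of child order, which is what makes isomorphic substructures detectable.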
  • the method further includes: stacking and displaying multiple nodes in the bipartite graph, wherein the multiple nodes are obtained by expanding the same aggregation node once, so The hash values of the multiple nodes are the same, and the multiple nodes are connected in series or in parallel.
• Stacked display refers to displaying, in the bipartite graph, a plurality of nodes satisfying the above conditions using a stacking structure composed of a connection relationship identifier and a quantity identifier; the connection relationship identifier represents the connection relationship between the multiple nodes, such as a parallel connection or a serial connection, and the quantity identifier represents the number of nodes satisfying the above conditions.
  • the above conditions refer to a plurality of nodes that are obtained by expanding the same aggregation node once, have the same hash value, and have a connection relationship of serial connection or parallel connection.
  • each first-level aggregation node is detected layer by layer in the order from the upper layer to the lower layer (that is, starting from the first layer of the hierarchical structure).
• When the first aggregation node is expanded once and the number of nodes obtained is greater than or equal to a preset number, isomorphism detection is performed on the nodes obtained after expansion; specifically, it is detected whether there are nodes with the same hash value among the nodes obtained after expansion.
  • the first aggregation node is an aggregation node in any layer in the hierarchical structure of the first-level aggregation nodes.
• When it is detected that there are multiple nodes with the same hash value among the expanded nodes, and the connection relationship between the multiple nodes is a parallel connection or a serial connection, the multiple nodes are stacked and displayed. Multiple nodes with the same hash value are regarded as nodes with the same internal structure.
  • the serial connection means that the multiple nodes are connected sequentially, and there is only one communication path between the node at which the data stream starts and the node at which the data stream terminates.
• Parallel connection means that the input data streams of all of the multiple nodes flow out of the same node without passing through any intermediate node, and the output data streams of all of the multiple nodes flow into the same node without passing through any intermediate node.
• The user can operate on the aggregation node, for example by clicking or double-clicking, to expand it; after the aggregation node is expanded, the plurality of stacked and displayed nodes is obtained.
• When the user visualizes the bipartite graph, stacked display simplifies the hierarchical structure of the aggregation node, saves user interface space, and helps the user understand the internal structure of the aggregation node more quickly.
  • FIG. 11 is a schematic structural diagram of serial connection and parallel connection of nodes provided by an embodiment of the present application.
  • nodes 1-5 are five nodes with the same hash value, that is, isomorphic nodes.
• Serial structure detection: traverse the edges connecting the five nodes and detect that there is only one communication path between node 1 and node 3; that is, node 1, node 2 and node 3 are three serially connected nodes and can be displayed stacked.
• Parallel structure detection: forward search and backward search are performed with node 4 and node 5 as starting points.
• Nodes 4 and 5 converge at the aggregation node Fhub during the forward search and at the aggregation node Bhub during the backward search, so nodes 4 and 5 are connected in parallel and can also be displayed stacked.
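A minimal sketch of the serial/parallel detection illustrated in FIG. 11 follows; the adjacency-dictionary representation and the function names are assumptions, and the one-hop hub check is a simplification of the general forward/backward search:

```python
def is_parallel(nodes, preds, succs):
    """Isomorphic nodes are parallel-stackable if every node's input
    flows directly from one common node (the Fhub in FIG. 11) and every
    node's output flows directly into one common node (the Bhub)."""
    fhubs = {frozenset(preds[n]) for n in nodes}
    bhubs = {frozenset(succs[n]) for n in nodes}
    return (len(fhubs) == 1 and len(next(iter(fhubs))) == 1 and
            len(bhubs) == 1 and len(next(iter(bhubs))) == 1)

def is_serial(nodes, succs):
    """Isomorphic nodes are serial-stackable if they form a single
    chain, i.e. there is exactly one communication path from the first
    node of the set to the last (node 1 -> node 2 -> node 3 in FIG. 11)."""
    node_set = set(nodes)
    # the chain head is the only node with no predecessor inside the set
    heads = [n for n in node_set
             if not any(n in succs[m] for m in node_set)]
    if len(heads) != 1:
        return False
    current, seen = heads[0], {heads[0]}
    while True:
        inside = [s for s in succs[current] if s in node_set]
        if not inside:                      # reached the chain tail
            return len(seen) == len(node_set)
        if len(inside) != 1 or inside[0] in seen:
            return False                    # branching or a cycle
        current = inside[0]
        seen.add(current)
```

On the FIG. 11 example, nodes 1-3 pass the serial check, while nodes 4 and 5 fail it but pass the parallel check via their common Fhub/Bhub neighbors.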
  • FIG. 12(a)-FIG. 12(b) is a schematic diagram of a stacked aggregation node expansion process provided by the embodiment of the present application.
  • Fig. 12(a) is a schematic diagram of the visual unfolding process of aggregation nodes stacked in a serial structure.
• Node J20, stacking structure W1 and node J21 can be the result of expanding the second aggregation node once, where the second aggregation node can be an aggregation node in any layer of any first-level aggregation node. Since the nodes obtained after expanding the second aggregation node contain a stackable substructure (i.e., nodes with the same hash value), the stacking structure W1 can be used for display.
• The identifier 1 in stacking structure W1 is a connection relationship identifier indicating that the stacked nodes in W1 form a serial structure, and the number n1 in W1 represents the number of stacked isomorphic nodes.
  • the user can further expand the stacking structure W1 by clicking or double-clicking to obtain the stacking structure W2.
  • the stacked structure W2 can reveal the internal structure of a single isomorphic node.
  • the user can further expand the stack structure W2 to obtain a fully expanded schematic diagram of the stack structure W2, which shows the actual connection relationship of n1 serially connected isomorphic nodes.
  • the serially connected isomorphic nodes can be displayed with the same color.
  • Fig. 12(b) is a schematic diagram of the visual unfolding process of aggregation nodes stacked in parallel structure.
• Node J22, stacking structure W3 and node J23 can be the result of expanding the third aggregation node once, where the third aggregation node can be an aggregation node in any layer of any first-level aggregation node. Since the nodes obtained after expanding the third aggregation node contain a stackable substructure (i.e., nodes with the same hash value), the stacking structure W3 can be used for display.
• The identifier 2 in stacking structure W3 is a connection relationship identifier indicating that the stacked nodes in W3 form a parallel structure, and the number n2 in W3 represents the number of stacked isomorphic nodes.
  • the user can further expand the stacking structure W3 by clicking or double-clicking to obtain the stacking structure W4.
  • the stacked structure W4 can reveal the internal structure of a single isomorphic node.
  • the user can further expand the stack structure W4 to obtain a fully expanded schematic diagram of the stack structure W4, which shows the actual connection relationship of n2 parallel-connected isomorphic nodes.
• The parallel-connected isomorphic nodes can be displayed in the same color.
• Other connection relationship identifiers may also be used to represent the stacking of serial connections and parallel connections, which is not limited in this application.
  • FIG. 13(a)-FIG. 13(b) is a timeline example of a model training process provided by the embodiment of the present application.
  • users can construct bipartite graphs of various deep learning tasks based on this scheme.
• The bipartite graph can clearly present the structure of the model, so that users can quickly locate the position and function of communication nodes based on the bipartite graph, and then formulate a fusion/segmentation strategy for communication nodes to minimize the communication time in the training process.
• The user can view the timeline corresponding to model training, observe the overlap between communication time and computing time, and find communication nodes whose computation and communication do not overlap; the user can then quickly locate these communication nodes in the bipartite graph built based on this scheme and analyze their role according to the specific graph structure, so as to formulate a reasonable communication node fusion/segmentation strategy and shorten the duration of the model training process corresponding to different deep learning tasks.
  • the first layer is the calculation duration corresponding to the computing node
  • the second layer is the communication duration corresponding to the communication node.
• The fused communication nodes are divided: the first 55 communication nodes are fused into one, the 55th-108th communication nodes are fused into one, the 109th-162nd communication nodes are fused into one, and so on.
• The timeline of the training process obtained after training is shown in FIG. 13(b). It can be seen that after the communication nodes are divided, the calculation duration and the communication duration overlap. Moreover, compared with the time t1 required for gradient calculation and fusion before the communication nodes were divided, the time t2 for gradient calculation and fusion after the division is significantly shortened.
  • FIG. 14 is a schematic flowchart of a method for displaying a bipartite graph in an embodiment of the present application. As shown in FIG. 14, the method 1400 includes step S1410.
  • Step S1410 Input a computation graph, and output the bipartite graph based on the computation graph.
• The calculation graph includes M communication nodes. A first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes and corresponds to at least one cross-communication edge. Each cross-communication edge in the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes. P, Q and M are positive integers. The cross-communication edges corresponding to the M communication nodes are not connected in the bipartite graph, and no edge directly connects any two of the M communication nodes in the bipartite graph.
• The calculation graph includes C computing nodes, and the bipartite graph includes K first-level aggregation nodes, which are obtained by aggregating the C computing nodes. Each of the K first-level aggregation nodes has a hierarchical structure, in which the nodes in the jth layer are obtained by expanding the aggregation nodes in the (j-1)th layer, the first layer is the first-level aggregation node itself, and the nodes in the jth layer belong to different namespaces; C, K and j are positive integers. The nodes in the jth layer include aggregation nodes and/or computing nodes, and a computing node is a non-expandable node.
• The bipartite graph includes a stacking structure, which includes a connection relationship identifier and a quantity identifier. The connection relationship identifier represents the connection relationship between multiple nodes, and the quantity identifier represents the number of the multiple nodes. The multiple nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
• When a node is an aggregation node, the hash value of the node is equal to the sum of the hash values of the nodes obtained by expanding the aggregation node.
• When a node is a computing node, the hash value of the node is determined by the attributes of the computing node, which include the type of the computing node, its in-degree, out-degree, and the type and number of its affiliated nodes.
  • the specific process of obtaining the bipartite graph based on the calculation graph in the above method embodiment 1400 is the same as the corresponding process in the above method embodiment 300, and will not be repeated here.
  • FIG. 15 is a schematic structural diagram of a bipartite graph construction device provided by an embodiment of the present application.
  • a bipartite graph construction device 1500 includes a search unit 1501 and a cutting unit 1502 .
• The search unit 1501 is configured to search the calculation graph for at least one cross-communication edge corresponding to a first communication node, where the first communication node is one of the M communication nodes included in the calculation graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, and each cross-communication edge in the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes; M, P and Q are positive integers.
• The cutting unit 1502 is configured to cut the cross-communication edges corresponding to the M communication nodes and perform an aggregation operation to obtain the bipartite graph; no edge directly connects any two of the M communication nodes in the bipartite graph.
• Each cross-communication edge includes at least one sub-edge, and each sub-edge directly connects two computing nodes. Each sub-edge corresponds to a weight coefficient, which is determined by the types of the two computing nodes directly connected by that sub-edge.
• The M communication nodes correspond to N cross-communication edges, where N is a positive integer. In terms of cutting the cross-communication edges corresponding to the M communication nodes, the cutting unit 1502 is specifically configured to cut one sub-edge in each of the N cross-communication edges. When E of the N cross-communication edges contain a common sub-edge, the sub-edges are chosen so that the sum of the weight coefficients of all the cut sub-edges in the E cross-communication edges is the largest or the smallest, E being a positive integer less than or equal to N. When the ith cross-communication edge shares no sub-edge with the other cross-communication edges, the sub-edge with the smallest (or largest) weight coefficient among the sub-edges of the ith cross-communication edge is cut, i being a positive integer.
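The two cutting rules above can be sketched as follows; the tuple sub-edge representation, the function names, and the brute-force enumeration of the shared-sub-edge case are assumptions made here for illustration (an actual implementation would likely use a proper optimization over the E edges):

```python
from itertools import product

def cut_independent_edge(cross_edge, weights, prefer="min"):
    """For a cross-communication edge that shares no sub-edge with the
    other cross-communication edges: cut the sub-edge with the smallest
    (or largest) weight coefficient."""
    chooser = min if prefer == "min" else max
    return chooser(cross_edge, key=lambda sub: weights[sub])

def cut_shared_edges(cross_edges, weights, prefer="max"):
    """For E cross-communication edges that contain common sub-edges:
    choose one sub-edge per cross-communication edge so that the sum of
    the weight coefficients of the cut sub-edges (a shared sub-edge is
    counted once) is the largest or the smallest.  Brute force over all
    combinations, so only suitable for small E."""
    best_choice, best_total = None, None
    better = (lambda a, b: a > b) if prefer == "max" else (lambda a, b: a < b)
    for choice in product(*cross_edges):
        total = sum(weights[sub] for sub in set(choice))
        if best_total is None or better(total, best_total):
            best_choice, best_total = set(choice), total
    return best_choice, best_total
```

Counting a shared sub-edge once (via the `set`) is what makes cutting the common sub-edge attractive or unattractive depending on the chosen objective.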
• The calculation graph after cutting includes K connected blocks, where K is a positive integer. In terms of performing the aggregation operation to obtain the bipartite graph, the cutting unit 1502 is specifically configured to aggregate the K connected blocks in the cut calculation graph respectively to obtain the bipartite graph. The K connected blocks are obtained by dividing the computing nodes in the calculation graph based on the positions of the M communication nodes in the calculation graph. The bipartite graph includes K first-level aggregation nodes and the M communication nodes; no edge directly connects any two of the K first-level aggregation nodes, and the K first-level aggregation nodes belong to K namespaces respectively.
• Each of the K first-level aggregation nodes has a hierarchical structure, in which the nodes in the jth layer are obtained by expanding the aggregation nodes in the (j-1)th layer, the first layer is the first-level aggregation node itself, the nodes in the jth layer belong to different namespaces, and j is a positive integer; the nodes in the jth layer include aggregation nodes and/or computing nodes, and a computing node is a non-expandable node.
• The device further includes an updating unit configured to update a first namespace, the first namespace being a namespace in the calculation graph, and a reconstruction unit configured to construct a namespace including a first computing node, where the first computing node does not belong to the updated first namespace.
  • the device further includes: a calculation unit, configured to calculate the hash value of the aggregation node in the bipartite graph, and calculate the hash value of the node; wherein, when the node is the When the aggregation node is used, the hash value of the node is equal to the sum of the hash values of the nodes obtained by expanding the aggregation node; when the node is the computing node, the hash value of the node is determined by the Determined by the attribute of the computing node, the attribute of the computing node includes the type of the computing node, in-degree, out-degree, type of affiliated node, and number of affiliated nodes.
  • the device further includes: a stacking unit, configured to stack and display multiple nodes in the bipartite graph; wherein, the multiple nodes are expanded by the same aggregation node After one time, the hash values of the multiple nodes are the same, and the multiple nodes are connected in series or in parallel.
  • FIG. 16 is a schematic structural diagram of a bipartite graph display device in an embodiment of the present application.
  • the device 1600 includes an input unit 1601 and a display unit 1602 .
• The input unit 1601 is configured to input a calculation graph, and the display unit 1602 is configured to display the bipartite graph based on the calculation graph. The calculation graph includes M communication nodes; a first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes and corresponds to at least one cross-communication edge. Each cross-communication edge in the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes. P, Q and M are positive integers. The cross-communication edges corresponding to the M communication nodes are not connected in the bipartite graph, and no edge directly connects any two of the M communication nodes in the bipartite graph.
• The calculation graph includes C computing nodes, and the bipartite graph includes K first-level aggregation nodes, which are obtained by aggregating the C computing nodes. Each of the K first-level aggregation nodes has a hierarchical structure, in which the nodes in the jth layer are obtained by expanding the aggregation nodes in the (j-1)th layer, the first layer is the first-level aggregation node itself, and the nodes in the jth layer belong to different namespaces; C, K and j are positive integers. The nodes in the jth layer include aggregation nodes and/or computing nodes, and a computing node is a non-expandable node.
  • the above C computing nodes are computing nodes included in the computing graph of the method embodiment in FIG. 3 .
• The bipartite graph includes a stacking structure, which includes a connection relationship identifier and a quantity identifier. The connection relationship identifier represents the connection relationship between multiple nodes, and the quantity identifier represents the number of the multiple nodes. The multiple nodes are obtained by expanding the same aggregation node once, have the same hash value, and are connected in series or in parallel.
• When a node is an aggregation node, the hash value of the node is equal to the sum of the hash values of the nodes obtained by expanding the aggregation node.
• When a node is a computing node, the hash value of the node is determined by the attributes of the computing node, which include the type of the computing node, its in-degree, out-degree, and the type and number of its affiliated nodes.
  • the specific process of obtaining the bipartite graph based on the calculation graph in the above-mentioned bipartite graph display device 1600 is the same as the construction process of the bipartite graph in the method embodiment 1400, and will not be repeated here.
  • the device 1500 and the device 1600 here are embodied in the form of functional units.
  • the term "unit” here may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor, or a group processor, etc.) and memory, incorporated logic, and/or other suitable components to support the described functionality.
• The device 1500 and the device 1600 can be used to perform the processes and/or steps corresponding to the above method embodiment 300 and method embodiment 1400 respectively; to avoid repetition, details are not repeated here.
  • FIG. 17 is a schematic diagram of a hardware structure of a device for constructing a bipartite graph in an embodiment of the present application.
  • an apparatus 1700 may include: a memory 1701 , one or more (only one is shown in the figure) processors 1702 , an interface circuit 1703 and a bus 1704 .
  • the memory 1701 , the processor 1702 , and the interface circuit 1703 are connected to each other through a bus 1704 .
  • the memory 1701 is used to store instructions, and the processor 1702 is used to call the instructions stored in the memory 1701 .
  • the processor 1702 is specifically configured to acquire a computer program to execute the corresponding bipartite graph construction method in embodiment 300.
• The bipartite graph construction device of the embodiment of the present application can extract the communication nodes in the calculation graph to the top layer of the bipartite graph and clearly display the model structure, thereby quickly and intuitively locating the position and function of the communication nodes and providing a basis for subsequent parallel strategy design.
  • apparatus 1700 may be specifically a computer, and it may be used to execute each step and/or process in the foregoing method embodiment 300 .
  • the memory 1701 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 1701 may store programs, and when the programs stored in the memory 1701 are executed by the processor 1702, the processor 1702 and the interface circuit 1703 are used to execute each step of the bipartite graph construction method in the embodiment of the present application.
• The processor 1702 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits, and is used to execute related programs to realize the functions required by the units in the bipartite graph construction device of the embodiment of the present application, or to execute the bipartite graph construction method of the method embodiment of the present application.
  • the processor 1702 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the bipartite graph construction method of the present application can be completed by instructions in the form of software in the processor 1702 .
• The above processor 1702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
• The software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
• The storage medium is located in the memory 1701; the processor 1702 reads the information in the memory 1701 and, in combination with its hardware, completes the functions required by the units included in the bipartite graph construction device of the embodiment of the present application, or executes the bipartite graph construction method of the method embodiment of the present application.
  • the interface circuit 1703 implements communication between the apparatus 1700 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver.
  • the program can be acquired through the interface circuit 1703 .
  • the bus 1704 may include pathways for transferring information between various components of the device 1700 (eg, memory 1701 , processor 1702 , interface circuit 1703 ).
  • an apparatus 1800 may include: a memory 1801 , one or more (only one is shown in the figure) processors 1802 , an interface circuit 1803 and a bus 1804 .
  • the memory 1801 , the processor 1802 , and the interface circuit 1803 realize communication connection with each other through the bus 1804 .
  • the memory 1801 is used to store instructions, and the processor 1802 is used to call the instructions stored in the memory 1801 .
  • the processor 1802 is specifically configured to obtain a computer program to execute the corresponding bipartite graph display method in embodiment 1400.
  • the bipartite graph display device in the embodiment of the present application can process the calculation graph based on the bipartite graph display method in method embodiment 1400, and output a corresponding bipartite graph.
  • the output bipartite graph can clearly show the model structure of the corresponding deep learning model, so as to quickly and intuitively locate the position and function of the communication node, and then provide a basis for the design of subsequent parallel strategies.
  • the apparatus 1800 may be specifically a computer, and it may be used to execute various steps and/or processes in the foregoing method embodiment 1400.
  • the memory 1801 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 1801 may store a program. When the program stored in the memory 1801 is executed by the processor 1802, the processor 1802 and the interface circuit 1803 are used to execute each step of the bipartite graph display method in the embodiment of the present application.
  • the processor 1802 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the functions required by the units in the bipartite graph display apparatus of the embodiments of this application, or to execute the bipartite graph display method of the method embodiments of this application.
  • the processor 1802 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, each step in the bipartite graph display method of the present application can be completed by instructions in the form of software in the processor 1802 .
  • the above-mentioned processor 1802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
  • the storage medium is located in the memory 1801; the processor 1802 reads the information in the memory 1801 and, in combination with its hardware, performs the functions required by the units included in the bipartite graph display apparatus of the embodiments of this application, or executes the bipartite graph display method of the method embodiments of this application.
  • the interface circuit 1803 implements communication between the apparatus 1800 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver.
  • the program can be acquired through the interface circuit 1803 .
  • the bus 1804 may include pathways for transferring information between various components of the device 1800 (eg, memory 1801 , processor 1802 , interface circuit 1803 ).
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed, some or all of the steps described in the above bipartite graph construction method embodiment and/or bipartite graph display method embodiment are implemented.
  • An embodiment of the present application provides a computer program. The computer program includes instructions; when the computer program is executed, some or all of the steps of any of the methods described in the above bipartite graph construction method embodiment and/or bipartite graph display method embodiment are accomplished.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above units is only a division by logical function. In actual implementation there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.


Abstract

This application discloses a bipartite graph construction method, a display method, and an apparatus. The construction method includes: searching a computational graph for at least one cross-communication edge corresponding to a first communication node, where the first communication node is one of M communication nodes contained in the computational graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and no cross-communication edge passes through any of the M communication nodes; and cutting the cross-communication edges corresponding to the M communication nodes and performing an aggregation operation to obtain a bipartite graph in which no two of the M communication nodes are directly connected by an edge. With this application, the model structure can be clearly displayed based on the constructed bipartite graph, so that the position and function of communication nodes can be located quickly and intuitively, providing a basis for the design of subsequent parallel strategies.

Description

Bipartite graph construction method, display method, and apparatus
本申请要求于2021年11月19日提交中国专利局、申请号为202111381794.7、申请名称为“二部图构建方法、显示方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种二部图构建方法、显示方法和装置。
背景技术
随着深度学习的不断发展，硬件算力的不断提升，深度神经网络的规模越来越大。大模型通常会采用集群并行训练，将数据或者模型进行切分并分配到不同的设备中。在表征并行训练过程的计算图中，通信节点用于指示数据交互任务，该数据交互任务是指两个或两个以上的设备（如图形处理器（Graphics Processing Unit，GPU）等）之间的数据交互。通常，研究人员会通过设计合理的并行策略，来实现尽可能大的计算通信比，即最大程度地降低单纯通信的时间。如果并行策略设计不合理，可能会引入冗余的通信节点，从而导致通信节点处出现性能瓶颈。
在设计并行策略的过程中,研究人员通常会以通信节点为入口,分析和调整模型的并行策略。
然而,目前利用Tensorboard等工具展示并行训练计算图时,无法清晰展示大模型的结构,需要逐级展开才能看到通信节点,定位通信节点的过程复杂且繁琐。
发明内容
本申请实施例提供了一种二部图构建方法、显示方法和装置,可以将计算图中的通信节点抽提至二部图的顶层,以清晰展示模型结构,从而快速直观地定位通信节点的位置和功能,进而为后续并行策略的设计提供依据。
第一方面,本申请提供了一种二部图构建方法,包括:从计算图中搜索出第一通信节点对应的至少一条跨通信边,其中,所述第一通信节点为所述计算图包含的M个通信节点中的一个,所述第一通信节点对应P个前驱节点和Q个后继节点,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱节点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点,M、P和Q为正整数;切割所述M个通信节点中每个通信节点对应的跨通信边,并进行聚合操作,以得到所述二部图,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
其中,每个通信节点对应的前驱节点为数据流在流入通信节点之前所经过的非通信节点。每个通信节点对应的后继节点为数据流在流出通信节点之后所经过的非通信节点。每个通信节点对应的跨通信边包括该通信节点对应的前驱节点中任意一个前驱节点到该通信节点对应的后继节点中任意一个后继节点之间的通信路径,且每条跨通信边不经过任何通信节点。
从技术效果上看,本申请通过对计算图中每个通信节点对应的跨通信边进行切割,从而在计算图中只保留经过通信节点的路径。通过此种方式将计算图中的通信节点抽提至二部图的顶层,从而清晰地展示模型结构,并快速定位通信节点的位置和功能,进而为后续模型训练过程中通信节点的融合/切分策略提供依据,以设计最佳的并行策略,来增加计算时间与通信时间的重叠程度,也即降低并行训练过程的训练时长。
在一种可行的实施方式中,所述每条跨通信边包括至少一条子边,所述至少一条子边中的每条子边直接连接两个计算节点;所述至少一条子边中的每条子边对应一个权重系数,所述每条子边对应的一个权重系数由所述每条子边直接连接的两个计算节点的类型决定。
其中,计算节点也称为计算算子,为计算图中不可展开的节点。计算算子的类型由计算算子的具体功能决定,例如,对于求对数算子Log Operator而言,其功能为进行对数运算。可选地,其类型可以为Log,即每个计算算子的类型由其标识符进行表示。
从技术效果上看,本申请为每条跨通信边中的子边定义相应的权重系数,通过权重系数来定义每条子边的重要程度,进而为后续跨通信边的切割提供相应的依据。
在一种可行的实施方式中,所述依次切割所述M个通信节点中每个通信节点对应的跨通信边,包括:在第i条跨通信边中存在一条子边已被切割的情况下,不对所述第i条跨通信边进行切割,所述第i条跨通信边为所述M个通信节点中任一通信节点对应的一条跨通信边;或者,在第i条跨通信边所包含的子边均未被切割的情况下,切割所述第i条跨通信边所包含的子边中权重系数最小或者权重系数最大的子边,所述第i条跨通信边为所述M个通信节点中任一通信节点对应的一条跨通信边。
从技术效果上看,在对跨通信边进行切割时,基于子边的权重系数切割其中的一条子边,一方面可以切割重要程度最小的子边,另一方面,只切割一条子边的方式可以最大程度地保留计算图中的语义信息。
在一种可行的实施方式中,切割后的所述计算图包括K个连通块,K为正整数;所述进行聚合操作,得到所述二部图,包括:对所述切割后的所述计算图中的每个连通块分别进行聚合,得到所述二部图;其中,所述K个连通块是基于所述M个通信节点在所述计算图中的位置对所述计算图中的计算节点进行划分得到的,所述二部图包括K个一级聚合节点和所述M通信节点,所述K个一级聚合节点中任意两个一级聚合节点无边直接相连,且所述K个一级聚合节点分别属于K个命名空间。
其中,连通块(也可称为连通分量)为:在跨通信边被切割后,计算图会被通信节点形成的屏障阻隔开,此时计算图中的计算节点相互连接所构成的子图称为连通块或连通分量。每个连通块包括至少一个计算节点。
从技术效果上看,通过对每个连通块分别进行聚合,即将K个连通块分别聚合得到K个一级聚合节点,得到由一级聚合节点和通信节点构成的二部图,从而可以基于此二部图快速定位通信节点的位置和功能,为后续通信节点的融合/切分提供相应的依据。
在一种可行的实施方式中,所述K个一级聚合节点中的每个一级聚合节点为层级结构,其中,所述层级结构中的第j层中的节点是由所述层级结构中第j-1层的节点展开得到的,所述层级结构中的第一层为所述一级聚合节点,所述第j层中的节点分别属于不同的命名空间;所述第j层中的节点包括聚合节点和/或计算节点,所述计算节点为不可展开的节点。
其中,上述展开为聚合的逆过程。聚合指将至少一个节点和该至少一个节点之间的边所表示的图结构用一个节点进行表示。展开指通过至少一个节点和该至少一个节点之间的边所构成的图结构来表示一个节点。
在一种可行的实施方式中,当第一命名空间中第一计算节点和第二计算节点之间的子边被切割时,更新所述第一命名空间,所述第一命名空间为所述计算图中的命名空间;构建包含第一计算节点的命名空间,所述第一计算节点不属于更新后的所述第一命名空间。
从技术效果上看,在对两个计算节点之间的子边进行切割操作后,存在一个计算节点不属于原来的命名空间,此时通过为该计算节点建立新的命名空间,以满足二部图的构建要求。
在一种可行的实施方式中,所述方法还包括:计算所述二部图中聚合节点的哈希值,以及计算节点的哈希值;其中,当所述节点为所述聚合节点时,所述节点的哈希值等于所述聚合节点展开得到的各节点的哈希值之和,当所述节点为所述计算节点时,所述节点的哈希值由所述计算节点的类型、入度、出度、附属节点的类型、附属节点的数量中的至少一个确定。
其中,计算节点的附属节点流出的数据只流入该计算节点,且计算节点的附属节点没有数据流入,附属节点通常为常量或者变量。
从技术效果上看,通过计算聚合节点的层级结构中每层中节点的哈希值,从而为后续节点的堆叠展示提供相应依据。
在一种可行的实施方式中,所述方法还包括:对所述二部图中的多个节点进行堆叠展示;其中,所述多个节点是由同一所述聚合节点展开一次后得到的,所述多个节点的哈希值相同,且所述多个节点串行连接或并行连接。
可选地,堆叠展示指在二部图中,用连接关系标识以及数字标识构成的堆叠结构来显示满足上述条件的多个节点;其中,连接关系标识用于表征该多个节点之间的连接关系,例如为并行连接或串行连接;数字标识表示满足上述条件的多个节点的数量。上述条件指由同一所述聚合节点展开一次后得到、哈希值相同,且连接关系为串行连接或并行连接的多个节点。
从技术效果上看,在层级结构中,可以认为哈希值相同的节点其内部结构也相同,因而当其进行串行连接或者并行连接时,可以在对应的层级中进行堆叠展示,从而清晰简洁地展示聚合节点的内部结构。
第二方面,本申请提供了一种二部图的显示方法,所述方法包括:输入计算图,基于所述计算图输出所述二部图;其中,所述计算图包括M个通信节点,所述M个通信节点中的第一通信节点对应P个前驱节点和Q个后继节点,所述第一通信节点对应至少一条跨通信边,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱结点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点,P、Q和M为正整数,所述M个通信节点分别对应的跨通信边在所述二部图中不连通,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
从技术效果上看,本申请通过去除计算图中每个通信节点对应的跨通信边,从而在计算图中只保留数据流经过通信节点的路径。通过此种方式将计算图中的通信节点抽提至二部图的顶层,从而清晰地展示模型结构,并快速定位通信节点的位置和功能,进而为后续模型训练过程中通信节点的融合/切分策略提供依据,以设计最佳的并行策略,来增加计算时间与通信时间的重叠程度,也即降低并行训练过程的训练时长。
在一种可行的实施方式中,所述计算图包括C个计算节点,所述二部图包括K个一级聚合节点;其中,所述K个一级聚合节点是由所述C个计算节点进行聚合得到的,所述K个一级聚合节点中的每个一级聚合节点为层级结构,其中,所述层级结构中第j层中的节点是由所述层级结构中第j-1层中的聚合节点展开得到的,所述层级结构中的第一层为所述一级聚合节点,所述第j层中的节点分别属于不同的命名空间,C、K和j为正整数,所述第j层中的节点包括所述聚合节点和/或所述计算节点,所述计算节点为不可展开的节点。
在一种可行的实施方式中,所述二部图包括堆叠结构;其中,所述堆叠结构包括连接关系标识和数量标识,所述连接关系标识表征多个节点之间的连接关系,所述数量标识表征所述多个节点的数量,所述多个节点是由同一所述聚合节点展开一次后得到的,所述多个节点的哈希值相同,且所述多个节点之间的连接关系为串行连接或并行连接。
在一种可行的实施方式中,当所述节点为所述聚合节点时,所述节点的哈希值等于所述聚合节点展开得到的各节点的哈希值之和,当所述节点为所述计算节点时,所述节点的哈希值由所述计算节点的属性决定,所述计算节点的属性包括所述计算节点的类型、入度、出度、附属节点的类型和附属节点的数量。
具体地,上述第二方面中基于计算图得到二部图的具体过程与上述第一方面中的对应过程相同,此处不再赘述。
第三方面,本申请提供了一种二部图构建装置,包括搜索单元,用于从计算图中搜索出第一通信节点对应的至少一条跨通信边,其中,所述第一通信节点为所述计算图包含的M个通信节点中的一个,所述第一通信节点对应P个前驱节点和Q个后继节点,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱节点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点;M、P和Q为正整数;切割单元,用于切割所述M个通信节点分别对应的跨通信边,并进行聚合操作,得到所述二部图,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
在一种可行的实施方式中,所述每条跨通信边包括至少一条子边,所述至少一条子边中的每条子边直接连接两个计算节点;所述至少一条子边中的每条子边对应一个权重系数,所述每条子边对应的一个权重系数由所述每条子边直接连接的两个计算节点的类型决定。
在一种可行的实施方式中,所述M个通信节点共对应N条跨通信边,N为正整数;在所述切割所述M个通信节点分别对应的跨通信边的方面,所述切割单元具体用于:切割所述N条跨通信边中的每条跨通信边中的一条子边;其中,当所述N条跨通信边中的E条跨通信边包含共同的子边时,所述E条跨通信边中被切割的所有子边分别对应的权重系数之和最大或者最小,E为小于或等于N的正整数;当所述N条跨通信边中的第i条跨通信边与其它跨通信边不包含共同的子边时,切割所述第i条跨通信边所包含的子边中权重系数最小或者权重系数最大的子边,i为正整数。
在一种可行的实施方式中,切割后的所述计算图包括K个连通块,K为正整数;在所述进行聚合操作,得到所述二部图的方面,所述切割单元具体用于:对所述切割后的所述计算图中的K个连通块分别进行聚合,得到所述二部图;其中,所述K个连通块是基于所述M个通信节点在所述计算图中的位置对所述计算图中的计算节点进行划分得到的,所述二部图包括K个一级聚合节点和所述M个通信节点,所述K个一级聚合节点中任意两个一级聚合节点无边直接相连,且所述K个一级聚合节点分别属于K个命名空间。
在一种可行的实施方式中,所述K个一级聚合节点中的每个一级聚合节点为层级结构,其中,所述层级结构中的第j层中的节点是由所述层级结构中第j-1层中的聚合节点展开得到的,所述层级结构中的第一层为所述一级聚合节点,所述第j层中的节点分别属于不同的所述命名空间,j为正整数;所述第j层中的节点包括聚合节点和/或计算节点,所述计算节点为不可展开的节点。
在一种可行的实施方式中,所述装置还包括:更新单元,用于当第一命名空间中第一计算节点和第二计算节点之间的子边被切割时,更新所述第一命名空间,所述第一命名空间为所述计算图中的命名空间;重建单元,用于构建包含所述第一计算节点的命名空间,所述第一计算节点不属于更新后的所述第一命名空间。
在一种可行的实施方式中,所述装置还包括:计算单元,用于计算所述二部图中聚合节点的哈希值,以及计算节点的哈希值;其中,当所述节点为所述聚合节点时,所述节点的哈 希值等于所述聚合节点展开得到的各节点的哈希值之和,当所述节点为所述计算节点时,所述节点的哈希值由所述计算节点的属性决定,所述计算节点的属性包括所述计算节点的类型、入度、出度、附属节点的类型和附属节点的数量。
在一种可行的实施方式中,所述装置还包括:堆叠单元,用于对所述二部图中的多个节点进行堆叠展示;其中,所述多个节点是由同一所述聚合节点展开一次后得到的,所述多个节点的哈希值相同,且所述多个节点串行连接或并行连接。
第四方面,本申请提供了一种二部图显示装置,所述装置包括:输入单元,用于输入计算图;显示单元,用于基于所述计算图显示所述二部图;其中,所述计算图包括M个通信节点,所述M个通信节点中的第一通信节点对应P个前驱节点和Q个后继节点,所述第一通信节点对应至少一条跨通信边,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱结点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点,P、Q和M为正整数,所述M个通信节点分别对应的跨通信边在所述二部图中不连通,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
在一种可行的实施方式中,所述计算图包括C个计算节点,所述二部图包括K个一级聚合节点;其中,所述K个一级聚合节点是由所述C个计算节点进行聚合得到的,所述K个一级聚合节点中的每个一级聚合节点为层级结构,其中,所述层级结构中第j层中的节点是由所述层级结构中第j-1层中的聚合节点展开得到的,所述层级结构中的第一层为所述一级聚合节点,所述第j层中的节点分别属于不同的命名空间,C、K和j为正整数,所述第j层中的节点包括所述聚合节点和/或所述计算节点,所述计算节点为不可展开的节点。
在一种可行的实施方式中,所述二部图包括堆叠结构;其中,所述堆叠结构包括连接关系标识和数量标识,所述连接关系标识表征多个节点之间的连接关系,所述数量标识表征所述多个节点的数量,所述多个节点是由同一所述聚合节点展开一次后得到的,所述多个节点的哈希值相同,且所述多个节点之间的连接关系为串行连接或并行连接。
在一种可行的实施方式中,当所述节点为所述聚合节点时,所述节点的哈希值等于所述聚合节点展开得到的各节点的哈希值之和,当所述节点为所述计算节点时,所述节点的哈希值由所述计算节点的属性决定,所述计算节点的属性包括所述计算节点的类型、入度、出度、附属节点的类型和附属节点的数量。
具体地,上述第四方面中二部图显示装置中基于计算图得到二部图的具体过程与第二方面中二部图的显示方法中二部图的构建过程对应相同,此处不再赘述。
第五方面,本申请提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被执行时,上述第一方面中任意一项所述的方法得以实现。
第六方面,本申请提供了一种计算机程序,该计算机程序包括指令,当该计算机程序被执行时,上述第一方面中任意一项所述的方法得以实现。
附图说明
以下对本申请实施例用到的附图进行介绍。
图1是本申请实施例中一种提供的一种系统架构示意图;
图2是本申请实施例中一种应用场景示意图;
图3是本申请实施例中一种二部图构建方法流程示意图;
图4是本申请实施例中一种计算图的结构示意图;
图5(a)-图5(b)为本申请实施例中一种跨通信边切割方式示意图;
图6(a)-图6(c)为本申请实施例中另一种跨通信边切割方式示意图;
图7是本申请实施例中一种二部图结构示意图;
图8是本申请实施例中一种命名空间的层级结构示意图;
图9是本申请实施例中一种依据命名空间层级结构进行计算节点的聚合过程示意图;
图10(a)-图10(d)是本申请实施例中一种连通块聚合过程实例图;
图11是本申请实施例中一种节点串行连接和并行连接的结构示意图;
图12(a)-图12(b)是本申请实施例中一种堆叠的聚合节点展开过程示意图;
图13(a)-图13(b)是本申请实施例中一种模型训练过程的时间线示例;
图14是本申请实施例中一种二部图显示方法流程示意图;
图15是本申请实施例中一种二部图构建装置的结构示意图;
图16是本申请实施例中一种二部图显示装置的结构示意图;
图17是本申请实施例中一种二部图构建装置的硬件结构示意图;
图18是本申请实施例中一种二部图显示装置的硬件结构示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
首先对本申请实施例中的相关术语进行解释:
(1)并行训练:多个图形处理器(Graphics Processing Unit,GPU)参与神经网络模型训练的过程,并行训练的方式包括数据并行、模型并行和流水线并行等。
(2)计算图可视化:将深度学习模型的计算过程及数据流通过计算图以可视化的方式进行展示的过程。
(3)二部图:如果图中所有节点可分为两个互不相交的子集,并且图中每条边连接的两个节点都分属于这两个互不相交的子集,且两个子集中每个子集内的任意两个节点无边直接相连,则此图为二部图。
(4)节点Node:计算图中的节点包括可展开的节点和不可展开的节点。可展开的节点称为聚合节点,其中,可展开的节点指该节点可以由至少一个节点和该至少一个节点之间的边所构成的图结构进行表示。计算图中不可展开的节点可以划分为计算节点和通信节点两类,也可称为计算算子和通信算子。在表征并行训练过程的计算图中,通信算子用于指示数据交互任务,该数据交互任务是指两个或两个以上的设备(如GPU等设备)之间的数据交互。计算算子为计算图中除通信节点外的其它算子,例如收集算子AllGather Operator、广播算子Broadcast Operator、卷积算子Conv2D Operator、最大池化算子MaxPool Operator、相加算子 Add Operator、求对数算子Log Operator、排序算子Sort Operator、转置算子Transpose Operator等。
(5)聚合节点:二部图中能进行展开的节点。聚合节点基于命名空间得到,存储了该聚合节点内的节点信息,包括其子节点列表,以及该节点是否展开等为后续交互探索模块提供支持的属性。
(6)前驱节点:每个通信节点对应的前驱节点为数据流在流入通信节点之前所经过的非通信节点。
(7)后继节点:每个通信节点对应的后继节点为数据流在流出该通信节点之后所经过的非通信节点。
(8)数据流:计算图中的每条边上的箭头指示与该条边直接连接的两个节点之间的数据流向,即从一个节点流出,流入另一个节点。
(9)跨通信边:每个通信节点对应的跨通信边包括该通信节点对应的前驱节点中任意一个前驱节点到该通信节点对应的后继节点中任意一个后继节点之间的通信路径,且每条跨通信边不经过途中的任何通信节点。
(10)连通块/连通分量:在跨通信边被切割后,计算图会被通信节点形成的屏障阻隔开,此时计算图中计算节点相互连接所构成的子图称为连通块或连通分量。每个连通块包括至少一个计算节点。
(11)命名空间(Name Scope):深度学习框架在生成神经网络计算图时,会根据计算逻辑将节点分组,为每个节点生成一个命名空间,将计算图数据中的命名空间解析后,可以得到带层次信息的数据流图。
（12）深度学习（Deep Learning）框架：指通过机器学习的算法，在不同的抽象层级上进行多个层次学习的结构，深度学习框架包括PaddlePaddle、Tensorflow、Caffe、Theano、MXNet、Torch和PyTorch等。
下面介绍本申请实施例的系统架构和应用场景
请参见图1，图1为本申请实施例提供的一种系统架构示意图，用于描述计算机设备100的系统架构。如图1所示，计算机设备100的系统架构可以包括前端110、后端120和设备层130。
可选地,该计算机设备100可以是手机、电脑、平板或服务器等,本申请对此不限定。
可选地,前端110可以包括网页Web页面或应用程序App页面111,二部图构建单元112。二部图构建单元112可以向后端120发出请求,例如,从服务器或主机目录中读取特定格式(例如,json格式等)的计算图数据,然后解析读取的计算图数据,并构建相应的二部图(此过程也是本申请中的主要过程,将在下文的具体实施例中进行展开);在构建出二部图之后,用户可以在Web页面或者App页面持续进行交互和渲染,来调整和展示二部图的形态,分析对应的模型结构和功能。
可选地,后端120存储有深度学习框架/模型121,用于执行各种深度学习任务,例如图像处理、自然语言处理或其它领域(如科学计算或物理建模等)中的需要进行模型并行训练的任务,本申请对此不限定。此外,后端120可以将存储的深度学习模型转化成特定格式的计算图数据,以供前端110进行读取。在实际处理过程中,对于输入的深度学习模型的计算图,只需要配置统一的计算图数据存储和解析格式,即可采用本申请中的方案进行计算图可视化,即得到深度学习模型对应的二部图。
可选地，设备层130包括处理器131。处理器131可以是多个图形处理器GPU和/或中央处理器（Central Processing Unit，CPU），用于对深度学习框架/模型121进行并行训练，以及训练结束后的执行。
请参见图2,图2为本申请实施例提供的一种应用场景示意图。应当理解,本申请实施例中的二部图构建方法可应用于包括人工智能(例如图像处理或自然语言处理等)、科学计算等领域中需要利用深度学习模型进行数据处理,且需要对深度学习模型并行训练的场景。
首先,用户基于具体的深度学习任务210构建对应的深度学习模型220,其中,该深度学习任务210可以是图像识别、目标检测、图像分割等图像处理任务或者语音语义识别等自然语言处理任务等,本申请对此不限定;深度学习模型220可以是卷积神经网络模型(Convolutional Neural Network,CNN)、深度信任网络模型(Deep Belief Network,DBN)或堆栈自编码网络模型(Stacked Auto-encoder Network)等。然后并对深度学习模型进行并行训练230,即可以将模型的训练过程拆分到不同的GPU或CPU中并行执行。
进一步地，基于深度学习模型220的训练过程中所形成的计算图数据，利用本申请实施例中的方法进行模型结构可视化240，即构建与计算图数据对应的二部图，并在图形用户界面（Graphical User Interface，GUI）进行展示。用户基于可视化的二部图快速地定位出通信节点的位置及相应的功能，从而基于此进行并行策略调整250，例如，可以快速确定通信节点的融合/切分策略，从而使得在并行训练过程中通信比最小，即最大程度降低训练过程中的通信时长。
在得到训练好的深度学习模型后,进行模型部署260,即将模型部署到各种可行的计算机设备上,例如手机、电脑、服务器等,本申请对此不限定。
请参见图3,图3为本申请实施例中一种二部图构建方法流程示意图。如图3所示,方法300包括步骤S310和步骤S320。
步骤S310:从计算图中搜索出第一通信节点对应的至少一条跨通信边,其中,所述第一通信节点为所述计算图包含的M个通信节点中的一个,所述第一通信节点对应P个前驱节点和Q个后继节点,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱节点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点;M、P和Q为正整数。
其中,上述计算图可以是展开后的计算图,即计算图中的节点为计算节点或者通信节点,且不可再进行展开。上述计算图可以为有向图,即计算图中任意两个直接相连的节点之间的边(或称为数据流)是有方向的,指示该两个节点之间的数据流向。
可选地,每个通信节点对应的前驱节点可以为数据流在流入通信节点之前所经过的所有非通信节点。
可选地,每个通信节点对应的后继节点可以为数据流在流出通信节点之后所经过的所有非通信节点。
具体地,上述第一通信节点对应的前驱节点为在计算图中第一通信节点逻辑顺序之前(即数据流在流入第一通信节点之前经过)的所有计算节点,且不包括第一通信节点本身,共P个。第一通信节点对应的后继节点为在计算图中第一通信节点逻辑顺序之后(即数据流在流出第一通信节点之后经过)的所有计算节点,且不包括第一通信节点本身,共Q个。
可选地,上述从计算图中搜索出第一通信节点对应的至少一条跨通信边,具体为:搜索出P个前驱节点中每个前驱节点到第1、2、3…、Q个后继节点之间,且不经过任何通信节 点的通信路径,搜索出的每条通信路径即为一条跨通信边,共搜索出N条跨通信边,N为正整数。应当注意,P个前驱节点中任一前驱节点到Q个后继节点中的任一后继节点之间的跨通信边的数量为大于或等于零的整数。
应当理解,计算图中任意一个通信节点对应的跨通信边的搜索过程与上述第一通信节点对应的跨通信边的搜索过程相同,此处不再赘述。
下面将以图4所示的计算图为例描述搜索通信节点对应的跨通信边的过程。
请参见图4,图4为本申请实施例提供的一种计算图的结构示意图。如图4中所示,计算图400中包含通信节点和计算节点,通信节点用T表示,包括T1、T2和T3;计算节点用J表示,包括J1,J2,…,J10。
对于通信节点T1而言,其对应1个前驱节点:J1;对应4个后继节点:J5、J6、J9和J10。因而,通信节点T1对应的跨通信边共6条,分别为:J1-J2-J5、J1-J2-J5-J6、J1-J2-J5-J6-J9、J1-J2-J5-J6-J9-J10、J1-J2-J4-J8-J10、J1-J2-J3-J7-J10。
对于通信节点T2而言,其对应3个前驱节点:J1、J2和J3;对应2个后继节点:J7和J10。因而,通信节点T2对应的跨通信边共10条,分别为:J3-J7、J3-J7-J10、J2-J3-J7、J2-J3-J7-J10、J2-J4-J8-J10、J2-J5-J6-J9-J10、J1-J2-J3-J7、J1-J2-J5-J6-J9-J10、J1-J2-J4-J8-J10、J1-J2-J3-J7-J10。
对于通信节点T3而言,其对应3个前驱节点:J1、J2和J4;对应2个后继节点:J8和J10。因而,通信节点T3对应的跨通信边共10条,分别为:J4-J8、J4-J8-J10、J2-J4-J8、J2-J4-J8-J10、J2-J5-J6-J9-J10、J2-J3-J7-J10、J1-J2-J4-J8、J1-J2-J5-J6-J9-J10、J1-J2-J4-J8-J10、J1-J2-J3-J7-J10。
可以看出,上述不同通信节点可以对应相同的跨通信边。
综上,在去除不同节点对应的相同跨通信边后,计算图400中的三个通信节点共对应17条跨通信边。
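As a hedged sketch (not part of the patent text), the cross-communication-edge search described above can be reproduced on the Figure 4 example by enumerating every simple path from a predecessor to a successor inside the compute-only subgraph. The edge list below is inferred from the paths listed in the text, and the helper names are our own:

```python
# Compute-node edges of the Figure-4 example graph (assumed from the
# paths listed in the text; communication nodes T1-T3 are excluded).
COMPUTE_EDGES = {
    "J1": ["J2"], "J2": ["J3", "J4", "J5"], "J3": ["J7"], "J4": ["J8"],
    "J5": ["J6"], "J6": ["J9"], "J7": ["J10"], "J8": ["J10"],
    "J9": ["J10"], "J10": [],
}

def simple_paths(src, dst, adj):
    """All simple paths src -> dst that stay inside the compute subgraph."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield tuple(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def cross_comm_edges(preds, succs, adj=COMPUTE_EDGES):
    """Cross-communication edges of one communication node: every
    predecessor-to-successor path that bypasses all communication nodes."""
    out = set()
    for p in preds:
        for q in succs:
            out.update(simple_paths(p, q, adj))
    return out

# Predecessor/successor sets of T1, T2, T3 as given in the text.
t1 = cross_comm_edges(["J1"], ["J5", "J6", "J9", "J10"])
t2 = cross_comm_edges(["J1", "J2", "J3"], ["J7", "J10"])
t3 = cross_comm_edges(["J1", "J2", "J4"], ["J8", "J10"])
```

Running this reproduces the counts in the text: 6 paths for T1, 10 each for T2 and T3, and 17 distinct cross-communication edges overall.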
在一种可行实施方式中,所述每条跨通信边包括至少一条子边,所述至少一条子边中的每条子边直接连接两个计算节点;所述至少一条子边中的每条子边对应一个权重系数,所述每条子边对应的一个权重系数由所述每条子边直接连接的两个计算节点的类型决定。
其中,在上述计算图中,子边为直接相连的两个计算节点之间的边,每条跨通信边包括至少一条子边。
其中,计算算子的类型由计算算子的具体功能决定。例如,对于求对数算子Log Operator而言,其功能为进行对数运算。可选地,其类型可以为Log,即每个计算算子的类型由其标识符进行表示。
可选地,用户可以根据每个子边直接连接的两个计算节点的类型来确定连接该两个计算节点的子边的权重系数,本申请对此不进行限定。
例如,对于利用Mindspore框架构建的计算图而言,当计算图中包含依次直接相连的Reshape节点、Tile节点和Mul节点时,由于Reshape节点和Tile节点为进行张量运算的节点,其逻辑相近,而Mul节点为进行数学运算的节点,因此,在进行子边切割时,用户更希望能保留Reshape节点和Tile节点之间相连的子边。此时,可以将Reshape节点和Tile节点之间的子边赋予一个较大的权重系数,将Tile节点和Mul节点之间的子边赋予一个较小的权重系数。
可选地,权重系数可以用于表征对应子边的重要程度。例如,权重系数越大,子边的重要程度越高;或者权重系数越大,子边的重要程度越低。
可选地，所有子边对应的权重系数的值可以位于0和1之间。应当理解，子边对应权重系数的取值区间也可以是其它取值范围，本申请对此不进行限定。
应当理解,上述计算图中每个通信节点对应的跨通信边所包含的子边的权重系数的确定方式和第一通信节点对应跨通信边所包含的子边的权重系数的确定方式相同,此处不再赘述。
步骤S320:切割所述M个通信节点分别对应的跨通信边,并进行聚合操作,以得到所述二部图,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
具体地,对计算图中M个通信节点对应的跨通信边进行切割,得到切割后的计算图。其中,切割后的计算图包括K个连通块和M个通信节点,K为正整数。对切割后的计算图进行聚合操作,得到二部图。其中,二部图中所有节点可以划分为两个集合,该两个集合中任一集合中的任意两个节点无边直接相连,该两个集合中的一个集合是由上述M个通信节点构成的集合。
在一种可行实施方式中,所述M个通信节点共对应N条跨通信边,N为正整数;所述切割所述M个通信节点分别对应的跨通信边,包括:切割所述N条跨通信边中的每条跨通信边中的一条子边;其中,当所述N条跨通信边中的E条跨通信边包含共同的子边时,所述E条跨通信边中被切割的所有子边分别对应的权重系数之和最大或者最小,E为小于或等于N的正整数;当所述N条跨通信边中的第i条跨通信边与其它跨通信边不包含共同的子边时,切割所述第i条跨通信边所包含的子边中权重系数最小或者权重系数最大的子边,i为正整数。
应当理解,不同的通信节点可以对应相同的跨通信边,上述M个通信节点共对应的N条跨通信边中不包含相同的跨通信边。
具体地，在对上述M个通信节点对应的跨通信边进行切割时，只切割每条跨通信边中的一条子边，从而确保后续构建得到的二部图能最大程度地保留计算图中的语义信息。
其中,在对M个通信节点分别对应的跨通信边进行切割时,可以采用下述两种方式中的任意一种进行切割。在全局最优切割方式和局部最优切割方式中,在进行分组后,当每组跨通信边中包含共同子边的跨通信边,其切割方式与上述E条跨通信边的切割方式相同;当分组后的一组中只包含一条跨通信边时,该一条跨通信边的切割方式与上述第i条跨通信边的切割方式相同。
(一)全局最优切割
采用L种分组方式分别对上述N条跨通信边进行分组,每种分组方式对应一个权重系数和一种跨通信边的切割方式。在采用L种分组方式中的任意一种分组方式进行分组后,得到至少一组跨通信边。该至少一组跨通信边中的每组都包含至少一条共同的子边,且每组中包含至少一条跨通信边。其中,L种分组方式的区别在于,在采用不同分组方式分组时依据的共同的子边不同。
下面以L种分组方式中的第a种分组方式为例进行描述其对应的权重系数的确定过程:第a种分组方式将N条跨通信边分为A组。在对每组跨通信边进行切割时,对该组中的每条跨通信边切割一条子边,使得该组中被切割的所有子边分别对应的权重系数之和最大或者最小,将得到的权重系数之和作为该组跨通信边的权重系数。依照上述步骤,可以计算得到A组跨通信边分别对应的A个权重系数,然后将A个权重系数相加,得到第a种划分方式对应的权重系数。
依照上述步骤,可以得到L种分组方式分别对应的L个权重系数,将L个权重系数中最大或者最小的权重系数对应的跨通信边切割方式作为计算图的切割方式,即全局最优切割方 式。
(二)局部最优切割
依次切割M个通信节点中每个通信节点对应的跨通信边，在切割时，对通信节点的先后顺序不做限定。
搜索出第b个通信节点对应的B条跨通信边,该B条跨通信边中不包括已进行过切割的跨通信边。按照上述全局最优的切割方式,选出B条跨通信边最优的切割方式,该种切割方式使得B条跨通信边中被切割的所有子边分别对应的权重系数之和最大或者最小。
在切割完第b个通信节点对应的B条跨通信边后,开始切割第b+1个通信节点对应的跨通信边,直到切割完上述M个通信节点。
应当理解，采用此种方式可以使得对每个通信节点对应的跨通信边进行切割时，达到局部最优。用户可以根据具体的场景，选择采用全局最优切割或者局部最优切割的方式来对计算图中的跨通信边进行切割。
下面将以图5(a)-图5(b)为例,来举例描述采用上述全局最优切割方式切割跨通信边的过程;在示例中,以权重系数越高,对应子边的重要程度越高进行描述。图5(a)-图5(b)中的计算图与图4中的计算图相同。
图5(a)-图5(b)为本申请实施例提供的一种跨通信边切割方式示意图。图5(a)和5(b)分别列举了在进行全局最优切割时,两种不同的分组方式分别对应的两种跨通信边的切割方式。应当理解,在搜索示例计算图中的全局最优切割方式时,还会搜索其它的分组方式对应的切割方式,在此不进行一一列举。
在图5(a)中,将计算图中的17条跨通信边分为3组。第一组包含共同的子边J3-J7,包括的跨通信边为:J3-J7、J2-J3-J7、J2-J3-J7-J10、J3-J7-J10、J1-J2-J3-J7-J10。第二组包含共同的子边J4-J8,包括的跨通信边为:J4-J8、J2-J4-J8、J1-J2-J4-J8、J4-J8-J10、J2-J4-J8-J10、J1-J2-J4-J8-J10。第三组包含共同的子边J2-J5,包括的跨通信边为:J1-J2-J5、J1-J2-J5-J6、J1-J2-J5-J6-J9、J1-J2-J5-J6-J9-J10、J2-J5-J6-J9-J10、J1-J2-J5-J6-J9-J10。对上述三组跨通信边分别进行切割,使得每组跨通信边中被切割的所有子边分别对应的权重系数之和最小。
此时,当第一组跨通信边被切割的子边为J3-J7时,可以使得第一组跨通信边被切割的所有子边的权重系数之和最小,第一组跨通信边切割后对应的权重系数为0.3;当第二组跨通信边被切割的子边为J4-J8时,可以使得第二组跨通信边被切割的所有子边的权重系数之和最小,第二组跨通信边切割后对应的权重系数为0.2;当第三组跨通信边被切割的子边为J2-J5时,可以使得第三组跨通信边被切割的所有子边的权重系数之和最小,第三组跨通信边切割后对应的权重系数为0.2。综上,图5(a)所示的分组方式所对应的跨通信边切割方式中,所有被切割子边的权重系数之和为0.7。
在图5(b)中,将计算图中的17条跨通信边分为5组。第一组包含共同的子边J3-J7,包括的跨通信边为:J3-J7、J3-J7-J10。第二组包含共同的子边J4-J8,包括的跨通信边为:J4-J8、J2-J4-J8、J1-J2-J4-J8、J4-J8-J10、J2-J4-J8-J10、J1-J2-J4-J8-J10。第三组包含共同的子边J1-J2,包括的跨通信边为:J1-J2-J5、J1-J2-J5-J6、J1-J2-J5-J6-J9、J1-J2-J5-J6-J9-J10。第四组包含共同的子边J5-J6,包括的跨通信边为:J2-J5-J6-J9-J10、J1-J2-J5-J6-J9-J10。第五组包含共同的子边J2-J3,包括的跨通信边为:J2-J3-J7、J2-J3-J7-J10、J1-J2-J3-J7-J10。
此时,当第一组跨通信边被切割的子边为J3-J7时,可以使得第一组跨通信边被切割的所有子边的权重系数之和最小,第一组跨通信边切割后对应的权重系数为0.3;当第二组跨通信 边被切割的子边为J4-J8时,可以使得第二组跨通信边被切割的所有子边的权重系数之和最小,第二组跨通信边切割后对应的权重系数为0.2;当第三组跨通信边被切割的子边为J1-J2时,可以使得第三组跨通信边被切割的所有子边的权重系数之和最小,第三组跨通信边切割后对应的权重系数为0.8;当第四组跨通信边被切割的子边为J5-J6时,可以使得第四组跨通信边被切割的所有子边的权重系数之和最小,第四组跨通信边切割后对应的权重系数为0.4;当第五组跨通信边被切割的子边为J2-J3时,可以使得第五组跨通信边被切割的所有子边的权重系数之和最小,第五组跨通信边切割后对应的权重系数为0.5。
综上,图5(b)所示分组方式所对应的跨通信边切割方式中,所有被切割子边的权重系数之和为2.2。由于图5(a)中所有被切割子边的权重系数之和小于图5(b)中所有被切割子边的权重系数之和,因而图5(a)中跨通信边的切割方式更好。
应当理解,虽然图5(a)和图5(b)只分别列举了在追求全局最优时的两种跨通信边的切割方式,但可以看出,图5(a)中切割子边数量最少,且切割子边对应的权重系数之和也是最小,此时可以将图5(a)中跨通信边切割方式作为计算图的全局最优切割方式。
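The global-optimum search above amounts to choosing a minimum-total-weight set of sub-edges that intersects every cross-communication edge. As a brute-force sketch over the Figure 5 example (weights not stated in the text are assumed to be 1.0, and the helper names are ours), this reproduces the 0.7 optimum of Figure 5(a):

```python
from itertools import combinations

# The 17 cross-communication edges of the Figure 4/5 example.
PATHS = [
    ("J1","J2","J5"), ("J1","J2","J5","J6"), ("J1","J2","J5","J6","J9"),
    ("J1","J2","J5","J6","J9","J10"), ("J1","J2","J4","J8","J10"),
    ("J1","J2","J3","J7","J10"), ("J3","J7"), ("J3","J7","J10"),
    ("J2","J3","J7"), ("J2","J3","J7","J10"), ("J2","J4","J8","J10"),
    ("J2","J5","J6","J9","J10"), ("J1","J2","J3","J7"), ("J4","J8"),
    ("J4","J8","J10"), ("J2","J4","J8"), ("J1","J2","J4","J8"),
]
# Sub-edge weights read from Figure 5; every other sub-edge is assumed 1.0.
WEIGHTS = {("J1","J2"): 0.8, ("J2","J3"): 0.5, ("J2","J5"): 0.2,
           ("J3","J7"): 0.3, ("J4","J8"): 0.2, ("J5","J6"): 0.4}

def sub_edges(path):
    return list(zip(path, path[1:]))

def best_global_cut(paths, weights, default=1.0):
    """Minimum-total-weight set of sub-edges hitting every path
    (brute force over the candidate sub-edges)."""
    cand = sorted({e for p in paths for e in sub_edges(p)})
    best, best_w = None, float("inf")
    for r in range(1, len(cand) + 1):
        for cut in combinations(cand, r):
            if all(any(e in cut for e in sub_edges(p)) for p in paths):
                w = sum(weights.get(e, default) for e in cut)
                if w < best_w:
                    best, best_w = set(cut), w
    return best, best_w

CUT, TOTAL = best_global_cut(PATHS, WEIGHTS)
```

The search returns the cut {J2-J5, J3-J7, J4-J8} with total weight 0.7, matching the global-optimal cut of Figure 5(a).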
下面将以图6(a)-图6(c)为例,来举例描述采用上述局部最优切割方式切割跨通信边的过程;在示例中,以权重系数越高,对应子边的重要程度越高进行描述。图6(a)-图6(c)中的计算图与图4中的计算图相同。
图6(a)-图6(c)为本申请实施例提供的另一种跨通信边切割方式示意图。图6(a)和图6(c)分别列举了按照从通信节点T1—通信节点T2—通信节点T3的顺序,依次切割每个通信节点对应的跨通信边的过程。应当理解,图6(a)-图6(c)中的切割顺序只是一个具体示例,本领域技术人员也可采用其它顺序,依次对每个通信节点对应的跨通信边进行切割,本申请对此不进行限定。
图6(a)展示了对通信节点T1对应的跨通信边进行切割的过程。通信节点T1对应的跨通信边共6条,包括:J1-J2-J5、J1-J2-J5-J6、J1-J2-J5-J6-J9、J1-J2-J5-J6-J9-J10、J1-J2-J4-J8-J10、J1-J2-J3-J7-J10。对此6条跨通信边进行切割,使得所有被切割的子边分别对应的权重系数之和最小。可以看出,此6条跨通信边包含一条共同的子边J1-J2,因而当切割子边J1-J2时,权重系数之和最小,为0.8。
图6(b)展示了通信节点T1对应的跨通信边切割后,对通信节点T2对应的跨通信边进行切割的过程。通信节点T2此时对应的跨通信边共6条,包括:J3-J7、J3-J7-J10、J2-J3-J7、J2-J3-J7-J10、J2-J4-J8-J10、J2-J5-J6-J9-J10。按照前述实施例中全局最优切割方式搜索出对此6条跨通信边的最优切割方式,使得6条跨通信边中所有被切割子边的权重系数之和最小。容易理解,可以将此6条跨通信边分为三组,第一组包含四条跨通信边:J3-J7、J3-J7-J10、J2-J3-J7、J2-J3-J7-J10,其包含共同的子边J3-J7。第二组为跨通信边J2-J4-J8-J10。第三组为跨通信边J2-J5-J6-J9-J10。此时,当第一组跨通信边被切割的子边为J3-J7时,可以使得第一组跨通信边被切割的所有子边的权重系数之和最小,第一组跨通信边切割后对应的权重系数为0.3。当第二组跨通信边被切割的子边为J4-J8时,可以使得第二组跨通信边被切割的所有子边的权重系数之和最小,第二组跨通信边切割后对应的权重系数为0.2。当第三组跨通信边被切割的子边为J2-J5时,可以使得第三组跨通信边被切割的所有子边的权重系数之和最小,第三组跨通信边切割后对应的权重系数为0.2。此种全局最优的切割方式下,所有被切割子边分别对应的权重系数之和为0.7。
图6(c)展示了对通信节点T2对应的跨通信边进行切割后得到的计算图。可以看出, 通信节点T3对应的跨通信边在前述过程中已全部被切割,因此,图6(c)即为采用局部最优切割方式进行切割后得到的计算图。采用局部最优切割方式进行切割时,所有被切割子边对应的权重之和为1.5。
综上,从图5(a)-图5(b)和图6(a)-图6(c)分别展示的全局最优切割和局部最优切割方式可以看出,采用全局最优切割时,所有被切割子边对应的权重系数之和小于局部最优切割时所有被切割子边对应的权重系数之和,全局最优切割可以最大程度保留计算图中的语义信息。
在一种可行实施方式中,在对跨通信边进行切割后,对所有切割的子边进行标记,使得后续用户对构建好的二部图进行展示过程中,当用户将用户界面的光标等操作按钮移动到子边被切割的计算节点上时,显示该计算节点被切割的子边。
在一种可行实施方式中,切割后的所述计算图包括K个连通块,K为正整数;所述进行聚合操作,得到所述二部图,包括:对所述切割后的所述计算图中的K个连通块分别进行聚合,得到所述二部图;其中,所述K个连通块是基于所述M个通信节点在所述计算图中的位置对所述计算图中的计算节点进行划分得到的,所述二部图包括K个一级聚合节点和所述M个通信节点,所述K个一级聚合节点中任意两个一级聚合节点无边直接相连,且所述K个一级聚合节点分别属于K个命名空间。
其中,切割后的计算图中的计算节点被M个通信节点形成的屏障阻隔开,形成K个连通块,该K个连通块中每个连通块包括至少一个计算节点,具体可参见后文图10(a)-图10(d)所示实施例的详细描述。
请参见图7，图7为本申请实施例提供的一种二部图结构示意图。如图7所示，基于计算图构建得到的二部图中包含K个一级聚合节点和M个通信节点。二部图中所有节点可以分为两个集合，其中一个集合包括所有的通信节点，另一个集合包括所有的一级聚合节点，且该两个集合中任一集合中的任意两个节点之间无边直接相连。如图7所示，K个一级聚合节点中任意两个一级聚合节点之间无边直接相连，且M个通信节点中的任意两个通信节点之间无边直接相连。O个一级聚合节点通过通信节点与K-O个一级聚合节点进行通信。上述对所述切割后的所述计算图中的K个连通块分别进行聚合，得到所述二部图，具体包括：基于每个连通块中计算节点所属命名空间的层级结构，对每个连通块进行聚合，得到与每个连通块对应的一个一级聚合节点，共得到K个一级聚合节点。同时，为该K个一级聚合节点分别构建对应的K个命名空间，即该K个一级聚合节点分别属于K个命名空间。
下面以K个连通块中第一连通块为例来描述,基于第一连通块中计算节点所属命名空间的层级结构对第一连通块进行聚合的过程。其中,第一连通块中的节点属于Z组具有层级结构的命名空间,Z为正整数。
下面以第e组命名空间为例,描述依据其层级结构进行计算节点聚合的过程。具体参见图8,图8为本申请实施例中一种命名空间的层级结构示意图。如图8所示,第e组命名空间具有n个命名空间,分别为:X1、X2、…、Xn。该n个命名空间为可以逐级展开的层级结构:命名空间X1包含命名空间X2,…,命名空间Xn-1包含命名空间Xn。其中,计算节点J11属于命名空间X1;计算节点J12和J13属于命名空间X2;计算节点J14,…,Jd属于命名空间Xn。
然后依据命名空间的层级结构对第e组命名空间中包含的计算节点进行逐层聚合。具体参见图9，图9为本申请实施例提供的一种依据命名空间层级结构进行计算节点的聚合过程示意图。如图9所示，首先从第n层命名空间开始，将属于第n层命名空间的计算节点（J14，…，Jd）进行聚合；然后将得到的聚合节点与属于第n-1层命名空间的计算节点进行聚合；依照此规则逐层聚合，在将第3层命名空间中的节点进行聚合后，得到聚合节点G25；将聚合节点G25与属于第2层命名空间的计算节点进行聚合，得到聚合节点G26，即在对第e组命名空间中包含的计算节点进行聚合后，得到聚合节点G26和计算节点J11。应当注意，在进行节点聚合过程中，还会同时为聚合节点创建对应的命名空间，如图9所示，为聚合节点G26创建的命名空间为Xg26，为聚合节点G25创建的命名空间为Xg25。命名空间Xg26与命名空间X1同为第一层命名空间，命名空间Xg25与命名空间X2同为第二层命名空间。
按照上述第e组命名空间中包含计算节点的聚合过程,对每组命名空间中包含计算节点进行聚合,得到Z组聚合结果,该Z组聚合结果中包含的节点属于同一层;最后将该Z组聚合结果进行聚合,得到第一连通块对应的一个一级聚合节点,并同时为该一级聚合节点创建对应的命名空间。
同理,可以参照第一连通块的聚合方式对K个连通块中每个连通块进行聚合,得到K个一级聚合节点。
在一种可行实施方式中,所述K个一级聚合节点中的每个一级聚合节点为层级结构,其中,所述层级结构中的第j层中的节点是由所述层级结构中第j-1层中的聚合节点展开得到的,所述层级结构中的第一层为所述一级聚合节点,所述第j层中的节点分别属于不同的命名空间;所述第j层中的节点包括聚合节点和/或计算节点,所述计算节点为不可展开的节点。
由上述对第一连通块进行聚合的过程可知,每个一级聚合节点都为层级结构,且层级结构中的每层中的节点是由上一层中的聚合节点展开得到的,一级聚合节点为其层级结构中的第一层,即层级结构中的最上层。其中,展开为聚合的逆过程。聚合指将至少一个节点和该至少一个节点之间的边所表示的图结构用一个节点进行表示。展开指通过至少一个节点和该至少一个节点之间的边所构成的图结构来表示一个节点。
其中,层级结构每层中节点分别属于不同的命名空间,该不同的命名空间在命名空间的层级结构中属于同一层。每层中节点可以只包括聚合节点,或者只包括计算算,或者包括聚合节点和计算节点。
其中,聚合节点为可以进行展开的节点,计算节点为计算图中不可展开的最小单元。
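A minimal sketch (our own construction, not the patent's implementation) of the bottom-up aggregation by namespace hierarchy described above; the node and namespace names follow the Figure 8 example, with the hierarchy depth assumed to be three levels:

```python
def aggregate(nodes):
    """nodes: {node_id: namespace-path tuple}. Returns a nested dict in
    which each namespace level becomes one aggregation node; compute
    nodes appear as leaves (value None) at their namespace level."""
    tree = {}
    for node, path in nodes.items():
        cur = tree
        for ns in path:
            cur = cur.setdefault(ns, {})  # one aggregation node per namespace
        cur[node] = None  # leaf compute node
    return tree

# Figure-8-style example: J11 in X1, J12/J13 in X1/X2, J14 in X1/X2/X3.
NODES = {"J11": ("X1",), "J12": ("X1", "X2"), "J13": ("X1", "X2"),
         "J14": ("X1", "X2", "X3")}
TREE = aggregate(NODES)
```

Expanding the top-level aggregation node `X1` then yields compute node `J11` together with the nested aggregation node `X2`, mirroring the layer-by-layer expansion described in the text.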
在一种可行实施方式中,所述方法还包括:当第一命名空间中第一计算节点和第二计算节点之间的子边被切割时,更新所述第一命名空间,所述第一命名空间为所述计算图中的命名空间;构建包含第一计算节点的命名空间,所述第一计算节点不属于更新后的所述第一命名空间。
其中,第一命名空间为计算图中的任意一个命名空间。第一计算节点和第二计算节点为第一命名空间中任意两个计算节点。
具体地,在对计算图中的跨通信边进行切割后,第一计算节点和第二计算节点之间的子边被切割,此时,第一计算节点和第二计算节点在计算图切割后属于不同的连通块。在切割后,第一计算节点不属于第一命名空间,第二计算节点仍属于第一命名空间。此时由于第一命名空间中计算节点的数量发生了变化,可以更新第一命名空间对应的属性,包括第一命名空间的标识符和包含的计算节点的数量等。同时,可以为第一计算节点构建对应的命名空间,为第一计算节点构建的命名空间的层级结构与第二计算节点属于命名空间的层级结构相同。 例如,若第二计算节点属于三层命名空间中第三命名空间中,则此时可以为第一计算节点构建一个三层命名空间,且第一计算节点属于该三层命名空间中的第三层命名空间,此时,第一计算节点和第二计算节点所属的命名空间的标识符不同。
可选地,命名空间的标识符可以由字母、数字或字母和数字的组合,或其它字符进行表征,本申请对此不限定。
下面将以图10(a)-图10(d)为例,详细描述对切割跨通信边后的计算图中的连通块进行聚合,得到相应二部图的过程。
请参见图10(a)-图10(d),图10(a)-图10(d)为本申请实施例提供的一种连通块聚合过程实例图。
其中,图10(a)为进行跨通信边切割后,得到的切割后的计算图。图10(a)是基于图5(a)中的全局最优切割方式进行切割后得到的结果。如图10(a)所示,计算图在进行切割后,所有的计算节点被通信节点阻隔开,形成了两个连通块V1和连通块V2,即切割后的计算图中包括两个连通块(V1和V2)和三个计算节点(T1、T2和T3)。
图10(b)展示了计算图中命名空间的层级结构，以及每个节点所属于的命名空间。如图10(b)所示，计算图对应的命名空间的层级结构为两层。其中，第一层命名空间包括：命名空间D1、命名空间H1和命名空间R1。命名空间D1没有子命名空间，命名空间H1包含命名空间H2，命名空间R1包含命名空间R2，即命名空间H2和命名空间R2分别为命名空间H1和命名空间R1的子命名空间。其中，计算节点J1和通信节点T1属于命名空间D1；通信节点T2、通信节点T3和计算节点J2属于命名空间H1；计算节点J3、计算节点J4和计算节点J5属于命名空间H2；计算节点J6和计算节点J7属于命名空间R1；计算节点J8、计算节点J9和计算节点J10属于命名空间R2。
图10(c)展示了图10(a)中连通块V1和V2的聚合过程。其中，虚线框树状图表示命名空间的层级结构，实线框树状图表示一级聚合节点的层级结构。
在对连通块V1进行聚合的过程中,由于计算节点J5被切割出命名空间H2,此时可以将命名空间H2标识符更新为H2_1,计算节点J3和计算节点J4属于命名空间H2_1。此时,连通块V1对应两组层级结构的命名空间,分别为:D1、H1-H2。依照命名空间的层级结构进行聚合:首先将计算节点J3和计算节点J4聚合为聚合节点G2,聚合节点G2属于命名空间H1,由于H1中节点数量发生变化,可以将命名空间H1标识符更新为H1_1;最后将同属于第一层命名空间的三个节点(计算节点J1、计算节点J2和聚合节点G2)进行聚合,得到连通块V1对应的一级聚合节点G1,并为其构建对应的命名空间U。
在对连通块V2进行聚合的过程中，由于在进行跨通信边切割后，计算节点J5不属于命名空间H2，此时可以依据计算节点J5在切割前对应的命名空间构建新的层级命名空间，即两层命名空间H1_2-H2_2，计算节点J5属于命名空间H2_2。此时，连通块V2对应两组层级结构的命名空间，分别为：H1_2-H2_2、R1-R2。依照命名空间的层级结构进行聚合：首先将计算节点J8、计算节点J9和计算节点J10聚合为聚合节点G4，聚合节点G4和计算节点J6、计算节点J7同属于命名空间R1，此时可以将命名空间R1标识符更新为R1_1；同时，对计算节点J5进行聚合，得到聚合节点G5，聚合节点G5属于命名空间H1_2；最后将属于同一层命名空间的四个节点（聚合节点G5、聚合节点G4和计算节点J6、计算节点J7）进行聚合，得到连通块V2对应的一级聚合节点G6，并为其构建对应的命名空间S。
图10(d)展示进行连通块聚合后,得到的二部图。如图10(d)所示,连通块V1聚合 后得到一级聚合节点G1,连通块V2聚合后得到一级聚合节点G6,G1和G6通过三个通信节点(T1、T2和T3)进行通信。三个通信节点中的任意两个无边直接相连,且两个一级聚合节点之间也无边直接相连。
在一种可行实施方式中,所述方法还包括:计算所述二部图中聚合节点的哈希值,以及计算节点的哈希值;其中,当所述节点为所述聚合节点时,所述节点的哈希值等于所述聚合节点展开得到的各节点的哈希值之和,当所述节点为所述计算节点时,所述节点的哈希值由所述计算节点的属性决定,所述计算节点的属性包括所述计算节点的类型、入度、出度、附属节点的类型、附属节点的数量。
其中,上述计算上述一级聚合节点层级结构中聚合节点的哈希值,以及计算节点的哈希值,包括:从层级结构的底层开始,逐层往上,依次计算每层中所有节点对应的哈希值,直到最后计算得到一级聚合节点的哈希值。对于聚合节点而言,其哈希值等于其在进行一次展开后,得到的所有节点的哈希值之和;对于计算节点而言,其哈希值由该计算节点的属性决定。
可选地，计算节点的属性包括计算节点的类型、入度、出度、附属节点的类型、附属节点的数量等，本申请对此不进行限定。进一步，可选地，计算节点的类型用该节点的标识符进行表征，例如Add节点用add表征，Reduce节点用reduce进行表征；计算节点的入度为直接流入该计算节点的边的数量；出度为从该计算节点流出后，直接流入的节点的数量；计算节点的附属节点为数据只输入到该计算节点的节点，且附属节点无数据输入，附属节点的类型可以为常量或者变量，可以分别用字符串Const和Para表示，本申请对此不限定。
可选地,在确定计算节点对应的哈希值过程中,可以将表征每个计算节点属性的字符串进行拼接,得到一个字符串;然后采用DJB哈希算法(或称为Times33算法)等将该一个字符串映射为对应的哈希值。
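The hash computation above can be sketched as follows. The DJB/Times33 string hash itself is standard; the attribute-string layout, field order, and function names are our own assumptions:

```python
def djb2(s: str) -> int:
    """DJB (Times33) string hash, truncated to 32 bits."""
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

def compute_node_hash(op_type, in_deg, out_deg, attached):
    """Hash a compute node from its attributes; `attached` is the list of
    attachment-node types such as "Const"/"Para" (field layout is ours)."""
    key = "|".join([op_type, str(in_deg), str(out_deg),
                    ",".join(sorted(attached)), str(len(attached))])
    return djb2(key)

def aggregate_hash(child_hashes):
    """An aggregation node's hash is the sum of the hashes of the nodes
    obtained by expanding it once (kept in 32 bits here)."""
    return sum(child_hashes) & 0xFFFFFFFF

h1 = compute_node_hash("Add", 2, 1, ["Const"])
h2 = compute_node_hash("Add", 2, 1, ["Const"])
h3 = compute_node_hash("Mul", 2, 1, ["Const"])
```

Because the aggregate hash is a sum, it is independent of the order in which children are visited, which is what makes same-hash aggregation nodes comparable for the stacked display below.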
在一种可行实施方式中,所述方法还包括:对所述二部图中的多个节点进行堆叠展示,其中,所述多个节点是由同一所述聚合节点展开一次后得到的,所述多个节点的哈希值相同,且所述多个节点串行连接或并行连接。可选地,堆叠展示指在二部图中,用连接关系标识以及数字标识构成的堆叠结构来显示满足上述条件的多个节点;其中,连接关系标识用于表征该多个节点之间的连接关系,例如为并行连接或串行连接;数字标识表示满足上述条件的多个节点的数量。上述条件指由同一所述聚合节点展开一次后得到、哈希值相同,且连接关系为串行连接或并行连接的多个节点。
具体地,对每个一级聚合节点按照从上层到下层(即从层级结构的第一层开始)的顺序,逐层进行检测,当第一聚合节点进行一次展开后,得到的节点数量大于或等于预设数量时,对展开后得到的节点进行同构检测,具体地:检测展开后得到的节点中是否存在哈希值相同的节点。其中,第一聚合节点为一级聚合节点层级结构中的任意一层中的一个聚合节点。
在检测到展开后的节点中存在哈希值相同的多个节点,且该多个节点之间的连接关系为并行连接或者串行连接时,对该多个节点进行堆叠展示。其中,将哈希值相同的多个节点作为内部结构相同的节点。其中,串行连接指该多个节点依次连接,数据流起始的节点与数据流终止的节点之间只有一条通信路径。并行连接指多个节点中所有节点的输入数据流是由同一节点流出,中间不经过任何节点;且该多个节点中所有节点的输出数据流流入同一节点,中间也不经过任何节点。可选地,用户可以采用双击或单击等操作对该同一聚合节点进行操作,以对该同一聚合节点进行展开,该同一聚合节点展开后,即得到堆叠展示的多个节点。 通过进行堆叠展示,可以使得在用户进行二部图可视化时,简化聚合节点的层级结构,节省用户界面空间,有利于用户更加快速了解聚合节点的内部结构。
请参见图11,图11为本申请实施例提供的一种节点串行连接和并行连接的结构示意图。如图11所示,节点1-5为哈希值相同的五个节点,即同构节点。在进行串行结构检测时,遍历该5个节点所连的边,检测到节点1和节点3之间只有一条通信路径,即节点1、节点2和节点3为串行的3个节点,可以对其进行堆叠展示。在进行并行结构检测时,以节点4和节点5作为起点,分别进行前向搜索和后向搜索。在进行前向搜索时,发现节点4和节点5到聚合节点Fhub处汇聚,在进行后项搜索时,发现节点4和节点5到聚合节点Bhub处汇聚,则节点4和节点5为并行连接,同样可以对节点4和节点5进行堆叠展示。
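A hedged sketch of the isomorphism-based stacking checks of Figure 11 (the helper names and data layout are ours): consecutive same-hash nodes of a chain collapse into one serial stack, and same-hash siblings sharing a single source hub and a single sink hub stack in parallel.

```python
def serial_stacks(order, hashes):
    """Collapse consecutive same-hash nodes of a chain into
    (hash, count) stacks, as in the stacked display of Figure 12(a)."""
    stacks, i = [], 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and hashes[order[j + 1]] == hashes[order[i]]:
            j += 1
        stacks.append((hashes[order[i]], j - i + 1))
        i = j + 1
    return stacks

def parallel_stack(nodes, preds, succs, hashes):
    """Nodes stack in parallel when they share one hash, one common
    source (forward hub) and one common sink (backward hub)."""
    return (len({hashes[n] for n in nodes}) == 1
            and len({preds[n] for n in nodes}) == 1
            and len({succs[n] for n in nodes}) == 1)

# Figure-11-style data: nodes 1-3 form a serial chain, nodes 4-5 fan out
# from Fhub and converge at Bhub.
SERIAL = serial_stacks(["n1", "n2", "n3", "x"],
                       {"n1": 7, "n2": 7, "n3": 7, "x": 9})
PARALLEL = parallel_stack(["n4", "n5"],
                          {"n4": "Fhub", "n5": "Fhub"},
                          {"n4": "Bhub", "n5": "Bhub"},
                          {"n4": 7, "n5": 7})
```

Here the chain collapses to a stack of three isomorphic nodes followed by a singleton, and the two fan-out nodes qualify for a parallel stack.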
请参见图12(a)-图12(b),图12(a)-图12(b)为本申请实施例提供的一种堆叠的聚合节点展开过程示意图。
图12(a)为串行结构堆叠的聚合节点可视化展开过程示意图。如图12(a)所示,节点J20、堆叠结构W1和节点J21可以为第二聚合节点进行一次展开后的展开结果,第二聚合节点可以是任一一级聚合节点的任一层中的节点。由于第二聚合节点展开后得到的节点中存在可以堆叠的子结构(即哈希值相同的节点),可以采用堆叠结构W1进行展示。堆叠结构W1中的标识1为连接关系标识,表示堆叠结构W1中堆叠的节点为串行结构,堆叠结构W1中的数字n1表示进行堆叠的同构节点的数量。用户可以采用单击或双击等操作来进一步展开该堆叠结构W1,得到堆叠结构W2。堆叠结构W2可以展示单个同构节点的内部结构。用户还可以进一步对堆叠结构W2进行展开,得到堆叠结构W2的全展开示意图,即展示n1个串行连接的同构节点的实际连接关系。在可视化过程中,该串行连接的同构节点可以用相同的颜色进行展示。
图12(b)为并行结构堆叠的聚合节点可视化展开过程示意图。如图12(b)所示,节点J22、堆叠结构W3和节点J23可以为第三聚合节点进行一次展开后的展开结果,第三聚合节点可以是任一一级聚合节点的任一层中的节点。由于第三聚合节点展开后得到的节点中存在可以堆叠的子结构(即哈希值相同的节点),可以采用堆叠结构W3进行展示。堆叠结构W3中的标识2为连接关系标识,表示堆叠结构W3中堆叠的节点为并行结构,堆叠结构W3中的数字n2表示进行堆叠的同构节点的数量。用户可以采用单击或双击等操作来进一步展开该堆叠结构W3,得到堆叠结构W4。堆叠结构W4可以展示单个同构节点的内部结构。用户还可以进一步对堆叠结构W4进行展开,得到堆叠结构W4的全展开示意图,即展示n2个并行连接的同构节点的实际连接关系。在可视化过程中,该并行连接的同构节点可以用相同的颜色进行展示。
应当理解,在实际应用过程中,也可采用其它连接关系标识来表征串行连接和并行连接的堆叠,本申请对此不限定。
请参见图13(a)-图13(b),图13(a)-图13(b)为本申请实施例提供的一种模型训练过程的时间线示例。在实际应用过程中,用户可以基于本方案构建出各种深度学习任务的二部图,二部图可以清晰地呈现模型的结构,从而使得用户基于二部图快速定位通信节点的位置和功能,进而制定通信节点的融合/切分策略,尽可能地降低训练过程中的通信时长。
在一种可能的场景下，用户可以查看模型训练过程对应的时间线Timeline，观察通信时间和计算时间之间的重叠，找到计算和通信之间不重叠的通信节点，然后在基于本方案构建出的二部图中快速定位通信节点，并根据具体的图结构分析通信节点的作用，从而制定合理的通信节点融合/切分策略，尽可能缩短不同深度学习任务对应的模型训练过程的时长。
举例来说,在采用Mindspore训练残差网络ResNet-50,发现训练过程对应的timeline中,通信节点AllReduce的通信时长与计算节点对应计算时长之间不存在重叠Overlap,迭代拖尾时间较长,timeline如图13(a)所示,第一层为计算节点对应的计算时长,第二层为通信节点对应的通信时长。在将残差网络对应的训练过程构建成相应的二部图后,可以快速且清晰地发现框架自动将所有用于反向梯度聚合的162个通信节点AllReduce融合成了一个节点,因而导致了timeline中的通信时长和计算时长之间不存在重叠。此时,对融合后的通信节点进行切分,将前55个通信节点融合成1个,将第55-108个通信节点融合成1个,将第109-162通信节点融合成一个,重新进行训练,得到的训练过程timeline,如图13(b)所示。可以看出,在对通信节点进行切分后,计算时长与通信时长之间产生重叠。且相对于通信节点切分前梯度计算及融合的过程耗时t1,在进行通信节点切分后,梯度计算及融合的过程耗时t2明显缩短。
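The AllReduce re-fusion described above can be sketched as assigning the 162 gradient-aggregation operators to three fusion groups. The boundaries (55, 108, 162) are read from the text, with the overlapping ranges interpreted as operators 1-55, 56-108, and 109-162; the helper below is our own illustration, not MindSpore's API:

```python
def fusion_groups(n_ops, boundaries):
    """Split operators 1..n_ops into fusion groups, each group ending at
    the next boundary index (inclusive)."""
    groups, start = [], 1
    for b in boundaries:
        groups.append(list(range(start, b + 1)))
        start = b + 1
    assert start == n_ops + 1, "boundaries must cover every operator"
    return groups

# Three fusion groups instead of one fully fused AllReduce node.
GROUPS = fusion_groups(162, [55, 108, 162])
```

Each group is then fused into a single AllReduce, so gradient communication for an earlier group can overlap with the computation of later gradients, which is the overlap visible in Figure 13(b).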
请参见图14,图14为本申请实施例中一种二部图显示方法流程示意图。如图14所示,方法1400包括步骤S1410。
步骤S1410:输入计算图,基于所述计算图输出所述二部图。
其中,所述计算图包括M个通信节点,所述M个通信节点中的第一通信节点对应P个前驱节点和Q个后继节点,所述第一通信节点对应至少一条跨通信边,所述至少一条跨通信边中的每条跨通信边指示所述P个前驱节点中一个前驱结点和所述Q个后继节点中一个后继节点之间的通信路径,且所述每条跨通信边不经过所述M个通信节点,P、Q和M为正整数,所述M个通信节点分别对应的跨通信边在所述二部图中不连通,所述M个通信节点中的任意两个通信节点在所述二部图中无边直接相连。
In one feasible implementation, the computation graph includes C computation nodes and the bipartite graph includes K first-level aggregation nodes, where the K first-level aggregation nodes are obtained by aggregating the C computation nodes. Each of the K first-level aggregation nodes has a hierarchical structure in which the nodes in layer j are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node itself, and the nodes in layer j belong to different namespaces; C, K and j are positive integers. The nodes in layer j include aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
In one feasible implementation, the bipartite graph includes a stacked structure. The stacked structure includes a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship among multiple nodes, and the quantity identifier represents the number of those nodes. The multiple nodes are obtained by expanding the same aggregation node once, have identical hash values, and are connected either serially or in parallel.
In one feasible implementation, when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it; when a node is a computation node, its hash value is determined by the computation node's attributes, which include its type, in-degree, out-degree, and the types and number of its attached nodes.
Specifically, the process of obtaining the bipartite graph from the computation graph in method embodiment 1400 is the same as the corresponding process in method embodiment 300, and is not repeated here.
Referring to FIG. 15, FIG. 15 is a schematic structural diagram of a bipartite-graph construction apparatus provided by an embodiment of this application. As shown in FIG. 15, the bipartite-graph construction apparatus 1500 includes a search unit 1501 and a cutting unit 1502.
The search unit 1501 is configured to search a computation graph for at least one cross-communication edge corresponding to a first communication node, where the first communication node is one of the M communication nodes included in the computation graph and corresponds to P predecessor nodes and Q successor nodes. Each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and does not pass through the M communication nodes; M, P and Q are positive integers. The cutting unit 1502 is configured to cut the cross-communication edges respectively corresponding to the M communication nodes and perform an aggregation operation to obtain the bipartite graph, in which no edge directly connects any two of the M communication nodes.
In one feasible implementation, each cross-communication edge includes at least one sub-edge, each of which directly connects two computation nodes. Each sub-edge corresponds to a weight coefficient determined by the types of the two computation nodes it directly connects.
In one feasible implementation, the M communication nodes correspond to N cross-communication edges in total, N being a positive integer. In cutting the cross-communication edges respectively corresponding to the M communication nodes, the cutting unit 1502 is specifically configured to cut one sub-edge of each of the N cross-communication edges. When E of the N cross-communication edges share a common sub-edge, the sum of the weight coefficients of all the sub-edges cut among those E cross-communication edges is the largest or the smallest, E being a positive integer less than or equal to N. When the i-th of the N cross-communication edges shares no sub-edge with the others, the sub-edge with the smallest or the largest weight coefficient among its sub-edges is cut, i being a positive integer.
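For the simple case of a cross-communication edge that shares no sub-edge with any other, the selection of the one sub-edge to cut can be sketched as below (the function and the weight values are illustrative assumptions, not the embodiment's actual implementation):

```python
def choose_cut(cross_edge, weight, minimize=True):
    """Pick the single sub-edge to cut on a cross-communication edge that
    shares no sub-edge with other cross-communication edges.

    `cross_edge` is a node sequence; its sub-edges are consecutive node
    pairs. `weight` maps a (u, v) sub-edge to its weight coefficient.
    Ties are broken arbitrarily here, which is an assumption."""
    sub_edges = list(zip(cross_edge, cross_edge[1:]))
    pick = min if minimize else max
    return pick(sub_edges, key=lambda e: weight[e])

# Hypothetical weights for the sub-edges of one cross-communication edge.
weight = {("A", "B"): 3, ("B", "C"): 1, ("C", "D"): 2}
cut = choose_cut(["A", "B", "C", "D"], weight)  # ("B", "C")
```

The shared-sub-edge case (E edges cut jointly so that the summed weight is extremal) would require a joint optimization over the E edges rather than this per-edge greedy choice.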
In one feasible implementation, the cut computation graph includes K connected blocks, K being a positive integer. In performing the aggregation operation to obtain the bipartite graph, the cutting unit 1502 is specifically configured to aggregate each of the K connected blocks of the cut computation graph to obtain the bipartite graph. The K connected blocks are obtained by partitioning the computation nodes of the computation graph based on the positions of the M communication nodes in it; the bipartite graph includes K first-level aggregation nodes and the M communication nodes; no edge directly connects any two of the K first-level aggregation nodes, and the K first-level aggregation nodes belong to K namespaces respectively.
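The grouping of computation nodes into connected blocks after cutting can be sketched as a plain undirected connected-components pass (names and data layout are my own illustrative choices):

```python
def connected_blocks(edges, comm_nodes):
    """Group computation nodes into connected blocks; each block would
    become one first-level aggregation node. `edges` are the sub-edges
    remaining after cutting; communication nodes are excluded so that a
    block only ever contains computation nodes."""
    adj = {}
    for u, v in edges:
        if u in comm_nodes or v in comm_nodes:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    blocks, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        block, stack = set(), [start]
        while stack:  # iterative flood fill over one component
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            block.add(n)
            stack.extend(adj[n] - seen)
        blocks.append(sorted(block))
    return blocks

# After cutting, the B-C sub-edge is gone, so {A, B} and {C, D}
# aggregate into two separate first-level aggregation nodes.
blocks = connected_blocks([("A", "B"), ("C", "D")], comm_nodes=set())
```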
In one feasible implementation, each of the K first-level aggregation nodes has a hierarchical structure in which the nodes in layer j are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node itself, and the nodes in layer j belong to different namespaces, j being a positive integer. The nodes in layer j include aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
In one feasible implementation, the apparatus further includes: an updating unit, configured to update a first namespace when the sub-edge between a first computation node and a second computation node in the first namespace is cut, the first namespace being a namespace in the computation graph; and a rebuilding unit, configured to construct a namespace containing the first computation node, the first computation node not belonging to the updated first namespace.
In one feasible implementation, the apparatus further includes a calculation unit, configured to calculate the hash values of the aggregation nodes and of the computation nodes in the bipartite graph. When a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it; when a node is a computation node, its hash value is determined by the computation node's attributes, which include its type, in-degree, out-degree, and the types and number of its attached nodes.
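The two hash rules (attribute-derived hashes for computation nodes, summed child hashes for aggregation nodes) can be sketched as follows; the exact attribute encoding and the use of MD5 are assumptions for illustration, not the embodiment's actual scheme:

```python
import hashlib

def comp_node_hash(node_type, in_deg, out_deg, attached_types, attached_count):
    """Hash of a computation node, derived from the attributes listed in
    the embodiment: type, in-degree, out-degree, and the types and number
    of attached nodes. The string encoding below is an assumption."""
    key = f"{node_type}|{in_deg}|{out_deg}|{sorted(attached_types)}|{attached_count}"
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def agg_node_hash(child_hashes):
    """Hash of an aggregation node: the sum of its children's hashes."""
    return sum(child_hashes)

# Two isomorphic computation nodes get identical hashes, which is what
# later enables them to be stacked in the visualization.
h1 = comp_node_hash("Conv2D", 1, 1, ["weight"], 1)
h2 = comp_node_hash("Conv2D", 1, 1, ["weight"], 1)
```

Because an aggregation node's hash is a sum over its expansion, two aggregation nodes with isomorphic internal structure also compare equal, recursively.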
In one feasible implementation, the apparatus further includes a stacking unit, configured to display multiple nodes of the bipartite graph in stacked form, where the multiple nodes are obtained by expanding the same aggregation node once, have identical hash values, and are connected serially or in parallel.
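For the serial case, detecting which same-hash neighbors can be collapsed into a stacked structure reduces to run-length encoding the hash sequence of a chain; the sketch below is an illustrative simplification (the parallel case would instead group same-hash siblings):

```python
def stack_runs(hashes):
    """Collapse a serial chain of node hashes into (hash, count) pairs;
    a count greater than 1 is what the stacking unit would display as a
    stacked structure together with its quantity identifier."""
    runs = []
    for h in hashes:
        if runs and runs[-1][0] == h:
            runs[-1] = (h, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((h, 1))  # start a new run
    return runs

# Three serially connected isomorphic nodes (hash 7) followed by a
# different node (hash 9) collapse into one stacked structure of 3.
runs = stack_runs([7, 7, 7, 9])  # [(7, 3), (9, 1)]
```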
Referring to FIG. 16, FIG. 16 is a schematic structural diagram of a bipartite-graph display apparatus in an embodiment of this application. As shown in FIG. 16, the apparatus 1600 includes an input unit 1601 and a display unit 1602.
The input unit 1601 is configured to input a computation graph; the display unit 1602 is configured to display the bipartite graph based on the computation graph. The computation graph includes M communication nodes. A first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes, and corresponds to at least one cross-communication edge. Each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and does not pass through the M communication nodes; P, Q and M are positive integers. The cross-communication edges respectively corresponding to the M communication nodes are disconnected in the bipartite graph, and no edge directly connects any two of the M communication nodes in the bipartite graph.
In one feasible implementation, the computation graph includes C computation nodes and the bipartite graph includes K first-level aggregation nodes, where the K first-level aggregation nodes are obtained by aggregating the C computation nodes. Each of the K first-level aggregation nodes has a hierarchical structure in which the nodes in layer j are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node itself, and the nodes in layer j belong to different namespaces; C, K and j are positive integers. The nodes in layer j include aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
The C computation nodes above are the computation nodes included in the computation graph of the method embodiment of FIG. 3.
In one feasible implementation, the bipartite graph includes a stacked structure. The stacked structure includes a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship among multiple nodes, and the quantity identifier represents the number of those nodes. The multiple nodes are obtained by expanding the same aggregation node once, have identical hash values, and are connected either serially or in parallel.
In one feasible implementation, when a node is an aggregation node, its hash value equals the sum of the hash values of the nodes obtained by expanding it; when a node is a computation node, its hash value is determined by the computation node's attributes, which include its type, in-degree, out-degree, and the types and number of its attached nodes.
Specifically, the process by which the bipartite-graph display apparatus 1600 obtains the bipartite graph from the computation graph corresponds to the construction process in method embodiment 1400, and is not repeated here.
The apparatus 1500 and the apparatus 1600 here are embodied in the form of functional units. The term "unit" here may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components supporting the described functions. In an optional example, those skilled in the art will understand that the apparatus 1500 and the apparatus 1600 may be used to perform the respective flows and/or steps of method embodiment 300 and method embodiment 1400 above; to avoid repetition, details are not repeated here.
Referring to FIG. 17, FIG. 17 is a schematic diagram of the hardware structure of a bipartite-graph construction apparatus in an embodiment of this application. As shown in FIG. 17, the apparatus 1700 may include a memory 1701, one or more processors 1702 (only one is shown in the figure), an interface circuit 1703, and a bus 1704. The memory 1701, the processor 1702 and the interface circuit 1703 communicate with one another over the bus 1704.
The memory 1701 is configured to store instructions, and the processor 1702 is configured to call the instructions stored in the memory 1701.
The processor 1702 is specifically configured to obtain a computer program to perform the bipartite-graph construction method of embodiment 300.
The bipartite-graph construction apparatus of this embodiment can lift the communication nodes of a computation graph to the top level of the bipartite graph and clearly display the model structure, so that the position and function of each communication node can be located quickly and intuitively, providing a basis for the subsequent design of parallelization strategies.
It should be understood that the apparatus 1700 may specifically be a computer and may be used to perform the steps and/or flows of method embodiment 300 above.
The memory 1701 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1701 may store a program; when the program stored in the memory 1701 is executed by the processor 1702, the processor 1702 and the interface circuit 1703 perform the steps of the bipartite-graph construction method of the embodiments of this application.
The processor 1702 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the functions required of the units in the bipartite-graph construction apparatus of the embodiments of this application, or to perform the bipartite-graph construction method of the method embodiments of this application.
The processor 1702 may also be an integrated circuit chip with signal-processing capability. In implementation, the steps of the bipartite-graph construction method of this application may be completed by instructions in software form in the processor 1702. The processor 1702 above may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be embodied directly as being completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, or register. The storage medium resides in the memory 1701; the processor 1702 reads the information in the memory 1701 and, in combination with its hardware, completes the functions required of the units included in the bipartite-graph construction apparatus of the embodiments of this application, or performs the bipartite-graph construction method of the method embodiments of this application.
The interface circuit 1703 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1700 and other devices or communication networks. For example, a program may be obtained through the interface circuit 1703.
The bus 1704 may include a path for transferring information between the components of the apparatus 1700 (for example, the memory 1701, the processor 1702 and the interface circuit 1703).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus and units described above may refer to the corresponding processes in the bipartite-graph construction method embodiment 300, and are not repeated here.
Referring to FIG. 18, FIG. 18 is a schematic diagram of the hardware structure of a bipartite-graph display apparatus in an embodiment of this application. As shown in FIG. 18, the apparatus 1800 may include a memory 1801, one or more processors 1802 (only one is shown in the figure), an interface circuit 1803, and a bus 1804. The memory 1801, the processor 1802 and the interface circuit 1803 communicate with one another over the bus 1804.
The memory 1801 is configured to store instructions, and the processor 1802 is configured to call the instructions stored in the memory 1801.
The processor 1802 is specifically configured to obtain a computer program to perform the bipartite-graph display method of embodiment 1400.
The bipartite-graph display apparatus of this embodiment can process a computation graph based on the bipartite-graph display method of method embodiment 1400 and output the corresponding bipartite graph. The output bipartite graph clearly displays the model structure of the corresponding deep-learning model, so that the position and function of each communication node can be located quickly and intuitively, providing a basis for the subsequent design of parallelization strategies.
It should be understood that the apparatus 1800 may specifically be a computer and may be used to perform the steps and/or flows of method embodiment 1400 above.
The memory 1801 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1801 may store a program; when the program stored in the memory 1801 is executed by the processor 1802, the processor 1802 and the interface circuit 1803 perform the steps of the bipartite-graph display method of the embodiments of this application.
The processor 1802 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the functions required of the units in the bipartite-graph display apparatus of the embodiments of this application, or to perform the bipartite-graph display method of the method embodiments of this application.
The processor 1802 may also be an integrated circuit chip with signal-processing capability. In implementation, the steps of the bipartite-graph display method of this application may be completed by instructions in software form in the processor 1802. The processor 1802 above may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be embodied directly as being completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, or register. The storage medium resides in the memory 1801; the processor 1802 reads the information in the memory 1801 and, in combination with its hardware, completes the functions required of the units included in the bipartite-graph display apparatus of the embodiments of this application, or performs the bipartite-graph display method of the method embodiments of this application.
The interface circuit 1803 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1800 and other devices or communication networks. For example, a program may be obtained through the interface circuit 1803.
The bus 1804 may include a path for transferring information between the components of the apparatus 1800 (for example, the memory 1801, the processor 1802 and the interface circuit 1803).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus and units described above may refer to the corresponding processes in the bipartite-graph display method embodiment 1400, and are not repeated here.
An embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed, implements some or all of the steps of any of the bipartite-graph construction method embodiments and/or bipartite-graph display method embodiments described above.
An embodiment of this application provides a computer program comprising instructions which, when the computer program is executed by a processor, implement some or all of the steps of any of the bipartite-graph construction method embodiments and/or bipartite-graph display method embodiments described above.
In the embodiments above, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments. It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as combinations of series of actions, but those skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units, for instance, is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
The embodiments above are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (26)

  1. A method for constructing a bipartite graph, characterized in that the method comprises:
    searching a computation graph for at least one cross-communication edge corresponding to a first communication node, wherein the first communication node is one of M communication nodes comprised in the computation graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes, M, P and Q being positive integers;
    cutting the cross-communication edges respectively corresponding to the M communication nodes, and performing an aggregation operation to obtain the bipartite graph, wherein no edge directly connects any two of the M communication nodes in the bipartite graph.
  2. The method according to claim 1, characterized in that
    each cross-communication edge comprises at least one sub-edge, each of the at least one sub-edge directly connecting two computation nodes; and
    each of the at least one sub-edge corresponds to a weight coefficient, the weight coefficient of each sub-edge being determined by the types of the two computation nodes that the sub-edge directly connects.
  3. The method according to claim 2, characterized in that the M communication nodes correspond to N cross-communication edges in total, N being a positive integer, and the cutting the cross-communication edges respectively corresponding to the M communication nodes comprises:
    cutting one sub-edge of each of the N cross-communication edges;
    wherein, when E of the N cross-communication edges comprise a common sub-edge, the sum of the weight coefficients respectively corresponding to all the sub-edges cut among the E cross-communication edges is the largest or the smallest, E being a positive integer less than or equal to N; and when the i-th of the N cross-communication edges comprises no sub-edge in common with the other cross-communication edges, the sub-edge with the smallest or the largest weight coefficient among the sub-edges of the i-th cross-communication edge is cut, i being a positive integer.
  4. The method according to any one of claims 1-3, characterized in that the cut computation graph comprises K connected blocks, K being a positive integer, and the performing an aggregation operation to obtain the bipartite graph comprises:
    aggregating each of the K connected blocks of the cut computation graph to obtain the bipartite graph;
    wherein the K connected blocks are obtained by partitioning the computation nodes of the computation graph based on the positions of the M communication nodes in the computation graph; the bipartite graph comprises K first-level aggregation nodes and the M communication nodes; no edge directly connects any two of the K first-level aggregation nodes; and the K first-level aggregation nodes respectively belong to K namespaces.
  5. The method according to claim 4, characterized in that
    each of the K first-level aggregation nodes has a hierarchical structure, wherein the nodes in layer j of the hierarchical structure are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node, and the nodes in layer j respectively belong to different namespaces, j being a positive integer; and
    the nodes in layer j comprise aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
  6. The method according to claim 5, characterized in that the method further comprises:
    when a sub-edge between a first computation node and a second computation node in a first namespace is cut, updating the first namespace, the first namespace being a namespace in the computation graph; and
    constructing a namespace containing the first computation node, the first computation node not belonging to the updated first namespace.
  7. The method according to claim 5 or 6, characterized in that the method further comprises:
    calculating the hash values of the aggregation nodes and the hash values of the computation nodes in the bipartite graph;
    wherein, when a node is an aggregation node, the hash value of the node equals the sum of the hash values of the nodes obtained by expanding the aggregation node; and when a node is a computation node, the hash value of the node is determined by the attributes of the computation node, the attributes comprising the type, in-degree and out-degree of the computation node and the types and number of its attached nodes.
  8. The method according to claim 7, characterized in that the method further comprises:
    displaying multiple nodes of the bipartite graph in stacked form;
    wherein the multiple nodes are obtained by expanding the same aggregation node once, the hash values of the multiple nodes are identical, and the multiple nodes are connected serially or in parallel.
  9. A method for displaying a bipartite graph, characterized in that the method comprises:
    inputting a computation graph, and outputting the bipartite graph based on the computation graph;
    wherein the computation graph comprises M communication nodes; a first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes; the first communication node corresponds to at least one cross-communication edge; each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes; P, Q and M are positive integers; the cross-communication edges respectively corresponding to the M communication nodes are disconnected in the bipartite graph; and no edge directly connects any two of the M communication nodes in the bipartite graph.
  10. The method according to claim 9, characterized in that
    the computation graph comprises C computation nodes and the bipartite graph comprises K first-level aggregation nodes;
    wherein the K first-level aggregation nodes are obtained by aggregating the C computation nodes; each of the K first-level aggregation nodes has a hierarchical structure, wherein the nodes in layer j of the hierarchical structure are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node, and the nodes in layer j respectively belong to different namespaces; C, K and j are positive integers; and the nodes in layer j comprise aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
  11. The method according to claim 10, characterized in that
    the bipartite graph comprises a stacked structure;
    wherein the stacked structure comprises a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship among multiple nodes, and the quantity identifier represents the number of the multiple nodes; the multiple nodes are obtained by expanding the same aggregation node once; the hash values of the multiple nodes are identical; and the connection relationship among the multiple nodes is serial or parallel.
  12. The method according to claim 11, characterized in that
    when a node is an aggregation node, the hash value of the node equals the sum of the hash values of the nodes obtained by expanding the aggregation node; and when a node is a computation node, the hash value of the node is determined by the attributes of the computation node, the attributes comprising the type, in-degree and out-degree of the computation node and the types and number of its attached nodes.
  13. An apparatus for constructing a bipartite graph, characterized in that the apparatus comprises:
    a search unit, configured to search a computation graph for at least one cross-communication edge corresponding to a first communication node, wherein the first communication node is one of M communication nodes comprised in the computation graph, the first communication node corresponds to P predecessor nodes and Q successor nodes, each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes, M, P and Q being positive integers; and
    a cutting unit, configured to cut the cross-communication edges respectively corresponding to the M communication nodes and perform an aggregation operation to obtain the bipartite graph, wherein no edge directly connects any two of the M communication nodes in the bipartite graph.
  14. The apparatus according to claim 13, characterized in that
    each cross-communication edge comprises at least one sub-edge, each of the at least one sub-edge directly connecting two computation nodes; and
    each of the at least one sub-edge corresponds to a weight coefficient, the weight coefficient of each sub-edge being determined by the types of the two computation nodes that the sub-edge directly connects.
  15. The apparatus according to claim 13 or 14, characterized in that the M communication nodes correspond to N cross-communication edges in total, N being a positive integer, and in cutting the cross-communication edges respectively corresponding to the M communication nodes, the cutting unit is specifically configured to:
    cut one sub-edge of each of the N cross-communication edges;
    wherein, when E of the N cross-communication edges comprise a common sub-edge, the sum of the weight coefficients respectively corresponding to all the sub-edges cut among the E cross-communication edges is the largest or the smallest, E being a positive integer less than or equal to N; and when the i-th of the N cross-communication edges comprises no sub-edge in common with the other cross-communication edges, the sub-edge with the smallest or the largest weight coefficient among the sub-edges of the i-th cross-communication edge is cut, i being a positive integer.
  16. The apparatus according to any one of claims 13-15, characterized in that the cut computation graph comprises K connected blocks, K being a positive integer, and in performing the aggregation operation to obtain the bipartite graph, the cutting unit is specifically configured to:
    aggregate each of the K connected blocks of the cut computation graph to obtain the bipartite graph;
    wherein the K connected blocks are obtained by partitioning the computation nodes of the computation graph based on the positions of the M communication nodes in the computation graph; the bipartite graph comprises K first-level aggregation nodes and the M communication nodes; no edge directly connects any two of the K first-level aggregation nodes; and the K first-level aggregation nodes respectively belong to K namespaces.
  17. The apparatus according to claim 16, characterized in that
    each of the K first-level aggregation nodes has a hierarchical structure, wherein the nodes in layer j of the hierarchical structure are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node, and the nodes in layer j respectively belong to different namespaces, j being a positive integer; and
    the nodes in layer j comprise aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
  18. The apparatus according to claim 17, characterized in that the apparatus further comprises:
    an updating unit, configured to update a first namespace when a sub-edge between a first computation node and a second computation node in the first namespace is cut, the first namespace being a namespace in the computation graph; and
    a rebuilding unit, configured to construct a namespace containing the first computation node, the first computation node not belonging to the updated first namespace.
  19. The apparatus according to claim 17 or 18, characterized in that the apparatus further comprises:
    a calculation unit, configured to calculate the hash values of the aggregation nodes and the hash values of the computation nodes in the bipartite graph;
    wherein, when a node is an aggregation node, the hash value of the node equals the sum of the hash values of the nodes obtained by expanding the aggregation node; and when a node is a computation node, the hash value of the node is determined by the attributes of the computation node, the attributes comprising the type, in-degree and out-degree of the computation node and the types and number of its attached nodes.
  20. The apparatus according to claim 19, characterized in that the apparatus further comprises:
    a stacking unit, configured to display multiple nodes of the bipartite graph in stacked form;
    wherein the multiple nodes are obtained by expanding the same aggregation node once, the hash values of the multiple nodes are identical, and the multiple nodes are connected serially or in parallel.
  21. An apparatus for displaying a bipartite graph, characterized in that the apparatus comprises:
    an input unit, configured to input a computation graph; and
    a display unit, configured to display the bipartite graph based on the computation graph;
    wherein the computation graph comprises M communication nodes; a first communication node among the M communication nodes corresponds to P predecessor nodes and Q successor nodes; the first communication node corresponds to at least one cross-communication edge; each of the at least one cross-communication edge indicates a communication path between one of the P predecessor nodes and one of the Q successor nodes, and each cross-communication edge does not pass through the M communication nodes; P, Q and M are positive integers; the cross-communication edges respectively corresponding to the M communication nodes are disconnected in the bipartite graph; and no edge directly connects any two of the M communication nodes in the bipartite graph.
  22. The apparatus according to claim 21, characterized in that
    the computation graph comprises C computation nodes and the bipartite graph comprises K first-level aggregation nodes;
    wherein the K first-level aggregation nodes are obtained by aggregating the C computation nodes; each of the K first-level aggregation nodes has a hierarchical structure, wherein the nodes in layer j of the hierarchical structure are obtained by expanding the aggregation nodes in layer j-1, the first layer of the hierarchical structure is the first-level aggregation node, and the nodes in layer j respectively belong to different namespaces; C, K and j are positive integers; and the nodes in layer j comprise aggregation nodes and/or computation nodes, a computation node being a node that cannot be expanded.
  23. The apparatus according to claim 22, characterized in that
    the bipartite graph comprises a stacked structure;
    wherein the stacked structure comprises a connection-relationship identifier and a quantity identifier; the connection-relationship identifier represents the connection relationship among multiple nodes, and the quantity identifier represents the number of the multiple nodes; the multiple nodes are obtained by expanding the same aggregation node once; the hash values of the multiple nodes are identical; and the connection relationship among the multiple nodes is serial or parallel.
  24. The apparatus according to claim 23, characterized in that
    when a node is an aggregation node, the hash value of the node equals the sum of the hash values of the nodes obtained by expanding the aggregation node; and when a node is a computation node, the hash value of the node is determined by the attributes of the computation node, the attributes comprising the type, in-degree and out-degree of the computation node and the types and number of its attached nodes.
  25. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the method according to any one of claims 1-12.
  26. A computer program, characterized in that the computer program comprises instructions which, when the computer program is executed, implement the method according to any one of claims 1-12.
PCT/CN2022/132189 2021-11-19 2022-11-16 Bipartite graph construction method, display method and apparatus WO2023088288A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111381794.7A CN116152269A (zh) 2021-11-19 2021-11-19 Bipartite graph construction method, display method and apparatus
CN202111381794.7 2021-11-19

Publications (1)

Publication Number Publication Date
WO2023088288A1 true WO2023088288A1 (zh) 2023-05-25

Family

ID=86356852

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132189 WO2023088288A1 (zh) 2021-11-19 2022-11-16 二部图构建方法、显示方法和装置

Country Status (2)

Country Link
CN (1) CN116152269A (zh)
WO (1) WO2023088288A1 (zh)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125289A1 * 2014-10-30 2016-05-05 International Business Machines Corporation Mapping graphs onto core-based neuromorphic architectures
CN111935005A * 2020-08-07 2020-11-13 腾讯科技(深圳)有限公司 Data transmission method, apparatus, processing device and medium
CN112990265A * 2021-02-09 2021-06-18 浙江师范大学 Late-fusion multi-view clustering machine learning method and system based on bipartite graphs


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827619A * 2024-02-29 2024-04-05 浪潮电子信息产业股份有限公司 Time-consumption prediction simulation method, apparatus, device, medium and system for heterogeneous computing power
CN117827619B * 2024-02-29 2024-05-24 浪潮电子信息产业股份有限公司 Time-consumption prediction simulation method, apparatus, device, medium and system for heterogeneous computing power

Also Published As

Publication number Publication date
CN116152269A (zh) 2023-05-23


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22894820

Country of ref document: EP

Kind code of ref document: A1