CN117544552A - Automatic construction system and method for data exchange communication path

Info

Publication number: CN117544552A
Application number: CN202311385289.9A
Authority: CN
Inventors: 李一鹏; 王迎港; 郭冉; 张文骁; 成诚
Original assignee: Beijing Oneflow Technology Co Ltd
Current assignee: Beijing Oneflow Technology Co Ltd
Priority date: 2023-10-24
Filing date: 2023-10-24
Publication date: 2024-02-09

Abstract

The invention discloses a system for automatically constructing a data exchange communication path for distributed data processing. The system comprises: an initial logical node generating component that receives task configuration data input by a user and generates an initial logical node topology map for the distributed data processing system, each initial logical node being attached with predetermined node attributes; a communication through path determining component that determines that a candidate communication through path exists in a case where the SBP distributed descriptor of the output logical tensor differs from the SBP distributed descriptor of the input logical tensor; a communication indirect path acquisition component that uses a shortest path method to acquire one or more candidate communication indirect paths, each comprising one or more intermediate tensors described by different SBP distributed descriptors; and an intermediate tensor generation node insertion component that inserts, for any current intermediate tensor, an intermediate tensor generation node having the current intermediate tensor as its output tensor, thereby obtaining a result logical node topology map.

Description

Automatic construction system and method for data exchange communication path

Technical Field

The present disclosure relates to a data processing technique. More particularly, the present disclosure relates to an automatic construction system and method for a data exchange communication path of a distributed data processing system, thereby enabling automatic arrangement of data exchange communication.

Background

With the popularity of distributed computing, large jobs or large logical tensors are partitioned, and the different portions of data are deployed to the various computing devices of a distributed data processing system for processing; the portions need to exchange intermediate parameters during computation. Thus, while a particular job is processed, intermediate parameters or results computed on one computing device may serve as input data for a computing task on another computing device, which causes data transfer overhead between computing devices. When the job data is large, this transfer overhead between computing devices imposes a significant burden on the distributed data processing system. Therefore, the inventors of the present application filed with the Chinese patent office in 2020 an invention application with application number 202010090335.2 (announcement number CN 110955734B), entitled "Distributed signature decision system of logical node and method thereof". That patent proposes an SBP signature decision system that, from a global perspective, minimizes the amount of data exchanged between different computing devices while a static distributed data processing system processes data, thereby reducing the overhead of data interaction and effectively reducing the adverse effect of data exchange on actual operation. That patent is incorporated by reference into the specification of this application as if set forth in its entirety herein.

With the development of large-scale deep learning frameworks, one-dimensional SBP (1D SBP) can no longer meet the needs of describing parallel strategies; as the number of machines and cards grows, only high-dimensional SBP (nD SBP) can describe and customize the optimal parallel strategy. Extending 1D SBP to high-dimensional nD SBP not only makes the selection of a parallel strategy more difficult but also poses new challenges for executing the strategy at the underlying level. Meanwhile, combining pipeline parallelism with data parallelism, model parallelism, or automatic parallelism also places new requirements on the underlying communication.

In a neural network or an initial logical node topology map, an edge between two adjacent logical nodes or computation nodes usually represents a communication, i.e. the transmission of a tensor (note that some edges do not represent a communication, for example control edges, which are usually used to control the ordering of operators and carry no transmission). Specifically, the state of one tensor may differ between upstream and downstream: the distribution described by the SBP distributed descriptor of the tensor output by the upstream logical node differs from the distribution described by the SBP distributed descriptor of the input tensor required by the downstream logical node. For example, if the SBP descriptor of the tensor output upstream is S0 (split along dimension 0) and the SBP descriptor of the tensor required downstream is B (broadcast), a data exchange communication expressed by a communication primitive such as AllGather is performed once from upstream to downstream. Taking 2D SBP as an example, if the SBP descriptor of the tensor output upstream is (P, S1) and the SBP descriptor of the tensor required downstream is (P, B), then from upstream to downstream an S1→B AllGather communication is required at the second hierarchy level, that is, (P, S1) → (P, B). Similarly, for (P, S1) → (B, S1), a P→B communication using a primitive such as AllReduce is required once at the first hierarchy level.

However, not every pair of distributed descriptors has a corresponding communication primitive that performs the data exchange directly. For example, (P, S1) → (B, B) and (S0, S1) → (B, B) cannot be realized with the existing basic primitives of collective communication. That is, data transformations such as (P, S1) → (B, B) and (S0, S1) → (B, B) cannot currently be performed with a single AllGather, because the data is not simply stacked vertically or horizontally but is combined both vertically and horizontally according to a particular arrangement.
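
The examples above can be summarized as a small lookup. The following Python sketch is illustrative only: the names `PRIMITIVE_1D` and `single_primitive` are hypothetical, the table lists only the one-dimensional transitions named in this description, and the rule "exactly one hierarchy level differs" is a simplifying assumption rather than the complete criterion of the present disclosure.

```python
from typing import Optional, Tuple

# 1D transitions named in the description; any other pair is treated as unknown here.
PRIMITIVE_1D = {
    ("S0", "B"): "AllGather",
    ("S1", "B"): "AllGather",
    ("P", "B"): "AllReduce",
}


def single_primitive(src: Tuple[str, ...], dst: Tuple[str, ...]) -> Optional[Tuple[int, str]]:
    """Return (hierarchy_level, primitive) if src -> dst differs at exactly one
    hierarchy level and that level's transition is in the table; else None.

    Assumption: a single differing level is enough for a one-primitive exchange.
    """
    if len(src) != len(dst):
        return None
    differing = [i for i, (a, b) in enumerate(zip(src, dst)) if a != b]
    if len(differing) != 1:
        return None  # identical descriptors, or more than one level differs
    level = differing[0]
    primitive = PRIMITIVE_1D.get((src[level], dst[level]))
    return (level, primitive) if primitive is not None else None
```

Under these assumptions, `single_primitive(("P", "S1"), ("P", "B"))` reports an AllGather at level 1, while `single_primitive(("P", "S1"), ("B", "B"))` returns `None`, matching the case the description identifies as having no single primitive.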

Where a one-step data exchange communication cannot be implemented with an existing basic communication primitive, the cost of data transmission is regarded as infinite, and the prior art therefore usually avoids such cases: when selecting among the candidate SBP signatures of a logical node, it chooses only SBP signatures for which the data exchange can be completed in one step with an existing basic communication primitive, so that the transmission cost is never infinite. In some distributed data processing systems, however, certain SBP signatures serve special purposes, and the SBP signatures of some logical nodes are even specified and cannot be modified. When a signature cannot be modified, realizing such data exchange communication requires extra adjustment work, which consumes a great deal of a technician's effort. Such unrealizable data exchange communications typically arise between tensors described by different high-dimensional SBPs (nD SBPs).

Thus, where data exchange communication between tensors described by different high-dimensional SBPs (nD SBPs) cannot be completed with a basic communication primitive, those skilled in the art have long sought a way to realize it automatically, in the same way that data exchange communication between tensors described by one-dimensional SBPs is already realized automatically with basic communication primitives. That is, a method is desired that automatically realizes data exchange communications that cannot be completed directly with a single existing basic communication primitive.

Disclosure of Invention

Therefore, the automatic construction system of a data exchange communication path proposed by the inventors of the present application makes it possible to solve the above technical problems. The application provides an automatic construction system of a data exchange communication path for distributed data processing, the system comprising the following components.

An initial logical node generating component receives task configuration data input by a user and generates an initial logical node topology map for the distributed data processing system. Each initial logical node is configured to perform a predetermined data processing operation and is attached with predetermined node attributes. The node attributes comprise a location tag of the logical data processing device to which the initial logical node belongs; the location tag represents the deployment structure of that logical data processing device and is attached based on the task configuration data. Each SBP distributed signature in the node's candidate SBP distributed signature set specifies a one-dimensional or multi-dimensional SBP distributed descriptor for each input logical tensor of the initial logical node to which it belongs and a one-dimensional or multi-dimensional SBP distributed descriptor for each output logical tensor.

A communication through path determining component traverses each initial logical node as the current initial logical node. Based on the SBP distributed descriptor of the input logical tensor of each input of each SBP distributed signature in the node's candidate SBP distributed signature set, and on the SBP distributed descriptor of the output logical tensor of the corresponding output of the upstream initial logical node, the component determines, in a case where the two descriptors differ, that a candidate communication through path exists between the current initial logical node and the upstream initial logical node if the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor can be completed by only one basic communication primitive of collective communication.

A communication indirect path acquisition component operates in a case where the communication through path determining component determines that no communication through path capable of completing that transformation by only one basic communication primitive of collective communication exists between the current initial logical node and the upstream initial logical node. In that case it acquires, by a shortest path method, one or more candidate communication indirect paths that transform the tensor described by the SBP distributed descriptor of the output logical tensor, via one or more intermediate tensors described by different SBP distributed descriptors, into the tensor described by the SBP distributed descriptor of the input logical tensor, the transformation between any two adjacent tensors of a candidate communication indirect path being completed by one basic communication primitive of collective communication.

An intermediate tensor generation node insertion component, based on the result generated by the communication indirect path acquisition component, inserts, for any current intermediate tensor of the selected candidate communication indirect path, an intermediate tensor generation node having the current intermediate tensor as its output tensor and the upstream tensor of the current intermediate tensor as its input tensor. The inserted node is given an SBP distributed signature that takes the SBP distributed descriptor of the upstream tensor as the descriptor of its input tensor and the SBP distributed descriptor of the current intermediate tensor as the descriptor of its output tensor, thereby obtaining a result logical node topology map.
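
The search for a communication indirect path can be pictured as a shortest path problem on a graph whose vertices are SBP distributed descriptors and whose edges are one-primitive transformations. The following Python sketch uses breadth-first search, which is one possible shortest path method when every hop is counted as one unit; the present disclosure does not name a specific algorithm. The names `ALPHABET`, `ONE_STEP`, `is_through` and `shortest_indirect_path` are hypothetical, and the through-path test (exactly one hierarchy level differs, with a transition named in the background section) is a simplifying assumption.

```python
from collections import deque
from itertools import product

ALPHABET = ("S0", "S1", "B", "P")                   # assumed 1D descriptor set
ONE_STEP = {("S0", "B"), ("S1", "B"), ("P", "B")}   # transitions named in the description


def is_through(src, dst):
    """Assumed test: one basic primitive suffices iff exactly one level differs
    and that level's transition is listed in ONE_STEP."""
    diff = [(a, b) for a, b in zip(src, dst) if a != b]
    return len(diff) == 1 and diff[0] in ONE_STEP


def shortest_indirect_path(src, dst):
    """Breadth-first search over descriptors; returns [src, mid..., dst] or None."""
    if src == dst:
        return [src]
    states = list(product(ALPHABET, repeat=len(src)))
    parent = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            path = []
            while current is not None:   # walk parents back to the source
                path.append(current)
                current = parent[current]
            return path[::-1]
        for candidate in states:
            if candidate not in parent and is_through(current, candidate):
                parent[candidate] = current
                queue.append(candidate)
    return None


path = shortest_indirect_path(("P", "S1"), ("B", "B"))
```

Under these assumptions, the (P, S1) → (B, B) case from the background section resolves into a path with a single intermediate tensor, each hop of which is a one-primitive transformation.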

The data exchange communication path automatic construction system according to the present disclosure further comprises: a communication path library component storing a transmission cost conversion table, each entry of which stores the communication transmission cost of a communication indirect path that converts a tensor described by one SBP distributed descriptor into a tensor described by another, different SBP distributed descriptor. In a case where the communication through path determining component determines that no communication through path capable of completing the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication exists between the current initial logical node and the upstream initial logical node, the communication indirect path acquisition component first queries the communication path library component, based on the SBP distributed descriptor of the output logical tensor and the SBP distributed descriptor of the input logical tensor, for a corresponding communication indirect path, and acquires the candidate communication indirect path by the shortest path method only if the query returns no corresponding communication indirect path.

The data exchange communication path automatic construction system according to the present disclosure, wherein the communication indirect path acquisition component stores the transmission cost of the candidate communication indirect path in the transmission cost conversion table of the communication path library component after generating the candidate communication indirect path.
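
The query-first, search-on-miss behaviour of the communication path library can be sketched as a simple cache. In the Python sketch below, `PathLibrary`, `get_indirect_path`, and the `search` and `cost_of` callables are hypothetical names introduced for illustration; how the disclosed system actually stores entries or computes costs is not specified here.

```python
class PathLibrary:
    """Minimal stand-in for the communication path library component."""

    def __init__(self):
        # (source descriptor, target descriptor) -> (path, transmission cost)
        self._table = {}

    def lookup(self, src, dst):
        return self._table.get((src, dst))

    def store(self, src, dst, path, cost):
        self._table[(src, dst)] = (path, cost)


def get_indirect_path(library, src, dst, search, cost_of):
    """Query the library first; run the shortest path search only on a miss,
    then record the newly found path and its cost for later queries."""
    hit = library.lookup(src, dst)
    if hit is not None:
        return hit
    path = search(src, dst)      # e.g. a shortest path routine supplied by the caller
    if path is None:
        return None
    cost = cost_of(path)         # assumed cost model supplied by the caller
    library.store(src, dst, path, cost)
    return path, cost
```

With this arrangement a repeated request for the same pair of descriptors is answered from the table without running the search again.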

The data exchange communication path automatic construction system according to the present disclosure, wherein the case where there is no communication through path capable of completing the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication is one of the following cases: the current initial logical node and the upstream initial logical node are deployed on the same computing devices with the same hierarchy division, but either the SBP distributed descriptor of the output logical tensor differs from the SBP distributed descriptor of the input logical tensor in at least two corresponding dimensions, or the two descriptors are identical in one of two corresponding dimensions and differ in the other, the differing dimension being the first of the two and the identical descriptor being a split descriptor; the current initial logical node and the upstream initial logical node are deployed on the same computing devices but with different hierarchy divisions; or the current initial logical node and the upstream initial logical node are deployed on different computing devices.

The data exchange communication path automatic construction system according to the present disclosure further comprises: a communication path selection component that, for the candidate communication through paths or candidate communication indirect paths determined for the current initial logical node and the upstream initial logical node by the communication through path determining component or the communication indirect path acquisition component, looks up the transmission cost conversion table stored by the communication path library component and selects the communication path with the minimum transmission cost as the communication path between the current initial logical node and the upstream initial logical node, thereby obtaining a result logical node topology map with the minimum transmission cost.
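
The selection step can be sketched as taking the minimum over candidate paths. In the Python sketch below, the names `path_cost` and `select_min_cost_path` are hypothetical, the cost figures are made-up placeholders, and summing per-hop costs from the table is an assumed cost model, not one stated in the present disclosure.

```python
def path_cost(path, cost_table):
    """Assumed cost model: total cost is the sum of the table entries of each hop."""
    return sum(cost_table[(a, b)] for a, b in zip(path, path[1:]))


def select_min_cost_path(candidates, cost_table):
    """Return the candidate path with the smallest total transmission cost."""
    return min(candidates, key=lambda p: path_cost(p, cost_table))


# Placeholder costs for illustration only; real values would come from the
# transmission cost conversion table of the communication path library.
cost_table = {
    (("P", "S1"), ("B", "S1")): 4.0,
    (("B", "S1"), ("B", "B")): 2.0,
    (("P", "S1"), ("P", "B")): 1.0,
    (("P", "B"), ("B", "B")): 8.0,
}
candidates = [
    [("P", "S1"), ("B", "S1"), ("B", "B")],
    [("P", "S1"), ("P", "B"), ("B", "B")],
]
best = select_min_cost_path(candidates, cost_table)
```

With these placeholder numbers the first candidate is chosen, since its hops sum to less than those of the second.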

According to another aspect of the present disclosure, there is also provided a method for automatically constructing a data exchange communication path, comprising the following steps.

An initial logical node generating component receives task configuration data input by a user and generates an initial logical node topology map for the distributed data processing system. Each initial logical node is configured to perform a predetermined data processing operation and is attached with predetermined node attributes. The node attributes comprise a location tag of the logical data processing device to which the initial logical node belongs; the location tag represents the deployment structure of that logical data processing device and is attached based on the task configuration data. Each SBP distributed signature in the node's candidate SBP distributed signature set specifies a one-dimensional or multi-dimensional SBP distributed descriptor for each input logical tensor of the initial logical node to which it belongs and a one-dimensional or multi-dimensional SBP distributed descriptor for each output logical tensor.

A communication through path determining component traverses each initial logical node as the current initial logical node. Based on the SBP distributed descriptor of the input logical tensor of each input of each SBP distributed signature in the node's candidate SBP distributed signature set, and on the SBP distributed descriptor of the output logical tensor of the corresponding output of the upstream initial logical node, the component determines, in a case where the two descriptors differ, that a candidate communication through path exists between the current initial logical node and the upstream initial logical node if the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor can be completed by only one basic communication primitive of collective communication.

In a case where no communication through path capable of completing that transformation by only one basic communication primitive of collective communication exists between the current initial logical node and the upstream initial logical node, a communication indirect path acquisition component acquires, by a shortest path method, one or more candidate communication indirect paths that transform the tensor described by the SBP distributed descriptor of the output logical tensor, via one or more intermediate tensors described by different SBP distributed descriptors, into the tensor described by the SBP distributed descriptor of the input logical tensor, the transformation from the upstream tensor to the downstream tensor of any two adjacent tensors of a candidate communication indirect path being completed by one basic communication primitive of collective communication.

An intermediate tensor generation node insertion component, based on the result generated by the communication indirect path acquisition component, inserts, for any current intermediate tensor of the selected candidate communication indirect path, an intermediate tensor generation node having the current intermediate tensor as its output tensor and the upstream tensor of the current intermediate tensor as its input tensor. The inserted node is given an SBP distributed signature that takes the SBP distributed descriptor of the upstream tensor as the descriptor of its input tensor and the SBP distributed descriptor of the current intermediate tensor as the descriptor of its output tensor, thereby obtaining a result logical node topology map.
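
The insertion step can be sketched as rewriting one edge of the topology map into a chain. In the Python sketch below, the `Node` class, the function `insert_intermediate_nodes`, the edge-list representation and the generated node names are all hypothetical; the sketch only illustrates that each inserted node takes the upstream tensor's descriptor as input and the current intermediate tensor's descriptor as output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    in_sbp: Tuple[str, ...]    # descriptor of the node's input tensor
    out_sbp: Tuple[str, ...]   # descriptor of the node's output tensor


def insert_intermediate_nodes(edges, upstream, downstream, path):
    """Replace the edge (upstream, downstream) by a chain with one inserted node
    per intermediate tensor of `path` = [source, mid_1, ..., mid_k, target]."""
    inserted: List[Node] = []
    chain = []
    previous = upstream
    for i in range(1, len(path) - 1):            # intermediate tensors only
        node = Node(f"{upstream}_to_{downstream}_mid{i}", path[i - 1], path[i])
        inserted.append(node)
        chain.append((previous, node.name))
        previous = node.name
    chain.append((previous, downstream))          # last hop is a through path
    new_edges = []
    for edge in edges:
        if edge == (upstream, downstream):
            new_edges.extend(chain)
        else:
            new_edges.append(edge)
    return new_edges, inserted


edges = [("a", "b"), ("b", "c")]
path = [("P", "S1"), ("B", "S1"), ("B", "B")]
new_edges, inserted = insert_intermediate_nodes(edges, "a", "b", path)
```

In this sketch the final hop, from the last intermediate tensor to the input tensor of the downstream node, needs no extra node because it is itself a one-primitive transformation.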

The automatic construction method of the data exchange communication path according to the present disclosure further comprises: storing, by a communication path library component, a transmission cost conversion table, each entry of which stores the communication transmission cost of a communication indirect path that converts a tensor described by one SBP distributed descriptor into a tensor described by another, different SBP distributed descriptor. In a case where the communication through path determining component determines that no communication through path capable of completing the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication exists between the current initial logical node and the upstream initial logical node, the communication indirect path acquisition component first queries the communication path library component, based on the SBP distributed descriptor of the output logical tensor and the SBP distributed descriptor of the input logical tensor, for a corresponding communication indirect path, and acquires the candidate communication indirect path by the shortest path method only if the query returns no corresponding communication indirect path.

According to the automatic construction method of the data exchange communication path, the communication indirect path acquisition component stores the transmission cost of the candidate communication indirect path in the transmission cost conversion table of the communication path library component after acquiring the candidate communication indirect path.

According to the data exchange communication path automatic construction method of the present disclosure, the case where there is no communication through path capable of completing the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication is one of the following cases: the current initial logical node and the upstream initial logical node are deployed on the same computing devices with the same hierarchy division, but either the SBP distributed descriptor of the output logical tensor differs from the SBP distributed descriptor of the input logical tensor in at least two corresponding dimensions, or the two descriptors are identical in one of two corresponding dimensions and differ in the other, the differing dimension being the first of the two and the identical descriptor being a split descriptor; the current initial logical node and the upstream initial logical node are deployed on the same computing devices but with different hierarchy divisions; or the current initial logical node and the upstream initial logical node are deployed on different computing devices.

The automatic construction method of the data exchange communication path according to the present disclosure further comprises: for the candidate communication through paths or candidate communication indirect paths determined for the current initial logical node and the upstream initial logical node by the communication through path determining component or the communication indirect path acquisition component, looking up the transmission cost conversion table stored by the communication path library component, and selecting the communication path with the minimum transmission cost as the communication path between the current initial logical node and the upstream initial logical node, thereby obtaining a result logical node topology map with the minimum transmission cost.

By automatically constructing data exchange communication paths with the system and method of the present disclosure, technicians performing complex distributed data processing who face data exchange communication between tensors described by different high-dimensional SBPs (nD SBPs) that cannot be completed with basic communication primitives no longer need to spend extra effort writing lengthy code. The system resolves the data exchange communication automatically during data processing, so technicians only need to focus on studying the data to be processed. This greatly frees technicians, lowers the knowledge required of data processing technicians (they need not become experts in data exchange communication), and removes an obstacle for ordinary technicians in the field of data processing.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.

Drawings

Fig. 1 is a schematic diagram of a data exchange communication path auto-build system 100 for a distributed data processing system according to the present disclosure.

Detailed Description

The present invention is described in further detail below with reference to examples and drawings, so that those skilled in the art can practice it by referring to the description.

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, one of two possible logical distributed signatures may be referred to hereinafter as a first logical distributed signature or a second logical distributed signature, and similarly the other of the two may be referred to as a second logical distributed signature or a first logical distributed signature. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".

In order that those skilled in the art will better understand the present disclosure, the present disclosure will be described in further detail below with reference to the accompanying drawings and detailed description.

Deep learning is essentially a form of feature learning; from this point of view, deep learning can be applied directly to extract features from raw data. The autoencoder is one of the important models for realizing this feature extraction function.

Before setting forth the specific embodiments of the present disclosure, it should be noted that SBP signatures in accordance with the present disclosure are signatures employed in a distributed data processing system. In such a system, because data parallelism, model parallelism, hybrid parallelism, pipeline parallelism, and the like are common, tasks of adjacent logical nodes are often deployed on different computing devices at the same time, so that intermediate parameters are exchanged between computing devices during actual data processing, which causes a large transmission overhead. To reduce this data transmission overhead, more logical nodes need to be generated on the basis of the initial logical node topology map 101 in order to complete the logical node topology map; in particular, to reduce the transmission overhead between upstream and downstream logical nodes, the changes in data distribution between upstream and downstream logical nodes need to be minimized. To this end, the present disclosure assigns a logical distributed signature to each logical node in order to obtain better downstream logical nodes. A logical distributed signature is a signature of a logical node expressed with the distributed descriptors of its logical tensors. The distributed descriptor of each logical tensor describes how that tensor is distributed across the whole computing system, and the descriptors mainly comprise the SPLIT logical tensor descriptor, the BROADCAST logical tensor descriptor, and the PARTIAL VALUE logical tensor descriptor.

In particular, the SPLIT logical tensor descriptor describes how one logical tensor is split, for example by splitting a tensor along a specified dimension according to the user description and distributing the pieces to different computing devices for the specified computation. For a two-dimensional tensor cut along its 0th dimension, the distributed descriptor of the logical tensors of the batch of data formed from that tensor is S(0), and the distributed descriptor of the logical tensor obtained at each input is likewise S(0). Similarly, for a two-dimensional tensor cut along its 1st dimension, the distributed descriptor of the logical tensors of the batch of data formed from that tensor is S(1), and the distributed descriptor of the logical tensor obtained at each input is likewise S(1). If the task data to be processed has more dimensions, there will be more distributed descriptors, e.g. S(2), S(3), and so on. The data mentioned here may be the data being processed or a model. If the data itself is split, data parallel processing is formed on the distributed data processing system; if the model is split, model parallel processing is formed. If the input of a logical node carries such a SPLIT descriptor, then in actual data processing, when a logical tensor of data size T is distributed to four computing cards for data-parallel computation, each card is allocated one quarter of the data, and the data amount across all four cards is T. If a tensor is split in dimension 0 and each resulting piece is then split again in dimension 1, the distributed descriptor of the tensor is the two-dimensional distributed descriptor (S(0), S(1)). If a tensor is split in dimension 0 and each resulting piece is then split again in dimension 0, the distributed descriptor of the tensor is the two-dimensional distributed descriptor (S(0), S(0)). And so on. Distributed descriptors may also be three-dimensional or more.
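The splitting just described can be illustrated with a minimal Python sketch. It assumes a tensor is held as a nested list and that every piece has the same size; the helper names `split` and `split_2d` are hypothetical and are not part of the disclosure.

```python
def split(tensor, axis, parts):
    """Split a 2-D tensor (list of rows) into `parts` equal pieces along `axis`."""
    rows, cols = len(tensor), len(tensor[0])
    if axis == 0:
        step = rows // parts
        return [tensor[p * step:(p + 1) * step] for p in range(parts)]
    if axis == 1:
        step = cols // parts
        return [[row[p * step:(p + 1) * step] for row in tensor]
                for p in range(parts)]
    raise ValueError("only 2-D tensors are handled in this sketch")


def split_2d(tensor, axes, hierarchy):
    """Apply a two-dimensional descriptor (S(axes[0]), S(axes[1])).

    The tensor is split along axes[0] into hierarchy[0] pieces, then each
    piece is split again along axes[1] into hierarchy[1] pieces.
    """
    first = split(tensor, axes[0], hierarchy[0])
    return [split(piece, axes[1], hierarchy[1]) for piece in first]


# A logical 4x4 tensor whose entries are 0..15.
logical = [[4 * r + c for c in range(4)] for r in range(4)]

s0 = split(logical, 0, 4)                   # S(0) over four cards
s0_s1 = split_2d(logical, (0, 1), (2, 2))   # (S(0), S(1)) on a 2x2 hierarchy
s0_s0 = split_2d(logical, (0, 0), (2, 2))   # (S(0), S(0)) on a 2x2 hierarchy
```

With four cards, each card holds one quarter of the elements, and the pieces together cover the logical tensor exactly once.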

The BROADCAST logical tensor descriptor describes a logical tensor that is distributed in the distributed system by broadcasting. In general, in a data processing system that performs only data parallelism, the model data is broadcast to the individual computing devices, so the broadcast logical tensor descriptor is used for broadcast data input to logical nodes. In actual data processing, the broadcast data has the same tensor size on every computing card. If a tensor is first broadcast and each broadcast copy is then split in dimension 0, the distributed descriptor of the tensor is the two-dimensional distributed descriptor (B, S(0)). Similarly, if a tensor is first split in dimension 0 and each resulting piece is then broadcast, the distributed descriptor of the tensor is the two-dimensional distributed descriptor (S(0), B). And so on.

The PARTIAL VALUE logical tensor descriptor indicates that an input or output logical tensor of a logical node is a partial value of a plurality of logical tensors of the same kind. Partial values include partial sums, partial products, partial ANDs, partial maxima and partial minima. Because data is usually processed in parallel, the processing on each device handles only part of the data. For example, if some of the logical tensors are S(0) or S(1), result logical tensors are obtained on individual computing devices, and the result logical tensors on these computing devices together form a partial value logical tensor. Combining the data of the same kind on all devices gives the final output result.
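As a hedged illustration of a partial sum, the sketch below splits the contraction dimension of a matrix product across two devices. Each device then produces a result of the full shape that holds only part of the value, and the element-wise sum over devices gives the logical tensor. The example data and the helper names `matmul` and `elementwise_sum` are assumptions made for illustration only.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def elementwise_sum(tensors):
    """Combine partial-sum tensors of identical shape into the logical tensor."""
    rows, cols = len(tensors[0]), len(tensors[0][0])
    return [[sum(t[i][j] for t in tensors) for j in range(cols)]
            for i in range(rows)]


x = [[1, 2, 3, 4],
     [5, 6, 7, 8]]                       # logical 2x4 tensor
w = [[1, 0], [0, 1], [1, 1], [2, 0]]     # logical 4x2 tensor

# Device 0 holds the first half of the contraction dimension, device 1 the rest.
x_parts = [[row[:2] for row in x], [row[2:] for row in x]]
w_parts = [w[:2], w[2:]]

# Every device computes a full-shape result that is only a partial sum (P).
partials = [matmul(x_parts[d], w_parts[d]) for d in range(2)]

# Combining the partial values of all devices yields the final result.
logical_result = elementwise_sum(partials)
```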

The distributed descriptors of the various logical tensors described above represent how these logical tensors are distributed in a distributed computing system. The distribution of the logical tensors at the inputs and outputs of a logical node also describes how that logical node distributes the data it operates on. For descriptive convenience, this disclosure refers to such a distributed descriptor as an "SBP distributed descriptor", also referred to as an "SBP descriptor"; both have the same meaning.

The logical nodes in the logical topology that perform distributed data processing, i.e. the operation nodes, each have SBP distributed descriptors for the tensors or data at their respective inputs and outputs. Together these descriptors form a signature of the logical node, i.e. a signature of the operation logical node expressed with the distributed descriptors of logical tensors. For convenience of description, the English initials of the three distributed descriptors are used and this signature is referred to as an "SBP signature", also called an "SBP distributed signature"; both have the same meaning.

Such descriptors include at least the three descriptors S, B and P, depending on the user's description of the computing task and the data parallelism requirements of each distributed computing system. If the data and the model can be split in several ways, one descriptor is added for each additional way of splitting. If a tensor is split in two different dimensions, sequentially or simultaneously, its distributed descriptor is a two-dimensional distributed descriptor as described above. If a tensor is distributed in two distribution manners, its distributed descriptor may likewise be two-dimensional. If a tensor is split in one dimension and each resulting piece is then split again in the same dimension, the distributed descriptor is also two-dimensional. Similarly, an SBP distributed descriptor may be three-dimensional or more. For each logical node, its signature contains various combinations of these descriptors. Thus, for one-dimensional SBP descriptors in accordance with the present disclosure, there are at least three, and typically four, distributed descriptors, e.g. the four SBP descriptors S(0), S(1), P and B. There may be more distributed descriptors depending on the number of dimensions of the logical tensor. With four SBP descriptors, multiple SBP signatures can be formed by permutation and combination over the inputs and outputs. Examples of one-dimensional SBP signatures are: (S(0), B) → S(0), (S(1), B) → S(1), P → P, B → B, (S(0), S(1)) → P, S(0) → P, S(0) → S(0), S(0) → S(1), and P → B. A two-dimensional SBP signature is composed of two-dimensional distributed descriptors, which are in turn combined from one-dimensional distributed descriptors, for example (S(0), S(0)), (S(1), S(1)), (S(0), B), (S(1), B), (B, B), (P, S(0)), and so on. Examples of two-dimensional SBP signatures are: [(S(0), S(0)) (B, B) → (S(0), S(0))], [(S(1), S(1)) (B, B) → (S(1), S(1))], [(S(0), B) (S(1), S(1)) → (P, S(1))], [(S(0), B) (B, S(1)) → (S(0), S(1))], and the like. The SBP signature may also have more dimensions, such as three or four or more, depending on the actual situation. All SBP signatures are the result of combining SBP descriptors. For a matrix multiplication logical node, if its input logical tensor is cut in the first dimension, its output result logical tensor is also cut in the first dimension. In summary, S, B and P are descriptors describing the distribution of tensors in a data processing system, while an SBP signature describes the task operation of a logical node by means of multiple SBP descriptors. Each tensor may have several possible SBP descriptors, and the operation represented by each logical node may admit several SBP signatures. For example, SBP-1 may be the signature (S(0), B) → S(0), while SBP-2 may be the signature (S(1), B) → S(1). In practical applications, different signature forms may carry different numbers. The numbers given here are only for convenience of description; a signature need not be numbered at all, and different forms of signature can be distinguished from each other without numbers. SBP-1 may also be a two-dimensional SBP signature, such as [(S(0), B) (B, S(1)) → (S(0), S(1))].
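One possible way to hold such descriptors and signatures in a program is sketched below. The representation (strings for one-dimensional descriptors, tuples for multi-dimensional ones) and the class name `SbpSignature` are assumptions made for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Tuple

# A 1-D descriptor is a string such as "S(0)", "S(1)", "B" or "P".
# An n-D descriptor is a tuple of 1-D descriptors, one per hierarchy level.
Descriptor = Tuple[str, ...]


@dataclass(frozen=True)
class SbpSignature:
    """Descriptors of every input and every output of one logical node."""
    inputs: Tuple[Descriptor, ...]
    outputs: Tuple[Descriptor, ...]

    def __str__(self):
        def fmt(d):
            return d[0] if len(d) == 1 else "(" + ", ".join(d) + ")"
        left = " ".join(fmt(d) for d in self.inputs)
        right = " ".join(fmt(d) for d in self.outputs)
        return f"[{left} -> {right}]"


# One-dimensional signature (S(0), B) -> S(0): two inputs, one output.
sbp_1d = SbpSignature(inputs=(("S(0)",), ("B",)), outputs=(("S(0)",),))

# Two-dimensional signature [(S(0), B) (B, S(1)) -> (S(0), S(1))].
sbp_2d = SbpSignature(
    inputs=(("S(0)", "B"), ("B", "S(1)")),
    outputs=(("S(0)", "S(1)"),),
)

# A candidate set is then simply a collection of such signatures.
candidates = {sbp_1d, sbp_2d}
```

Making the class frozen keeps signatures hashable, so a candidate set can be stored as a Python set and compared without assigning numbers to the signatures.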

Each initial logical node may be given an SBP signature as described above based on the user's task description. Typical task logical nodes are operation nodes that perform a particular operation and thus have particular candidate SBP signatures. It should be noted that not every task logical node has the same SBP signatures. For a task logical node that performs a multiplication operation, the input logical tensors of its SBP signatures normally do not include partial-sum logical tensors, so its input SBP descriptors do not include the distributed descriptor P. The candidate SBP signatures of a task logical node performing an addition operation may include any combination of the various SBP descriptors with each other or with themselves. For example, for a task logical node performing matrix multiplication under data parallelism only, the candidate SBP signatures are typically (S(0), B) → S(0), (S(1), B) → S(1), (S(0), S(1)) → P, and so on. These are only examples: as technology advances, signatures previously unsuitable for matrix multiplication may also become applicable to it. Take the two-dimensional SBP signature [(S(0), B) (B, S(1)) → (S(0), S(1))]. For a logical node with this signature, the descriptors of its two input tensors, (S(0), B) and (B, S(1)), and the descriptor of its output tensor, (S(0), S(1)), are all two-dimensional. The descriptor (S(0), B) of the first tensor means that the first tensor is first split in dimension 0 (the dimension of the tensor itself), which is the S(0) of the first descriptor dimension, into a plurality of first split tensors, and these first split tensors are then broadcast spatially or output continuously in time, which is the B of the second descriptor dimension. The descriptor (B, S(1)) of the second tensor means that the second tensor is first broadcast spatially and is then split in dimension 1 (the dimension of the tensor itself), which is the S(1) of the second descriptor dimension, into a plurality of second split tensors. Finally, the distributed descriptor of the result tensor formed when the logical node processes the first tensor and the second tensor is (S(0), S(1)). Each initial logical node is accompanied by a candidate set of logical distributed signatures based on the task configuration data. Each logical distributed signature in the candidate set specifies a distributed descriptor for each input logical tensor and for each output logical tensor of the initial logical node to which it belongs.
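The two-dimensional signature [(S(0), B) (B, S(1)) → (S(0), S(1))] can be checked on a small matrix product. The sketch below assumes a 2x2 device hierarchy in which device (p, q) has index p at the first level and q at the second; the example data and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_block(t, part, parts):
    """Piece `part` of `parts` when splitting along tensor dimension 0."""
    step = len(t) // parts
    return t[part * step:(part + 1) * step]


def col_block(t, part, parts):
    """Piece `part` of `parts` when splitting along tensor dimension 1."""
    step = len(t[0]) // parts
    return [row[part * step:(part + 1) * step] for row in t]


x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # first input, logical 4x2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]       # second input, logical 2x4
n, m = 2, 2                            # device hierarchy [2, 2]

local_out = {}
for p in range(n):
    for q in range(m):
        x_local = row_block(x, p, n)   # (S(0), B): split by p, same for all q
        w_local = col_block(w, q, m)   # (B, S(1)): same for all p, split by q
        local_out[(p, q)] = matmul(x_local, w_local)

# The local results are exactly the (S(0), S(1)) pieces of the logical output.
full = matmul(x, w)
matches = all(
    local_out[(p, q)] == col_block(row_block(full, p, n), q, m)
    for p in range(n) for q in range(m)
)
```

No communication is needed inside the node itself: each device multiplies the pieces it already holds.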

Fig. 1 is a schematic diagram of a data exchange communication path automatic construction system 100 for a distributed data processing system according to the present disclosure. As shown in Fig. 1, the data exchange communication path automatic construction system 100 includes at least an initial logical node generating component 110, a communication through path determining component 120, a communication indirect path obtaining component 130, and an intermediate tensor generating node inserting component 140. The initial logical node generating component 110 receives task configuration data entered by the user and generates an initial logical node topology map for the distributed data processing system. Each initial logical node performs a predetermined data processing operation and is attached with predetermined node attributes. The node attributes comprise a location tag of the logical data processing devices to which the initial logical node belongs, which represents the deployment structure of those devices, and a candidate SBP distributed signature set, in which each SBP distributed signature specifies a one-dimensional or multi-dimensional SBP distributed descriptor for each input logical tensor and for each output logical tensor. Specifically, after a job is input, the initial logical node generating component 110 automatically decomposes the job into a plurality of small job tasks based on the job description input by the user. These small job tasks are composed of various operation components, which are interconnected one after another as logical nodes to form a preliminary neural network topology map for logical tensor processing. Each such neural network includes a plurality of logical nodes, and adjacent neural networks are connected to each other, so as to guide the arrangement, or placement (PLACEMENT), of the executors that perform the actual job processing in the distributed data processing system. The location tag indicates on which computing devices each logical node is deployed, into how many levels those computing devices are divided, and how many computing devices each level contains. The computing devices are numbered on the basis of this deployment and hierarchy, so that, through the location tags attached to the computing logical nodes, the components described later know the specific distribution of the tensors they use and of the tensors they output.
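A location tag of this kind can be sketched as a device list plus a hierarchy. The class name `Placement`, the method `coordinates` and the row-major numbering of devices within the hierarchy are assumptions made for illustration; the disclosure only states that devices are numbered according to deployment and hierarchy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical location tag: device ids plus the size of each level."""
    devices: Tuple[int, ...]
    hierarchy: Tuple[int, ...]

    def __post_init__(self):
        size = 1
        for level in self.hierarchy:
            size *= level
        if size != len(self.devices):
            raise ValueError("hierarchy does not match the number of devices")

    def coordinates(self, device):
        """Position of `device` at each hierarchy level (row-major order)."""
        index = self.devices.index(device)
        coords = []
        for level in reversed(self.hierarchy):
            coords.append(index % level)
            index //= level
        return tuple(reversed(coords))


# The same eight devices arranged as [2, 4] and as [4, 2].
p_2x4 = Placement(tuple(range(8)), (2, 4))
p_4x2 = Placement(tuple(range(8)), (4, 2))
```

The same device can sit at different coordinates under different hierarchies, which is why two placements over identical devices are not interchangeable.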

Fig. 1 shows only schematically a simple initial logical node topology map 101, with nodes A, B, C, D, E, F, L and K. Other nodes are omitted. In actual data processing, the initial logical node topology map 101 may be more complex. It contains the basic operation nodes that implement the computing task described by the user. The manner of generating the initial logical node topology map 101 is conventional in the art and is therefore not described in detail here. The initial logical nodes in the initial logical node topology map 101 generally each contain a plurality of SBP signatures. However, source logical nodes for which the user has configured an SBP signature, and initial logical nodes whose SBP signature has been determined from the user's task description, such as initial logical nodes A, C and E, have only a unique SBP signature, e.g. SBP-1 for initial logical node A, SBP-2 for initial logical node C and SBP-3 for initial logical node E. Other initial logical nodes carry their inherent candidate SBP signatures. Initial logical node B in Fig. 1, for example, has a plurality of candidate SBP signatures, e.g. three: SBP-1, SBP-2 and SBP-3. The other initial logical nodes also have their own candidate SBP signatures, which are not listed here. Different initial logical nodes have different fixed candidate SBP signatures depending on the operation they perform. Sometimes, for certain critical computing logical nodes, the user needs to predetermine a unique SBP signature so that the entire task can execute. Such a fixed SBP signature cannot be altered.

Although the initial logical node generating component 110 generates the initial logical node topology map 101, each logical node in the map still needs to determine which SBP signature it uses, that is, with which distribution its logical tensors are input and output. The present disclosure does not concern the determination of the SBP signature of an initial logical node. That determination may be performed with techniques disclosed in earlier patents of the present applicant, for example by calculating the transmission cost under each candidate SBP signature and selecting the signature with the smallest cost, or by means of the global transmission cost, and is therefore not described in detail.

The communication through path determining component 120 traverses each initial logical node as the current initial logical node. For each SBP distributed signature in the candidate SBP distributed signature set of the current node, it considers the SBP distributed descriptor of the input logical tensor at each input and the SBP distributed descriptor of the output logical tensor at the upstream output corresponding to that input. Where the SBP distributed descriptor of the output logical tensor differs from that of the input logical tensor, it determines that a candidate communication through path exists between the current initial logical node and the upstream initial logical node, namely a path that can complete the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication.

Thus, the communication through path determining component 120 of the data exchange communication path automatic construction system 100 according to the present disclosure starts from the source logical nodes in the initial logical node topology map 101. When the logical signatures, i.e. the SBP signatures, of all upstream logical nodes (e.g. logical nodes A and E) of the current logical node (e.g. logical node B) have been determined, it takes the distributed descriptors of the outputs of those upstream logical nodes that correspond to the inputs of logical node B. It then determines whether the data transmission needed to transform the distributed descriptor of the logical tensor output by each upstream logical node into the distributed descriptor of the logical tensor at the corresponding input in one of the candidate logical distributed signatures of logical node B can be implemented with only one communication primitive. As shown in Fig. 1, logical node B has several candidate SBP signatures, such as SBP-1, SBP-2 and SBP-3. For example, SBP-1 may be a signature of the form (S(1), B) → S(1), (S(1), P) → S(1), [(S(0), B) (B, S(1)) → (S(0), S(1))], [(S(0), S(0)) (B, B) → (S(0), S(0))] or [(S(0), B) (S(1), S(1)) → (P, S(1))], SBP-2 may be a signature of the form (S(0), B) → S(0), and SBP-3 may be a signature of the form B → B or S(0) → P. In each signature form, the left side of the arrow gives the distributed descriptors of the input logical tensors and the right side gives the distributed descriptor of the output logical tensor. For convenience of description, a logical tensor whose distributed descriptor is S(0) is hereinafter abbreviated as an "S(0) logical tensor", one whose distributed descriptor is B as a "B logical tensor", one whose distributed descriptor is P as a "P logical tensor", one whose distributed descriptor is (S(0), B) as an "(S(0), B) logical tensor", one whose distributed descriptor is (B, S(1)) as a "(B, S(1)) logical tensor", one whose distributed descriptor is (P, S(1)) as a "(P, S(1)) logical tensor", and so on.

As shown in Fig. 1, the signature SBP-3 of logical node E in the initial logical node topology map 101 is of the form S(0) → S(0); the distributed descriptor of its output logical tensor is S(0), so its output logical tensor is an S(0) logical tensor. If the signature SBP-3 of logical node E were of the form B → B or P → P, the distributed descriptor of its output logical tensor would be B or P, and its output logical tensor would be a B or P logical tensor. If the candidate signature SBP-1 of logical node B, i.e. (S(0), S(1)) → P, is selected as the determined signature, the distributed descriptor of the input logical tensor at its first input, which corresponds to the output of node A, must be S(0), i.e. the first input must obtain an S(0) logical tensor, while the distributed descriptor of the input logical tensor at its second input, which corresponds to the output of node E, must be S(1), i.e. the second input must obtain an S(1) logical tensor. Evidently, the distributed descriptor P of the output logical tensor of node A then differs from the S(0) required at the first input of node B. For logical node B to operate correctly, the P logical tensor output by node A must therefore be transformed into an S(0) logical tensor. Likewise, the distributed descriptor S(0) of the logical tensor output by node E is inconsistent with the distributed descriptor S(1) of the input tensor at the second input of node B, so for logical node B to operate correctly, the S(0) logical tensor output by node E must be transformed into an S(1) logical tensor.
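The comparison in this example can be sketched as a simple mismatch check. The function name `required_transformations` and the list-based representation are hypothetical; the values mirror the case in which node A outputs a P logical tensor to the first input of node B and node E outputs an S(0) logical tensor to the second input.

```python
def required_transformations(upstream_outputs, chosen_inputs):
    """List the descriptor changes needed at the inputs of the current node.

    upstream_outputs: (producer name, output descriptor) for each input, in order.
    chosen_inputs:    input descriptors of the signature chosen for the node.
    """
    transforms = []
    for (producer, out_desc), in_desc in zip(upstream_outputs, chosen_inputs):
        if out_desc != in_desc:
            transforms.append((producer, out_desc, in_desc))
    return transforms


# Node B uses the signature (S(0), S(1)) -> P.
upstream = [("A", "P"), ("E", "S(0)")]
needed = required_transformations(upstream, ["S(0)", "S(1)"])

# If node E already produced S(1), only the first input would need a change.
needed_first_only = required_transformations(
    [("A", "P"), ("E", "S(1)")], ["S(0)", "S(1)"])
```

Each entry of the result is one upstream-to-downstream transformation for which a communication path then has to be found.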

Thus, in a distributed computing system, the operation tasks of the logical nodes, in particular the computation tasks, are split and distributed to the computing devices (e.g. computing cards such as CPUs, GPUs or TPUs). To finally obtain the correct result, the intermediate parameters must be synchronized continually, which involves exchanging intermediate parameters between different computing devices. When the SBP descriptor of the output logical tensor in the SBP signature of the upstream logical node is inconsistent with the SBP descriptor of the corresponding input logical tensor in the SBP signature of the current node, the output is typically converted during actual operation. This conversion usually requires fetching part of the data located on another computing device so that, together with the locally available data, it forms the data needed at the input of the current logical node, conforming to the distributed descriptor of the logical tensor at that input. Fetching partial data from another device is typically performed by a transmission node formed from a collective communication primitive, for example: Broadcast, Scatter, Reduce, All Reduce, Gather, All Gather, Reduce Scatter, All to All, and so forth.
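For one-dimensional descriptors on the same set of devices, a commonly used correspondence between a descriptor change and a single collective primitive can be written as a lookup. The mapping below is stated as an assumption for illustration and is not taken from the disclosure; cases it does not cover return `None`.

```python
def kind(descriptor):
    """Collapse S(0), S(1), ... to 'S'; leave 'B' and 'P' unchanged."""
    return "S" if descriptor.startswith("S(") else descriptor


# Assumed correspondence for a 1-D change on the same devices.
PRIMITIVE = {
    ("S", "B"): "All Gather",       # every device collects all pieces
    ("S", "S"): "All to All",       # re-split along another dimension
    ("P", "B"): "All Reduce",       # reduce, result on every device
    ("P", "S"): "Reduce Scatter",   # reduce, each device keeps one piece
    ("B", "S"): "local slice (no communication)",
}


def primitive_for(src, dst):
    """Name of one primitive for src -> dst, or None if not covered here."""
    if src == dst:
        return "no communication"
    return PRIMITIVE.get((kind(src), kind(dst)))
```

For instance, `primitive_for("P", "S(0)")` gives "Reduce Scatter", while `primitive_for("S(0)", "P")` gives `None` because the sketch does not cover that change.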

Another Chinese patent of the present applicant (CN 112764940B) describes the SBP distributed descriptor cases in which data exchange communication can be accomplished with one communication primitive, together with a method for calculating the transmission cost. Its tables are reproduced below.

Tables 1-6 list the transmission costs of transforming from the two-dimensional distributed descriptor listed in a row to the two-dimensional distributed descriptor listed in a column. The data transmission cost required to convert between two logical tensors can be obtained directly by looking up the tables. Although only two-dimensional SBP distributed descriptors are illustrated here, those skilled in the art can derive the cases with more dimensions from these examples by simple mathematical deduction.

Each lookup table corresponds to one computational condition. Given the teachings of the present disclosure, those skilled in the art can adapt the tables to the actual topology formed by the physical computing devices, the actual job configuration and the logical relationships of the physical computing devices, and initialize the lookup tables during actual operation so as to obtain correct lookup tables. In addition, the user can normalize the data blocks (BLOBs) according to the actual operation, so that a normalized weight is formed for each specific data block. That is, when a lookup table is initialized by calculating transmission costs, the table can hold a transmission coefficient, and the transmission cost is obtained by multiplying the coefficient by the data amount T, which simplifies the lookup table.

TABLE 1

Row -> Column	(S(j), S(l))	(S(j), B)	(B, S(l))	(S(j), P)	(P, S(l))	(B, B)	(B, P)	(P, B)	(P, P)
(S(i), S(k))	(1-1/(nm))T	(m-1/n)T	(n-1/m)T	(1-1/n)T	(1-1/m)T	(nm-1)T	(n-1)T	(m-1)T	0
(S(i), B)	(1-1/n)T	m(1-1/n)T	(n-1)T	(1-1/n)T	0	m(n-1)T	(n-1)T	0
(B, S(k))	(1-1/m)T	(m-1)T	n(1-1/m)T	0	(1-1/m)T	n(m-1)T	0	(m-1)T	0
(S(i), P)	(m-1/n)T	(2m-1-1/n)T	(m+n-2)T	m(1-1/n)T	(m-1)T	(nm+m-2)T	(m+n-2)T	2(m-1)T	0
(P, S(k))	(n-1/m)T	(n+m-2)T	(2n-1-1/m)T	(n-1)T	n(1-1/m)T	(nm+n-2)T	2(n-1)T	(m+n-2)T	0
(B, B)	0
(B, P)	(m-1)T	2(m-1)T	(m+n-2)T	0	(m-1)T	(nm+m-2)T	0	2(m-1)T	0
(P, B)	(n-1)T	(n+m-2)T	2(n-1)T	(n-1)T	0	(nm+n-2)T	2(n-1)T	0
(P, P)	(nm-1)T	(nm+m-2)T	(nm+n-2)T	(nm-m)T	(nm-n)T	2(nm-1)T	(nm+n-2)T	(nm+m-2)T	0

Conditions: i != j, k != l (i is not equal to j and k is not equal to l)

T is the total data size of the data block. When calculating the transmission cost, the table can be used to generate a transmission coefficient, and the transmission cost is obtained by multiplying the coefficient by the data size T.

The first hierarchy level is divided into n first-level machines, and the second hierarchy level into m second-level machines

Arranged on the same machine cluster

Calculation method: when P is not involved, the overlapping parts of the sender and the receiver are calculated. The overlapping parts are symmetrical in this case, and the transmission cost is the total data amount required by the receiver minus the overlap.

When P is involved and P is on the receiver side, the receiver can always take the part that coincides with the sender's part (when the sender is S) or a sub-part of it (when the sender is B) and assign 0 elsewhere, which minimizes the transmission cost.

If P is on the transmitter, P is transmitted through the Reduce channel.
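A lookup of this kind can be sketched by storing each table entry as a coefficient that depends on n and m and multiplying it by T. Only five entries of Table 1 are copied below; the dictionary layout, the short keys ("S", "B", "P" per position) and the function name `transmission_cost` are assumptions made for illustration.

```python
from fractions import Fraction

# A few coefficients from Table 1 (condition: i != j, k != l).
# Key: (row descriptor, column descriptor), written per position as S, B or P.
TABLE_1 = {
    (("S", "S"), ("S", "S")): lambda n, m: 1 - Fraction(1, n * m),
    (("S", "S"), ("S", "B")): lambda n, m: m - Fraction(1, n),
    (("S", "S"), ("B", "B")): lambda n, m: Fraction(n * m - 1),
    (("S", "B"), ("B", "B")): lambda n, m: Fraction(m * (n - 1)),
    (("P", "P"), ("B", "B")): lambda n, m: Fraction(2 * (n * m - 1)),
}


def transmission_cost(src, dst, n, m, data_size):
    """Cost = coefficient(n, m) * T for one stored table entry."""
    coefficient = TABLE_1[(src, dst)](n, m)
    return coefficient * data_size


# Example: n = 2 first-level machines, m = 2 second-level machines, T = 8.
cost_ss_to_bb = transmission_cost(("S", "S"), ("B", "B"), 2, 2, 8)
```

Storing coefficients rather than absolute costs means the same table serves data blocks of any size.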

TABLE 2

(S(i), S(k)) -> (S(j), S(l))	k = l	k != l
i = j	0	(1-1/m)T
i != j	(1-1/n)T	(1-1/(nm))T

Conditions: T is the total data size of the data block

The first hierarchy level is divided into n first-level machines

The second hierarchy level is divided into m second-level machines

Arranged on the same machine cluster
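Table 2 is small enough to be written directly as a function. The sketch below returns the coefficient to be multiplied by T; the function name `split_to_split_coefficient` is hypothetical.

```python
from fractions import Fraction


def split_to_split_coefficient(i, k, j, l, n, m):
    """Coefficient of T for (S(i), S(k)) -> (S(j), S(l)) according to Table 2.

    n is the number of first-level machines, m the number of second-level
    machines; sender and receiver use the same machine cluster.
    """
    if i == j and k == l:
        return Fraction(0)
    if i == j:                       # only the second position changes
        return 1 - Fraction(1, m)
    if k == l:                       # only the first position changes
        return 1 - Fraction(1, n)
    return 1 - Fraction(1, n * m)    # both positions change


# (S(0), S(0)) -> (S(0), S(1)) on a [2, 2] hierarchy.
example = split_to_split_coefficient(0, 0, 0, 1, 2, 2)
```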

TABLE 3

Conditions: i = j, k != l (i equals j and k is not equal to l)

Grey indicates no effect at all relative to the first table

T is the total data size of the data block. When calculating the transmission cost, the table can be used to generate a transmission coefficient, and the transmission cost is obtained by multiplying the coefficient by the data size T.

Arranged on the same machine cluster

Calculation method: when P is not involved, the overlapping parts of the sender and the receiver are calculated. The overlapping parts are symmetrical in this case, and the transmission cost is the total data amount required by the receiver minus the overlap.

If P is on the transmitter, P is transmitted through the Reduce channel.

TABLE 4

Conditions: i != j, k = l (i is not equal to j and k is equal to l)

Grey indicates no effect at all relative to the first table

TABLE 5

Conditions are as follows: i=j, k=l (i equals j and k equals l)

Grey indicates no effect at all relative to the first table

Arranging on the same cluster of machines

Calculation method: when P is not involved, the overlapping parts of the sender and the receiver are calculated. The overlapping parts are symmetrical in this case, and the transmission cost is the total data amount required by the receiver minus the overlap.

If P is on the transmitter, P is transmitted through Reduce Scatter.

TABLE 6

Conditions: whether i and j, or k and l, are equal no longer matters here

On the sending machine cluster, the first hierarchy level is divided into n first-level machines and the second hierarchy level into m second-level machines

On the receiving machine cluster, the first hierarchy level is divided into N first-level machines and the second hierarchy level into M second-level machines

Arranged on the same machine cluster

When the sender has no P, the transmission amount equals the amount of data to be received

When the sender has P, the sender performs a Reduce Scatter operation locally and then sends the result to the receiver.

Of course, the effect is equivalent to summing the corresponding data blocks directly at the receiver (S(j), S(l)) and then, if necessary, performing an additional broadcast.

From Tables 1-6 above it can be seen that, in the multidimensional case, as long as the respective SBP distributed descriptors of the upstream and downstream tensors can be converted into one-dimensional SBP (1D SBP), the transformation of the data tensor by data exchange communication can be accomplished by a communication logical node formed from one basic communication primitive, whether or not the computing devices and hierarchies to which the upstream and downstream logical nodes belong are the same. It should be noted, however, that in the nD SBP case, if the transformation in the same dimension is S → P, the transmission cost is still infinite (a transition tensor described by B in that dimension is needed in between, so two data communication procedures occur). In the multidimensional case, if P → P in the same dimension occurs across different computing devices, the transmission cost is likewise infinite. Examples that can be converted to one-dimensional SBP (1D SBP) are as follows:

{0},[1]:B→{0,1,2,3},[2,2]:(S0,S0)

{0,1,2,3},[2,2]:(B,B)→{0,1,2,3,4,5},[3,2]:(S0,S0)

{0,1,2},[3]:P→{0,1,2,3},[2,2]:(S1,S1)

[2,3]:(B,B)→[2,3]:(S0,S0)

As can be seen from Tables 1-6 above, in the multidimensional case, if the computing devices and hierarchies on which the upstream and downstream logical nodes are distributed are the same and the SBP descriptors differ in only one corresponding dimension, data exchange communication can be achieved by a communication logical node composed of one basic communication primitive. For example:

[2,2]:(S0,S1)→[2,2]:(S0,S2)

[2,2]:(B,S1)→[2,2]:(S0,S1)

[2,3]:(B,B)→[2,3]:(S0,B)

It should be noted, however, that there is a special case in which the transmission cost is infinite. Where the computing devices and the hierarchy are the same, if the SBP distributed descriptor of one of the upstream and downstream tensors has the same split descriptor in both positions, and the SBP descriptors of the other tensor are the same as it in the second position but different in the first position, the transmission cost is infinite, so one basic communication primitive cannot be used to implement the exchange communication of the tensor data. Examples are (S0, S1) → (S1, S1) and (S1, S1) → (B, S1).

Let T denote the data amount of the tensor to be transmitted. Extending the transmission cost examples given above, the cost calculation can be illustrated as follows. Square brackets give the number and hierarchy of the computing devices on which the upstream and downstream logical nodes are deployed, and the parentheses after each example state whether the two logical nodes are deployed on the same or different computing devices and device hierarchies. Taking four computing devices as an example, the transmission costs are computed as follows.

[4] S0 → [4] B, cost=3T (same computing device and same device hierarchy)

[4] S0 → [4] S1, cost= (3/4) T (same computing device and same device hierarchy)

[2,2]: (S0, S0) → [2,2]: (S0, S1), cost= (1/2) T (same computing device and same device hierarchy)

[2,3]: (S0, S1) → [3,2]: (P, B), cost=infinity (same computing device and different device hierarchies)

[2,2]: (S0, S0) → [2,2]: (P, S0), cost=infinity (same computing device and same device hierarchy)

{0}, [1]: B → {0,1,2,3}, [4]: S0, cost= (3/4) T (different computing devices and same device hierarchy). The computing devices are the content within { }. The device hierarchy is the content within [ ], not merely the number of dimensions of the hierarchy. For example, {0-7}, [2,4]: (B, S0) → {0-7}, [4,2]: (S0, S1) has the same devices but different device hierarchies; {0-3}, [2,2]: (B, S0) → {4-7}, [2,2]: (S0, S1) has different devices but the same device hierarchy; and {0-7}, [2,4]: (S0, S1) → {0-7}, [4,2]: (S0, S1) has the same devices but different device hierarchies. Even with the same devices and the same SBP, communication may be required because the hierarchies differ.

{0}, [1]: B → {0,1,2,3}, [4]: B, cost=3T (different computing devices and different device hierarchies).

{0}, [1]: B → {0,1,2,3}, [2,2]: (S0, S1), cost = infinity (different computing devices and different device hierarchies)
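Four of the one-dimensional costs listed above can be reproduced with the calculation rule stated for Table 1, namely the data required by the receivers minus what they already hold. The sketch assumes equal-sized pieces and no P on either side; the function names are hypothetical.

```python
from fractions import Fraction


def needed_share(dst, n):
    """Share of the logical tensor each of n receivers must end up with."""
    return Fraction(1) if dst == "B" else Fraction(1, n)


def cost_same_devices(src, dst, n):
    """Coefficient of T for a 1-D change src -> dst on the same n devices."""
    if "P" in (src, dst):
        raise ValueError("P is not covered by this sketch")
    needed = needed_share(dst, n)
    if src == "B" or src == dst:
        overlap = needed                 # everything is already local
    elif dst == "B":
        overlap = Fraction(1, n)         # the device's own piece
    else:
        overlap = Fraction(1, n * n)     # intersection of two different splits
    return n * (needed - overlap)


def cost_from_single_device(dst, n):
    """Device 0 alone holds the tensor (B); n devices, device 0 included, receive."""
    if dst == "P":
        raise ValueError("P is not covered by this sketch")
    return (n - 1) * needed_share(dst, n)
```

With n = 4, `cost_same_devices("S0", "B", 4)` gives 3 and `cost_same_devices("S0", "S1", 4)` gives 3/4, while `cost_from_single_device("S0", 4)` gives 3/4 and `cost_from_single_device("B", 4)` gives 3, matching the coefficients of T in the examples.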

As can be seen from Tables 1-6 above, the fact that a transmission cost can be calculated for some transformations does not mean that they can be implemented directly by data exchange communication through a corresponding communication primitive; for such transformations the transmission cost is very large and is regarded as infinite. For example, (P, S1) → (B, B) and (S0, S1) → (B, B) cannot be realized by the communication primitives of existing collective communication. This is why a table change is required. Thus, the communication can be performed directly once the generalized basic transmission is realized.

With different computing devices and different device hierarchies, the vast majority of transmission costs are infinite. That is, data conversion communications such as (P, S1) → (B, B) and (S0, S1) → (B, B) are currently not available. Where a tensor transformation cannot be implemented as data exchange communication with a single communication primitive, the transmission cost is generally regarded as infinite, so that the corresponding candidate SBP signature is eliminated when the candidate SBP signatures of each logical node are selected later.

In many cases, however, some logical nodes may be assigned a fixed SBP signature, may adopt a particular SBP signature because it is more advantageous for subsequent computation, or may be subject to multiple upstream constraints, such that the downstream op cannot find a legitimate SBP signature based on the basic communication primitives. So that the data processing tasks of the overall logical topology can still be carried out rather than being precluded, the present disclosure provides the communication indirect path acquisition component 130, which finds an indirect-path communication means to implement data exchange communications that cannot be accomplished with one communication primitive. Where it is determined that there is no communication direct path between the current initial logical node and the upstream initial logical node, that is, no path capable of completing the communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor through only one basic communication primitive of collective communication, the communication indirect path acquisition component 130 uses a shortest path method to obtain one or more candidate communication indirect paths. Each candidate path transforms the tensor described by the SBP distributed descriptor of the output logical tensor into the tensor described by the SBP distributed descriptor of the input logical tensor by way of one or more intermediate tensors described by different SBP distributed descriptors, and the transformation from the upstream tensor to the downstream tensor of any two adjacent tensors on the path is completed through one basic communication primitive of collective communication.

The inability to complete data exchange communication directly with one communication primitive is due, on the one hand, to the lack of a suitable communication primitive for the particular forms of the tensors' SBP distributed descriptors and, on the other hand, to differences in the deployment of the computing devices to which each logical node belongs and in the manner in which the tensor is split. By way of example, there are three main cases. The first is that the current initial logical node (also referred to as the downstream initial logical node) and the upstream initial logical node are deployed on the same computing devices with the same level division, but either the SBP distributed descriptor of the output logical tensor differs from that of the input logical tensor in at least two corresponding dimensions, or one of the two SBP distributed descriptors has the same split descriptor in two of its dimensions while the other has a different descriptor in the first of the corresponding two dimensions. The second is that the current initial logical node and the upstream initial logical node are deployed on the same computing devices but with different level divisions. The third is that the current initial logical node and the upstream initial logical node are deployed on different computing devices.

For the first case, one is that the current initial logical node and the computing device and level partition each deployed by the upstream initial logical node are the same, but the SBP distributed descriptors of the output logical tensor are different from those of the input logical tensor in which there are at least two or more corresponding dimensions. For example: format: (. Also for example: (S0, S1) → (P, B), (S0, S0) → (S1, S2), (P, S0, S1) → (B, S0, S2), (S0, S1) → (S1, S2). For the first case, the other is that the SBP distributed descriptor of the output logical tensor is identical to the Split (Split) SBP distributed descriptor present in one of the SBP distributed descriptors of the input logical tensor and the first dimension in the corresponding two dimensions of the other has a different SBP distributed descriptor. For example, the SBP distributed descriptor form of the upstream and downstream tensor is (. In this case, although only the SBP descriptors of the corresponding one dimension are different, the equivalent transformation into a one-dimensional SBP (1D SBP) case is still impossible due to the different segmentation of the previous dimension. Likewise, (. Examples of these two formats are: (S0, S1) → (S1, S1), (S1, S1) → (B, S1), (B, S0) → (S0, S0), (S2, S2) → (S0, S2), (B, S1, S2, B, S2) → (B, S1, S0, B, S2), (B, S1, S2, B, S2) → (P, S3, S0, S1, S2).

These examples do not show the deployment of the computing devices, which by default is the same upstream and downstream. For example, for (S0, S1) → (P, B), if the upstream and downstream logical nodes are both deployed on 4 computing devices divided into 2 levels, the hierarchy is [2,2], and the complete distributed SBP label, supplemented with the location identifier, is [2,2]: (S0, S1) → [2,2]: (P, B). Where the computing devices and level division of the current initial logical node and the upstream initial logical node are the same, the devices and level division need not be examined; only the parts that differ are examined to determine whether the tensor data exchange communication can be completed with a single communication primitive.

For this purpose, one or more intermediate tensors need to be found between two tensors whose data exchange communication cannot be implemented by one basic communication primitive, such that the data exchange communication between any two adjacent tensors can be implemented by one basic communication primitive.

For the first case described above, in which the current initial logical node and the upstream initial logical node are deployed on the same computing devices with the same level division, the communication indirect path acquisition component 130 may, where the transmission cost is infinite, find an intermediate tensor for relayed exchange communication; this intermediate tensor may be generated by an intermediate tensor generation node. Such an intermediate tensor must be found.

For example:

[2,2]: (S0, S0) → [2,2]: (B, S1), cost=infinity

By means of the intermediate tensor [2,2]: (S0, S1), the two tensors can be connected in series to form an indirect communication path, so the indirect communication path obtained by the communication indirect path acquisition component 130 for the above example is:

[2,2]:(S0,S0)→[2,2]:(S0,S1)→[2,2]:(B,S1),Cost＝(1/2)T+T

The transmission cost is thereby changed from infinity to a small finite cost, and the communication can be completed through two data exchange communications "→", each based on a basic communication primitive.

In some cases, a transfer transformation from one tensor to another tensor may require multiple intermediate tensors to be able to be implemented. For example:

[2,2] (S0, S1) → [2,2] (S1, S0), cost=infinity

The intermediate tensors are [2,2]: (S0, S2) and [2,2]: (S1, S2). After inserting the intermediate tensor, the indirect communication path can then be expressed as:

[2,2]:(S0,S1)→[2,2]:(S0,S2)→[2,2]:(S1,S2)→[2,2]:(S1,S0),

Cost＝(1/2)T+(1/2)T+(1/2)T

or alternatively

[2,2]:(S0,S1)→[2,2]:(S2,S1)→[2,2]:(S2,S0)→[2,2]:(S1,S0),

Cost＝(1/2)T+(1/2)T+(1/2)T

It should therefore be pointed out that the intermediate tensors forming an indirect communication path between two tensors whose direct transmission cost is infinite are not unique, and that the total transmission costs resulting from different intermediate tensors may differ. The communication indirect path acquisition component 130 of the present disclosure therefore uses a conventional shortest-path query method (e.g., Dijkstra's algorithm, although any other algorithm may be used) to determine the intermediate tensors.
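The shortest-path query can be sketched as follows. This is a minimal illustration, not the component's actual implementation: the graph contains only the [2,2] transformations and costs quoted in the example above (in units of T), and the names `EDGES` and `shortest_path` are hypothetical.

```python
import heapq
from fractions import Fraction as F

# Single-primitive transformations under hierarchy [2,2], with costs in units of T.
# Only the edges quoted in the example above are listed (an assumption of this sketch).
EDGES = {
    ("S0", "S1"): {("S0", "S2"): F(1, 2), ("S2", "S1"): F(1, 2)},
    ("S0", "S2"): {("S1", "S2"): F(1, 2)},
    ("S1", "S2"): {("S1", "S0"): F(1, 2)},
    ("S2", "S1"): {("S2", "S0"): F(1, 2)},
    ("S2", "S0"): {("S1", "S0"): F(1, 2)},
}


def shortest_path(edges, src, dst):
    """Dijkstra's algorithm: return (cost, path) of one cheapest path, or None."""
    counter = 0  # unique tie-breaker so the heap never compares nodes or paths
    heap = [(F(0), counter, src, [src])]
    done = set()
    while heap:
        cost, _, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in edges.get(node, {}).items():
            if nxt not in done:
                counter += 1
                heapq.heappush(heap, (cost + weight, counter, nxt, path + [nxt]))
    return None


cost, path = shortest_path(EDGES, ("S0", "S1"), ("S1", "S0"))
```

With these edges the query returns a three-hop path of total cost (3/2)T, matching the sum (1/2)T+(1/2)T+(1/2)T given above. Which of the two equal-cost paths is returned depends only on the tie-breaking order.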

Nevertheless, the communication indirect path acquisition component 130 also needs to take into account, for example, the number of dimensions of the upstream and downstream tensors. For the above example "[2,2]: (S0, S1) → [2,2]: (S1, S0)", the split dimension S2 was used; but if the tensor to be transmitted has only 2 dimensions, dimension 0 and dimension 1, e.g. a 4×4 tensor with no third dimension at all, then S2 is not applicable to this tensor. In practical applications, therefore, the above indirect communication paths cannot actually be selected, because the intermediate tensors [2,2]: (S0, S2) and [2,2]: (S1, S2) are not available. Although the transmission cost of such a communication indirect path is the minimum, the path cannot be used in practice. The intermediate tensors therefore need to be selected within the maximum split dimension of the tensors processed by the whole system, or within the split dimensions that the processed data actually has, and the communication indirect path with the minimum transmission cost is selected from among the paths so restricted.

Therefore, where the pre-stored communication indirect paths do not satisfy the above requirement, the communication indirect path acquisition component 130 uses the shortest path query method to obtain, from among the communication indirect paths whose intermediate tensors are applicable to the dimensions of the upstream and downstream tensors, the one with the smallest transmission cost. That is, it regenerates one or more strings of intermediate tensors whose SBP distributed descriptors are applicable to the upstream and downstream tensors, and then selects the communication indirect path with the smallest transmission cost. For the above upstream and downstream tensors "[2,2]: (S0, S1) → [2,2]: (S1, S0)", if the SBP distributed descriptor S2 is not applicable to the upstream and downstream tensors, the following communication indirect path, which excludes the S2 descriptor, is an example:

[2,2]:(S0,S1)→[2,2]:(S0,S0)→[2,2]:(S1,S1)→[2,2]:(S1,S0),

Cost＝(1/2)T+(3/4)T+(1/2)T

For a tensor of shape 4×4, this transmission cost is the smallest among all applicable communication indirect paths.

For the above example "[2,2]: (S0, S1) → [2,2]: (S1, S0)", if the upstream and downstream tensors are tensors of shape 2×2, it is obvious that the intermediate tensors (S0, S0) and (S1, S1) are also unsuitable as intermediate tensors, and therefore the communication indirect path acquisition component 130 will exclude them when selecting or acquiring the intermediate tensors, and thereafter the acquired communication indirect paths are, for example:

[2,2]:(S0,S1)→[2,2]:(B,S1)→[2,2]:(B,S0)→[2,2]:(S1,S0),

Cost＝T+T+0

For a tensor of shape 2×2, this transmission cost is the smallest among all applicable communication indirect paths.
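The shape restriction described above can be sketched as a filter applied before the cheapest path is chosen. The applicability rule used here is an assumption of this sketch: a split descriptor Sk is taken to be usable only if the tensor has a dimension k and that dimension is divisible by the product of the hierarchy levels that split it. The three candidate paths and their total costs (in units of T) are those quoted in the text; `is_applicable` and `cheapest_applicable` are hypothetical names.

```python
from fractions import Fraction as F


def is_applicable(hierarchy, sbp, shape):
    """Assumed rule: every split axis must exist and be divisible by the
    product of the hierarchy levels that split along it."""
    parts = {}
    for level_size, descriptor in zip(hierarchy, sbp):
        if descriptor.startswith("S"):
            axis = int(descriptor[1:])
            parts[axis] = parts.get(axis, 1) * level_size
    return all(axis < len(shape) and shape[axis] % n == 0
               for axis, n in parts.items())


# Candidate paths for [2,2]:(S0,S1) -> [2,2]:(S1,S0), total cost in units of T.
CANDIDATES = [
    ([("S0", "S1"), ("S0", "S2"), ("S1", "S2"), ("S1", "S0")], F(3, 2)),
    ([("S0", "S1"), ("S0", "S0"), ("S1", "S1"), ("S1", "S0")], F(7, 4)),
    ([("S0", "S1"), ("B", "S1"), ("B", "S0"), ("S1", "S0")], F(2)),
]


def cheapest_applicable(hierarchy, shape, candidates=CANDIDATES):
    """Drop paths containing an inapplicable tensor, then return (cost, path)."""
    usable = [(cost, path) for path, cost in candidates
              if all(is_applicable(hierarchy, sbp, shape) for sbp in path)]
    return min(usable, key=lambda item: item[0]) if usable else None
```

Under this assumed rule, a 4×4 tensor yields the path through (S0, S0) and (S1, S1), and a 2×2 tensor yields the path through (B, S1) and (B, S0), in agreement with the two examples above.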

As another example, consider two tensors between which the transmission cost is infinite:

[2,2]: (S0, S1) → [2,2]: (B, S2), cost=infinity

At this time, the intermediate tensor has 2 groups of choices, and the indirect communication path is formed as follows:

[2,2]:(S0,S1)→[2,2]:(B,S1)→[2,2]:(B,S2),Cost＝T+T

[2,2]:(S0,S1)→[2,2]:(S0,S2)→[2,2]:(B,S2),Cost＝(1/2)T+T

Of the two indirect communication paths, the latter has the lower transmission cost; therefore, in the present disclosure, the communication indirect path acquisition component 130 selects the indirect communication path with the lowest transmission cost.

Where there are multiple indirect communication paths with the same lowest transmission cost, all of them may be retained. For example:

[2,2]: (P, S0) → [2,2]: (B, S1), cost=infinity

The shortest path method obtains two communication indirect paths that contain intermediate tensors and have the same transmission cost:

[2,2]:(P,S0)→[2,2]:(B,S0)→[2,2]:(B,S1),Cost＝3T+T

[2,2]:(P,S0)→[2,2]:(P,S1)→[2,2]:(B,S1),Cost＝T+3T

In this case the transmission costs of the two indirect paths are the same, that is, both communication indirect paths are shortest, and the communication indirect path acquisition component 130 of the present disclosure stores both of them for the transformation between the upstream and downstream tensors as candidate paths for selection by the path selecting component.
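Retaining every path that ties for the lowest cost can be sketched as follows. The per-hop costs (in units of T) are those quoted in the two examples above; `cheapest_paths` is a hypothetical helper name.

```python
from fractions import Fraction as F


def cheapest_paths(candidates):
    """Return every (total_cost, path) whose total cost equals the minimum."""
    totals = [(sum(hop_costs, F(0)), path) for path, hop_costs in candidates]
    best = min(total for total, _ in totals)
    return [(total, path) for total, path in totals if total == best]


# [2,2]: (S0,S1) -> [2,2]: (B,S2): one path is strictly cheaper.
single = cheapest_paths([
    ([("S0", "S1"), ("B", "S1"), ("B", "S2")], [F(1), F(1)]),
    ([("S0", "S1"), ("S0", "S2"), ("B", "S2")], [F(1, 2), F(1)]),
])

# [2,2]: (P,S0) -> [2,2]: (B,S1): both paths cost 4T, so both are kept.
tied = cheapest_paths([
    ([("P", "S0"), ("B", "S0"), ("B", "S1")], [F(3), F(1)]),
    ([("P", "S0"), ("P", "S1"), ("B", "S1")], [F(1), F(3)]),
])
```

Returning a list rather than a single path leaves the final choice among equal-cost candidates to a later selection step, as the text describes.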

The above examples determine the transmission cost of an indirect path where the upstream and downstream logical nodes are distributed on the same computing devices with the same hierarchy. For different computing devices or different hierarchies, whether such stored candidate communication indirect paths remain suitable as candidates requires trial or proof. The results of trial or enumeration agree with a simple proof: the selection of such communication indirect paths remains the same when the computing devices are arranged in a different hierarchy.

For example, taking again the example (S0, S1) → (B, S2) above, if the upstream and downstream tensors use the same SBP distributed descriptors but the deployment on the computing devices is not [2,2] (2 machines, 4 cards) but [800,2] (800 machines with 2 cards each), the above choice of intermediate tensors or indirect paths can still be used. The costs of the two indirect paths are as follows:

[800,2]:(S0,S1)→[800,2]:(B,S1)→[800,2]:(B,S2),

Cost＝799T+400T

[800,2]:(S0,S1)→[800,2]:(S0,S2)→[800,2]:(B,S2),

Cost＝(1/2)T+799T

Obviously, as in the case where the deployment on the computing devices is [2,2] (2 machines, 4 cards), the second indirect path has the lower cost and is the one selected by the communication indirect path acquisition component 130. Thus, for a tensor transformation with the same upstream and downstream SBP distributed descriptors, a difference in the deployment of the computing devices has no effect on the comparison of transmission costs. Enumeration of tensor transformations under other identical upstream and downstream SBP distributed descriptors gives the same result. The computational effort and time needed to obtain a communication indirect path can therefore be reduced by storing the communication indirect path with the smallest transmission cost for each pair of upstream and downstream SBP distributed descriptors and reusing it in the same situation later.

Likewise, taking the example (P, S0) → (B, S1) above, if the SBP distributed descriptors of the upstream and downstream tensors are the same but the deployment on the computing devices is not [2,2] (2 machines, 4 cards) but [2,800] (2 machines with 800 cards each), the above choice of intermediate tensors or indirect paths can still be used. The SBP distributed descriptor of the input tensor is then [2,800]: (P, S0) and that of the output tensor is [2,800]: (B, S1), so the communication indirect paths acquired by the communication indirect path acquisition component 130 according to the SBP distributed descriptors of the intermediate tensors above are expressed as follows:

[2,800]:(P,S0)→[2,800]:(B,S0)→[2,800]:(B,S1),

Cost＝3T+(799/400)T

[2,800]:(P,S0)→[2,800]:(P,S1)→[2,800]:(B,S1),

Cost＝(799/400)T+3T

It can be seen that under the same input or output SBP distributed descriptor, the transmission costs of the two communication indirect paths are still equal in a machine deployment of 2 machine 800 cards.

With another deployment of the computing devices, such as 800 machines with 2 cards each, the transmission costs of the following communication indirect paths in the above example remain equal:

[800,2]:(P,S0)→[800,2]:(B,S0)→[800,2]:(B,S1),

Cost＝1599T+400T

[800,2]:(P,S0)→[800,2]:(P,S1)→[800,2]:(B,S1),

Cost＝400T+1599T

In a deployment of 800 machines with 2 cards each, the costs of the two paths are also equal.
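The arithmetic behind this invariance can be checked directly from the figures quoted above. The sketch below does not derive the per-hop costs; it only sums the quoted values (in units of T) and compares the two candidate paths of each transformation under each deployment. The names `QUOTED`, `ordering`, and `orderings_by_transformation` are hypothetical.

```python
from fractions import Fraction as F

# (transformation, hierarchy) -> per-hop costs of path 1 and path 2, as quoted in the text.
QUOTED = {
    ("(S0,S1)->(B,S2)", (2, 2)): ([F(1), F(1)], [F(1, 2), F(1)]),
    ("(S0,S1)->(B,S2)", (800, 2)): ([F(799), F(400)], [F(1, 2), F(799)]),
    ("(P,S0)->(B,S1)", (2, 2)): ([F(3), F(1)], [F(1), F(3)]),
    ("(P,S0)->(B,S1)", (2, 800)): ([F(3), F(799, 400)], [F(799, 400), F(3)]),
    ("(P,S0)->(B,S1)", (800, 2)): ([F(1599), F(400)], [F(400), F(1599)]),
}


def ordering(path1_costs, path2_costs):
    """Return 1 if path 1 is more expensive, -1 if cheaper, 0 if equal."""
    a, b = sum(path1_costs), sum(path2_costs)
    return (a > b) - (a < b)


def orderings_by_transformation(quoted=QUOTED):
    """Collect the set of orderings observed for each transformation."""
    result = {}
    for (name, _hierarchy), (p1, p2) in quoted.items():
        result.setdefault(name, set()).add(ordering(p1, p2))
    return result


observed = orderings_by_transformation()
```

Each transformation shows a single ordering across all quoted deployments: the second path is always cheaper for (S0, S1) → (B, S2), and the two paths always tie for (P, S0) → (B, S1).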

For the second type in the first case, e.g., (S0, S1) → (S1, S1), (S1, S1) → (B, S1), etc., the SBP distributed descriptor of the intermediate tensor is (S0, S0) or (B, B), i.e., the communication indirect paths are, respectively:

(S0, S1) → (S0, S0) → (S1, S1) and

(S1,S1)→(B,B)→(B,S1)

The applicant has enumerated all the possibilities by computer and found that the comparison of the transmission costs of all the communication indirect paths for upstream and downstream tensors with the same SBP distributed descriptors does not change when the deployment hierarchy of the computing devices changes. This conclusion can easily be demonstrated mathematically and is not described in detail herein. Thus, by storing the least costly communication indirect paths already computed for upstream and downstream tensors with the same SBP distributed descriptors, the time cost of a large number of repeated path computations can be saved, and those least costly paths can be used directly as an extension of the communication primitives in subsequent or other distributed data processing systems, which greatly reduces the need for a technician to handle complex data exchange communications separately.
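Storing and reusing the cheapest path can be sketched as a cache keyed only by the pair of upstream and downstream SBP distributed descriptors, relying on the invariance stated above. The class `PathCache` and the stand-in solver `lookup_solver` are hypothetical; a real solver would run the shortest-path query, whereas the stand-in merely returns the intermediate tensors quoted in the text.

```python
class PathCache:
    """Memoize the cheapest intermediate tensors per (upstream, downstream) SBP pair.

    The deployment hierarchy is deliberately left out of the key, on the
    assumption (stated in the text) that it does not change which path is cheapest.
    """

    def __init__(self, solver):
        self._solver = solver
        self._store = {}
        self.misses = 0  # number of times the solver actually ran

    def get(self, src_sbp, dst_sbp):
        key = (src_sbp, dst_sbp)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._solver(src_sbp, dst_sbp)
        return self._store[key]


# Stand-in solver: intermediate tensors quoted in the text for two transformations.
KNOWN = {
    (("S0", "S1"), ("S1", "S1")): [("S0", "S0")],
    (("S1", "S1"), ("B", "S1")): [("B", "B")],
}


def lookup_solver(src_sbp, dst_sbp):
    return KNOWN[(src_sbp, dst_sbp)]


cache = PathCache(lookup_solver)
first = cache.get(("S0", "S1"), ("S1", "S1"))
second = cache.get(("S0", "S1"), ("S1", "S1"))  # served from the cache
```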

A second scenario in which the communication indirect path acquisition component 130 of the present disclosure needs to acquire a communication indirect path is where the current initial logical node (also referred to as the downstream initial logical node) and the upstream initial logical node are deployed on the same computing devices but with different level divisions. For example, suppose the computing devices are 4 GPU cards. The upstream initial logical node is distributed as 2 machines with 2 cards each, that is, two cards are deployed on each machine, so that the computing devices are divided into two levels; alternatively, four GPU cards on one machine can be configured as two levels. The downstream current initial logical node is distributed directly as a single parallel level of 4 GPU cards. The location hierarchy markers are then [2,2] and [4]. As a result, the SBP distributed descriptors of the upstream and downstream initial logical nodes have either different numbers of dimensions, or the same number of dimensions but different hierarchies. For example, 6 cards may be arranged with different numbers of dimensions, [2,3] and [6], or with the same number of dimensions but in different ways, [2,3] and [3,2]. Examples in which the numbers of dimensions of the current initial logical node and the upstream initial logical node are not equal are:

[4]: S0 → [2,2]: (B, S1) (from an even split across the 4 cards to broadcast between machines, with the cards within each machine split along S1).

[4]:S1→[2,2]:(B,S1)

[2,2]:(P,S0)→[4]:S0

[6]:S0→[2,3]:(B,S1)

[2,2,2]:(S0,S1,S2)→[8]:B

[2,3,5]:(S0,S1,S2)→[15,2]:(S0,S1)

Examples of the current initial logical node and the upstream initial logical node each having equal tensor dimensions but different hierarchies, for example:

[2,3]:(S0,S1)→[3,2]:(S0,S1)

the 2D SBP is the same here, but the slicing at each card is different because of the different hierarchy of computing device deployments.

[2,3]:(S0,S1)→[3,2]:(S1,S0)

The slice shapes here are exactly the same, but the cards where the data is located are different because of the different hierarchical order of deployment.

In some cases, the hierarchical order for deployment is not the same and the SBP descriptors upstream and downstream may not be the same, for example:

[2,3]:(S0,S1)→[3,2]:(P,B)

[2,3]:(S0,S1)→[3,2]:(S0,S0)

[15,2]:(S0,S1)→[6,5]:(P,S0)

examples of higher dimensions where the hierarchical order of deployment is not the same, for example:

[2,3,5]:(S0,S1,S2)→[5,3,2]:(S0,S1,S2)

[2,3,5]:(S0,S1,S2)→[5,3,2]:(S2,S1,S0)

[2,2,2,3]:(S0,S1,S2,P)→[2,3,2,2]:(S0,P,B,S1)

For the second scenario, in which the current initial logical node and the upstream initial logical node are deployed on the same computing devices but with different level divisions, the communication indirect path acquisition component 130 of the present disclosure can find the communication indirect paths with the smallest transmission cost by determining the shortest paths. Statistics show, however, that these shortest paths ultimately turn out to be communication indirect paths that pass through a diagonal point tensor.

The inventors have noted that, on identical computing devices, tensors whose distribution is equivalent to the same one-dimensional (1D SBP) distribution are split in exactly the same way. For example, [2,3]: (S0, S0) and [6]: S0 both split the tensor evenly into 6 parts along dimension 0. [15,2]: (B, B) and [5,6]: (B, B) both hold the full tensor data on each of the 30 cards. [2,3,5]: (P, P, P) and [10,3]: (P, P) both hold partial data on each card, the sum over the 30 cards being the original tensor. The communication indirect path acquisition component 130 of the present disclosure refers to such an intermediate tensor, which can be converted into a 1D SBP tensor, as the diagonal point tensor of the communication indirect path.
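The equivalence used to define a diagonal point tensor can be sketched as a reduction to a 1D SBP form. This is an illustration under the reading that a diagonal point tensor has the same descriptor in every hierarchy dimension; `to_1d` is a hypothetical helper name.

```python
from math import prod


def to_1d(hierarchy, sbp):
    """Return (device_count, descriptor) if every hierarchy level carries the
    same SBP descriptor, i.e. the tensor is a diagonal point tensor; else None."""
    if len(hierarchy) != len(sbp):
        raise ValueError("hierarchy and SBP descriptor must have the same length")
    if len(set(sbp)) != 1:
        return None
    return prod(hierarchy), sbp[0]


# [2,3]: (S0,S0) is the same distribution as [6]: S0.
assert to_1d((2, 3), ("S0", "S0")) == (6, "S0")
# [15,2]: (B,B) and [5,6]: (B,B) both reduce to broadcast over 30 cards.
assert to_1d((15, 2), ("B", "B")) == to_1d((5, 6), ("B", "B"))
```

Two diagonal point tensors with equal 1D forms describe the same data placement, which is why moving between their hierarchies costs no transmission.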

For example, for [2,3]: (S0, S1) → [3,2]: (S1, S0) there are 2 diagonal point tensors that minimize the path: [2,3]: (S0, S0) or [3,2]: (S0, S0) (the two are exactly equal), and [2,3]: (S1, S1) or [3,2]: (S1, S1) (the two are exactly equal). The indirect communication path is thus:

[2,3]:(S0,S1)→[2,3]:(S0,S0)＝[3,2]:(S0,S0)→[3,2]:(S1,S1)→[3,2]:(S1,S0), Cost＝(2/3)T+(5/6)T+(1/2)T

Here "[2,3]: (S0, S0) = [3,2]: (S0, S0)" merely indicates that the two are exactly equal, and either one may be picked to form the communication indirect path. The equality sign is not a transmission "→", so the above communication indirect path uses only 3 basic communication primitives. Since both tensors in "[2,3]: (S0, S0) = [3,2]: (S0, S0)" are the diagonal point tensor (S0, S0), the choice of hierarchy does not matter: whether the hierarchy is [2,3] or [3,2], the tensor is evenly split into 6 parts along dimension 0.

Alternatively, diagonal point tensors (S1, S1) may also be employed, forming the following communication indirect paths:

[2,3]:(S0,S1)→[2,3]:(S0,S0)→[2,3]:(S1,S1)＝[3,2]:(S1,S1)→[3,2]:(S1,S0),Cost＝(2/3)T+(5/6)T+(1/2)T

Although the diagonal point tensor becomes (S1, S1) in this case, the actual communication path is the same.

It can be seen from the above that an indirect path through a diagonal point tensor may contain, in addition to the diagonal point tensor itself, further intermediate tensors between the upstream tensor and the diagonal point tensor and between the diagonal point tensor and the downstream tensor. To reduce the total transmission cost of the indirect communication path, in the upstream sub-path from the upstream tensor to the diagonal point tensor, the computing devices and hierarchy of all the logical nodes generating the tensors are the same as those of the logical node of the upstream tensor; and in the downstream sub-path from the diagonal point tensor to the downstream tensor, the computing devices and hierarchy of all the logical nodes generating the tensors are the same as those of the logical node of the downstream tensor. The inventors have thus found that, in the second case, this rule can be used to find the shortest communication indirect path more quickly: based on the diagonal point tensor, the upstream sub-path from the upstream tensor to the diagonal point tensor is found under the computing devices and hierarchy of the upstream logical node, and the downstream sub-path from the diagonal point tensor to the downstream tensor is found under the computing devices and hierarchy of the downstream logical node, thereby obtaining the communication indirect path with the minimum transmission cost.

Since the computing device and the hierarchy to which the diagonal point tensor belongs are the same as the upstream tensor and the downstream tensor, the acquisition of the sub-path belongs to the first scenario mentioned in this disclosure. Therefore, in order to reduce the computation cost of computing the communication indirect path in the second case, the sub-path transmission cost may be traversed for each diagonal point tensor based on the path stored for the first case and the cost correspondence table, thereby obtaining the communication indirect path of the shortest path. The communication indirect path of the shortest path can also be obtained by directly obtaining the upstream sub-path and the downstream sub-path based on the shortest path algorithm under the condition that the pre-stored path and the cost are not available.
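Traversing the diagonal point tensors and splicing the two sub-paths can be sketched as follows. The sub-path costs (in units of T) are taken from the [2,3]: (S0, S1) → [3,2]: (S1, S0) example above; in practice they would come from the stored first-case table or from a shortest-path query. `UP`, `DOWN`, and `best_via_diagonal` are hypothetical names.

```python
from fractions import Fraction as F

# Cost of the upstream sub-path [2,3]: (S0,S1) -> diagonal point tensor.
UP = {
    ("S0", "S0"): F(2, 3),
    ("S1", "S1"): F(2, 3) + F(5, 6),
}
# Cost of the downstream sub-path diagonal point tensor -> [3,2]: (S1,S0).
DOWN = {
    ("S0", "S0"): F(5, 6) + F(1, 2),
    ("S1", "S1"): F(1, 2),
}


def best_via_diagonal(up, down):
    """Sum both sub-path costs for every diagonal point tensor and return the
    minimum total together with all diagonal point tensors that achieve it."""
    totals = {diag: up[diag] + down[diag] for diag in up.keys() & down.keys()}
    best = min(totals.values())
    return best, sorted(diag for diag, total in totals.items() if total == best)


best_cost, best_diagonals = best_via_diagonal(UP, DOWN)
```

For this example both diagonal point tensors give a total of 2T, which is consistent with the statement above that the actual communication path is the same in either case.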

A third scenario in which the communication indirect path acquisition component 130 of the present disclosure needs to acquire a communication indirect path is where the current initial logical node (also referred to as the downstream initial logical node) and the upstream initial logical node are deployed on different computing devices. Because the computing devices differ, the location markers attached to the upstream and downstream initial logical nodes show different distributed computing devices. Each computing device is assigned a unique tag. There are three basic types of difference: the computing devices are completely different, the number of computing devices is different, and the components used on the computing devices are different. Any combination of these three basic types is also possible.

Specifically, in an example where the current initial logical node (also referred to as a downstream initial logical node) is completely different from the upstream initial logical node in a computing device, for example, 8 computing devices, the upstream operators are deployed on the 1 st to 4 th machines, and the current initial logical node (also referred to as a downstream initial logical node) is deployed on the 5 th to 8 th machines, so the SBP distributed descriptors of the upstream and downstream tensors are as follows:

{0,1,2,3},[4]:S0→{4,5,6,7},[2,2]:(S1,S0)

{0,1,2,3},[4]:S0→{4,5,6,7},[2,2]:(S1,B)

This often occurs between two stages of pipeline parallelism.

When the number of computing devices is different between a current initial logical node (also referred to as a downstream initial logical node) and an upstream initial logical node, for example, there are 4 computing devices in total, the upstream initial logical node is disposed on the 1 st computing device, and the downstream operators are disposed on the 1 st to 4 th computing devices, for example, data from a hub is distributed to each computing device:

{0},[1]:B→{0,1,2,3},[2,2]:(S0,S1)

or for example:

{0,1,2,3},[2,2]:(S0,S1)→{0,1,2,3,4,5},[2,3]:(S0,S1)

{0,1,2,3,4},[5]:S0→{0,1,2,3},[2,2]:(S1,B)

the current initial logical node (also referred to as a downstream initial logical node) and the upstream initial logical node may be in different components on the computing device, for example, the upstream initial logical node is deployed on a CPU of the computing device, and the downstream initial logical node is deployed on a GPU of the computing device, for example:

{0,1,2,3},CPU,[2,2]:(S0,S1)→{0,1,2,3},GPU,[2,2]:(S1,S0)

For the third scenario, in which the current initial logical node and the upstream initial logical node are deployed on different computing devices, the present disclosure can find the communication indirect paths with the smallest transmission cost by determining the shortest paths. Statistics show, however, that these shortest paths ultimately turn out to be communication indirect paths that pass through a pair of diagonal point tensors.

The inventors have noted that, once the transformation of the upstream and downstream tensors of the second case is solved, data exchange communication between upstream and downstream tensors deployed on different computing devices can be established through a pair of diagonal point tensors. Specifically, a pair of diagonal point tensors belonging to different computing devices is found such that the two can complete data exchange with each other based on basic communication primitives. One of them (referred to in this disclosure as the "upstream diagonal point tensor") is distributed on the same computing devices as the upstream tensor, so that its relationship with the upstream tensor belongs to the second case; the other (referred to as the "downstream diagonal point tensor") is distributed on the same computing devices as the downstream tensor, so that its relationship with the downstream tensor likewise belongs to the second case. An indirect path for data exchange communication of the following basic format is thus formed:

Upstream tensor → upstream diagonal point tensor → downstream diagonal point tensor → downstream tensor

For example, for the upstream and downstream tensors [2,3]: (S2, P) → [5,6]: (B, S0), the diagonal tensor pairs are (S2, S2) and (B, B), so that the communication indirect path with the shortest path is:

[2,3]:(S2,P)→[2,3]:(S2,S2)→[5,6]:(B,B)→[5,6]:(B,S0)

Cost＝2T+30T+0

for another example, for the upstream and downstream tensors [5,5]: (P, S0) → [2,3,4]: (S1, B, S2), the diagonal pairs of tensors are (S1, S1) and (S1, S1, S1) or (B, B), (S1, S1, S1), so that the shortest communication indirect paths are:

[5,5]:(P,S0)→[5,5]:(S1,S0)→[5,5]:(S1,S1)→[2,3,4]:(S1,S1,S1)

→[2,3,4]:(S1,S1,S2)→[2,3,4]:(S1,B,S2)

Cost＝4T+(4/5)T+T+(3/4)T+2T

or alternatively

[5,5]:(P,S0)→[5,5]:(B,S0)->[5,5]:(B,B)->[2,3,4]:(S1,S1,S1)

→[2,3,4]:(S1,S1,S2)→[2,3,4]:(S1,B,S2)

Cost＝9T+4T+T+(3/4)T+2T

Pairs of diagonal points are { (B, B), (S1, S1, S1) }

It can be seen that different diagonal point tensor pairs lead to different costs for the communication indirect path, so that, for the same number of communications, the pair with the smaller or minimum cost can be selected; e.g., the diagonal point tensor pair {(S1, S1), (S1, S1, S1)} in the above example is stored. As another example:

On a [4,4] hierarchy, consider (S0, S0) → (B, S0). If the first path is (S0, S0) → (B, B) → (B, S0), its cost is 15T + 0. If the second path is (S0, S0) → (S1, S1) → (S1, S0) → (B, S0), its cost is (15/16)T + (3/4)T + 3T = 4.6875T. The second path has the lower transmission cost, but it requires one more transmission. Each additional transmission adds a latency (communication synchronization time), so trading one extra latency for a saving of about 10T in communication time is not necessarily advantageous (it depends on how large T is). Besides the communication cost and the number of communications, the intermediate nodes also consume memory. The priorities to be considered are the fewest communications, the lowest communication cost, and low memory consumption.
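One way to apply these priorities is a lexicographic comparison, sketched below. Treating the three criteria as strictly ordered (number of transmissions first, then cost, then memory) is an assumption of this sketch, since the text notes that the trade-off depends on the size of T. The `memory` values are placeholders, and `priority_key` and `select` are hypothetical names.

```python
from fractions import Fraction as F

# The two [4,4] paths for (S0,S0) -> (B,S0), with per-hop costs in units of T.
CANDIDATES = [
    {"name": "via (B,B)", "hops": [F(15), F(0)], "memory": 0},
    {"name": "via (S1,S1)", "hops": [F(15, 16), F(3, 4), F(3)], "memory": 0},
]


def priority_key(candidate):
    """Fewest transmissions first, then lowest total cost, then lowest memory."""
    return (len(candidate["hops"]), sum(candidate["hops"]), candidate["memory"])


def select(candidates):
    return min(candidates, key=priority_key)


chosen = select(CANDIDATES)
```

Under this ordering the path through (B, B) is chosen despite its higher cost of 15T, because it needs one transmission fewer than the 4.6875T path.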

By way of further example, for the upstream and downstream tensors [2,3]: (P, S0) → [100,2]: (S2, S2), when the diagonal point tensor pairs thereof are { (S1, S1), (S2, S2) }, the indirect path of communication is as follows:

[2,3]:(P,S0)→[2,3]:(S1,S0)→[2,3]:(S1,S1)→[100,2]:(S2,S2)

Cost＝T+(2/3)T+T,

when the pair of diagonal point tensors is { (S2, S2), (S2, S2) }, the indirect path of communication is as follows:

[2,3]:(P,S0)→[2,3]:(S2,S0)→[2,3]:(S2,S2)→[100,2]:(S2,S2)

Cost＝T+(2/3)T+T.

The transmission costs are the same in both cases, and the communication indirect paths formed by both pairs of diagonal tensors are stored for selection.
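The comparison of diagonal tensor pairs in the worked examples above can be sketched as follows. The helper `cheapest_pairs` and the dictionary layout are hypothetical; only the per-hop costs (in units of T) are taken from the text.

```python
from fractions import Fraction as F


def cheapest_pairs(candidates):
    """Return (minimum cost, all diagonal pairs reaching it).

    `candidates` maps a diagonal tensor pair to the list of per-hop costs
    of the communication indirect path built through that pair.
    """
    totals = {pair: sum(costs) for pair, costs in candidates.items()}
    best = min(totals.values())
    return best, [pair for pair, total in totals.items() if total == best]


# [5,5]:(P,S0) -> [2,3,4]:(S1,B,S2)
example_a = {
    ("(S1,S1)", "(S1,S1,S1)"): [F(4), F(4, 5), F(1), F(3, 4), F(2)],
    ("(B,B)", "(S1,S1,S1)"): [F(9), F(4), F(1), F(3, 4), F(2)],
}

# [2,3]:(P,S0) -> [100,2]:(S2,S2)
example_b = {
    ("(S1,S1)", "(S2,S2)"): [F(1), F(2, 3), F(1)],
    ("(S2,S2)", "(S2,S2)"): [F(1), F(2, 3), F(1)],
}

print(cheapest_pairs(example_a))
print(cheapest_pairs(example_b))
```

In the first example a single pair survives; in the second both pairs tie, so both are kept for later selection.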

In addition, when the upstream tensor or the downstream tensor is itself one member of the diagonal tensor pair, it is only necessary to insert the other member of the diagonal tensor pair.

As described above, the communication indirect path acquisition component 130 needs to take into account the number of dimensions of the upstream and downstream tensors, and excludes communication indirect paths that do not fit those dimensions. Where the computing devices are the same but the hierarchies differ, or where the computing devices themselves differ (as described later), if the pre-stored communication indirect paths have been traversed and none of them can be selected, paths need to be customized (temporarily generated) separately for the upstream deployment hierarchy and for the downstream deployment hierarchy, and a pair of diagonal points is then picked to splice the two paths together. For example, for the upstream and downstream tensors "[2,2]: (S2, P) → [3,3]: (S0, S1)", the shortest communication indirect path obtained by selecting a diagonal point tensor pair is:

[2,2]:(S2,P)→[2,2]:(S2,S2)→[3,3]:(S0,S0)→[3,3]:(S0,S1)

Cost＝T+T+(2/3)T

However, for a tensor with a shape of 3×9×2, no dimension is divisible by 4, so the upstream diagonal tensor can only be (B, B) or (P, P). The downstream diagonal tensors (S0, S0) and (S2, S2) cannot be selected either, because neither dimension 0 nor dimension 2 is divisible by 9. The least costly shortest communication indirect path can therefore be:

[2,2]:(S2,P)→[2,2]:(B,P)→[2,2]:(P,P)→[3,3]:(B,B)→[3,3]:(S0,B)→[3,3]:(S0,S1)

Cost＝2T+0+12T+0+0

Although this communication indirect path appears more costly to transmit than the path above, it is a communication indirect path that fits a tensor of shape 3×9×2.
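The dimension check described above can be sketched in a few lines. The rule encoded here is an assumption inferred from the example: a diagonal split descriptor (Sk, ..., Sk) is treated as usable only when tensor dimension k is divisible by the total number of devices in the hierarchy, while the broadcast and partial diagonals are always usable. The function name `valid_diagonal_descriptors` is hypothetical.

```python
from math import prod


def valid_diagonal_descriptors(shape, hierarchy):
    """List the diagonal descriptors usable for `shape` on `hierarchy`.

    Assumed rule: (Sk, ..., Sk) needs shape[k] to be divisible by the
    product of the hierarchy; (B, ..., B) and (P, ..., P) always qualify.
    """
    devices = prod(hierarchy)
    rank = len(hierarchy)
    usable = [("B",) * rank, ("P",) * rank]
    for axis, extent in enumerate(shape):
        if extent % devices == 0:
            usable.append((f"S{axis}",) * rank)
    return usable


upstream = valid_diagonal_descriptors((3, 9, 2), (2, 2))
downstream = valid_diagonal_descriptors((3, 9, 2), (3, 3))
print(upstream)
print(downstream)
```

For the 3×9×2 tensor this leaves only the broadcast and partial diagonals on the [2,2] side and rules out (S0, S0) and (S2, S2) on the [3,3] side. The sketch only filters candidates; it says nothing about which remaining candidate yields the cheapest path.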

In summary, given the computing devices and computing device hierarchies on which the upstream and downstream initial logical nodes are deployed, the communication indirect path acquisition component 130 acquires a chain of intermediate tensors, each described by the computing devices and hierarchy on which it is distributed and by its SBP distributed descriptor, such that each transmission on the way from the upstream tensor to the downstream tensor via the intermediate tensors can be performed directly with an underlying communication primitive.

From the above, it may be noted that, among the intermediate tensors of an acquired or generated communication indirect path, those in the upstream part use the computing devices and hierarchy of the upstream tensor, and those in the downstream part use the computing devices and hierarchy of the downstream tensor. Thus, the system of the present disclosure does not need to store the computing devices and hierarchy information for every intermediate tensor (or logical node that generates an intermediate tensor). It only needs to determine or store one intermediate tensor (or its generating logical node) as an intermediate point: before the intermediate point, the computing devices and hierarchy of the upstream tensor are used, and from the intermediate point onward, those of the downstream tensor are used. Alternatively, the computing devices and hierarchy information may also be stored for each intermediate tensor (or logical node generating the intermediate tensor) of each candidate communication indirect path.

Thus, optionally, when determining the computing devices and hierarchy on which the intermediate tensors are distributed, if the computing devices and hierarchies of the upstream and downstream tensors are the same, there is no need to designate a particular intermediate tensor as the intermediate point, and any one of them may be designated. If the computing devices of the upstream and downstream tensors are the same but the hierarchies are different, then the diagonal point tensor (or its generating logical node) or the intermediate tensor after it (or its generating logical node) is designated as the intermediate point. For example, in the communication indirect path for the upstream and downstream tensors "[2,3]: (S0, S1) → [3,2]: (S1, S0)"

[2,3]:(S0,S1)→[2,3]:(S0,S0)＝[3,2]:(S0,S0)→[3,2]:(S1,S1)→[3,2]:(S1,S0),

the 0th intermediate tensor may be designated as the intermediate point, so the communication indirect path is then:

[2,3]:(S0,S1)→[3,2]:(S0,S0)→[3,2]:(S1,S1)→[3,2]:(S1,S0),

or the intermediate tensor after the diagonal point tensor may be designated as the intermediate point, i.e., the 1st intermediate tensor is the intermediate point, so the communication indirect path is:

[2,3]:(S0,S1)→[2,3]:(S0,S0)→[3,2]:(S1,S1)→[3,2]:(S1,S0),
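Storing a single intermediate point instead of a hierarchy per tensor can be illustrated with a short sketch. The functions `assign_hierarchies` and `render` are hypothetical helpers, and the 0-based indexing of intermediate tensors follows the "0th" and "1st" wording of the example above.

```python
def assign_hierarchies(sbp_chain, upstream, downstream, intermediate_point):
    """Attach a hierarchy to every tensor of a communication indirect path.

    `sbp_chain` lists the SBP descriptors from the upstream tensor to the
    downstream tensor. `intermediate_point` is the 0-based index of the
    designated intermediate tensor; intermediate tensor i sits at chain
    position i + 1, because position 0 is the upstream tensor itself.
    """
    placed = []
    for position, sbp in enumerate(sbp_chain):
        hierarchy = upstream if position <= intermediate_point else downstream
        placed.append((hierarchy, sbp))
    return placed


def render(placed):
    """Format a placed chain in the notation used by the description."""
    return "→".join(
        f"[{','.join(map(str, h))}]:({','.join(s)})" for h, s in placed
    )


chain = [("S0", "S1"), ("S0", "S0"), ("S1", "S1"), ("S1", "S0")]
print(render(assign_hierarchies(chain, (2, 3), (3, 2), 0)))
print(render(assign_hierarchies(chain, (2, 3), (3, 2), 1)))
```

With the intermediate point at index 0 or 1, the two printed chains reproduce the two communication indirect paths given above.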

If the computing devices of the upstream and downstream tensors are different, the latter of the pair of diagonal point tensors (i.e., the downstream diagonal point tensor or its generating logical node) is designated as the intermediate point. For example, in the communication indirect path for [5,5]: (P, S0) → [2,3,4]: (S1, B, S2)

[5,5]:(P,S0)→[5,5]:(S1,S0)→[5,5]:(S1,S1)→[2,3,4]:(S1,S1,S1)

→[2,3,4]:(S1,S1,S2)→[2,3,4]:(S1,B,S2),

only the downstream diagonal point tensor [2,3,4]: (S1, S1, S1) can be taken as the intermediate point.

When obtaining and selecting communication indirect paths, the communication indirect path acquisition component 130 takes the transmission costs of the paths into account. Where two or more candidate communication indirect paths have the same transmission cost, it may select any one of them, or it may additionally take into account the amount of memory space required by the intermediate tensor generation logical nodes when generating the intermediate tensors. For example, for the upstream and downstream tensors "[100,2]: (S0, S1) → [100,2]: (B, B)", there are two communication indirect paths with equal transmission costs:

[100,2]:(S0,S1)→[100,2]:(S0,B)→[100,2]:(B,B)

Cost＝T+198T

and

[100,2]:(S0,S1)→[100,2]:(B,S1)→[100,2]:(B,B)

Cost＝99T+100T

However, the intermediate tensors of the two paths require different amounts of memory space: the intermediate tensor [100,2]: (S0, B) occupies a space of (1/100)T on each computing device, whereas the intermediate tensor [100,2]: (B, S1) occupies a space of (1/2)T on each computing device. The communication indirect path acquisition component 130 preferably selects the path whose intermediate tensor occupies less storage space. Thus, when selecting a candidate communication indirect path, the communication indirect path acquisition component 130 may, after first selecting for the smallest transmission cost, optionally also calculate the sum of the storage space required by the intermediate tensors of each shortest communication indirect path, so as to select, from among the shortest paths, the one with the smallest sum of storage space. Where the computing devices are the same but the hierarchies differ, or where the computing devices differ, the communication indirect path may also be selected taking into account the sum of the storage space of the diagonal point tensors or diagonal point tensor pairs.
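The two-stage selection described above (smallest transmission cost first, smallest storage sum as tie-breaker) amounts to a lexicographic comparison. The sketch below uses the costs and per-device storage figures quoted in the text at face value; the record layout and the name `select_path` are assumptions made for illustration.

```python
from fractions import Fraction as F

# Two candidate paths for [100,2]:(S0,S1) -> [100,2]:(B,B).
# "costs" are per-hop transmission costs in units of T, "memory" is the
# per-device storage of each intermediate tensor as quoted in the text.
candidates = [
    {"path": ["(S0,S1)", "(S0,B)", "(B,B)"],
     "costs": [F(1), F(198)], "memory": [F(1, 100)]},
    {"path": ["(S0,S1)", "(B,S1)", "(B,B)"],
     "costs": [F(99), F(100)], "memory": [F(1, 2)]},
]


def select_path(candidates):
    """Pick the lowest total cost; break ties by the lowest storage sum."""
    return min(candidates, key=lambda c: (sum(c["costs"]), sum(c["memory"])))


chosen = select_path(candidates)
print(chosen["path"], sum(chosen["costs"]), sum(chosen["memory"]))
```

Because both candidates cost 199T in total, the storage sum alone decides which one is returned.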

After the communication indirect path acquisition component 130 obtains or selects one of the candidate communication indirect paths, the intermediate tensor generation node insertion component 140 inserts, for each current intermediate tensor of that path, an intermediate tensor generation node that has the current intermediate tensor as its output tensor. The intermediate tensor generation node takes the upstream tensor of the current intermediate tensor as its input tensor, and its assigned SBP distributed signature uses the SBP distributed descriptor of the upstream tensor as the SBP distributed descriptor of its input tensor and the SBP distributed descriptor of the current intermediate tensor as the SBP distributed descriptor of its output tensor, thereby obtaining the resulting logical node topology map 141. For example, intermediate tensor generation nodes (1) and (2) are inserted between the nodes E and B: the output tensor of the intermediate tensor generation node (1) is one of the intermediate tensors, and if one conversion is insufficient, so that the data exchange between the nodes E and B has to be completed through multiple communication primitives, the intermediate tensor generation node (2) is added to form a second intermediate tensor. Similarly, two selectable indirect paths are formed between the nodes F and K, each an intermediate tensor generation node chain: one consisting of the nodes (1) and (3), the other of the nodes (3) and (2).
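The insertion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `TensorGenNode` record and the function `insert_generation_nodes` are hypothetical, and the surrounding graph is reduced to a plain list of SBP descriptors.

```python
from dataclasses import dataclass


@dataclass
class TensorGenNode:
    """Hypothetical record for one intermediate tensor generation node."""
    name: str
    input_sbp: tuple   # SBP descriptor of the tensor immediately upstream
    output_sbp: tuple  # SBP descriptor of the intermediate tensor it produces


def insert_generation_nodes(path):
    """Create one generation node per intermediate tensor of `path`.

    `path` lists SBP descriptors from the upstream tensor to the downstream
    tensor; the first and last entries belong to nodes that already exist.
    """
    nodes = []
    for index in range(1, len(path) - 1):
        nodes.append(TensorGenNode(f"gen_{index}", path[index - 1], path[index]))
    return nodes


chain = [("S0", "S1"), ("S0", "S0"), ("S1", "S1"), ("S1", "S0")]
nodes = insert_generation_nodes(chain)
for node in nodes:
    print(node)
```

Each generated node consumes exactly what the previous one produces, so the inserted nodes form a chain between the upstream and downstream logical nodes.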

Optionally, as described above, the present system may employ the communication path library component 150 to store a transmission cost conversion table based on the basic primitives of collective communication. In line with data processing practice, the table stores the communication transmission cost of the communication indirect path that converts a tensor described by any SBP distributed descriptor into a tensor described by another, different SBP distributed descriptor. In this way, when the communication through path determining component 120 determines that no communication through path exists between the current initial logical node and the upstream initial logical node, i.e., that the transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor cannot be completed by only one basic communication primitive of collective communication, the communication indirect path acquisition component 130 may first query the communication path library component, based on the two SBP distributed descriptors, for a corresponding communication indirect path, and obtain the candidate communication indirect path using a shortest path method only if the query returns none.
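The "query the library first, fall back to a shortest path search" flow can be sketched as below. The text only speaks of "a shortest path method"; Dijkstra's algorithm is used here as one possible choice, and the class `PathLibrary`, its fields, and the dictionary of one-primitive costs are all assumptions made for illustration. The four edge costs are taken from the [100,2] example earlier in the description.

```python
import heapq
from itertools import count


def shortest_path(direct_costs, source, target):
    """Dijkstra search; direct_costs[(a, b)] is the cost of one primitive."""
    neighbours = {}
    for (a, b), cost in direct_costs.items():
        neighbours.setdefault(a, []).append((b, cost))
    tie = count()  # keeps heap ordering deterministic when costs are equal
    queue = [(0, next(tie), source, [source])]
    settled = set()
    while queue:
        cost, _, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, step in neighbours.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, next(tie), nxt, path + [nxt]))
    return None  # no chain of primitives connects source and target


class PathLibrary:
    """Hypothetical path library: look up first, search and store on a miss."""

    def __init__(self, direct_costs):
        self.direct_costs = direct_costs
        self.stored = {}
        self.searches = 0

    def get(self, source, target):
        key = (source, target)
        if key not in self.stored:
            self.searches += 1
            self.stored[key] = shortest_path(self.direct_costs, source, target)
        return self.stored[key]


direct_costs = {
    ("(S0,S1)", "(S0,B)"): 1, ("(S0,B)", "(B,B)"): 198,
    ("(S0,S1)", "(B,S1)"): 99, ("(B,S1)", "(B,B)"): 100,
}
library = PathLibrary(direct_costs)
cost, path = library.get("(S0,S1)", "(B,B)")
print(cost, path)
```

A second call with the same descriptors is answered from the stored entry without running the search again.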

Optionally, for the candidate communication through paths or candidate communication indirect paths determined for the current initial logical node and the upstream initial logical node by the communication through path determining component 120 or the communication indirect path acquisition component 130, the communication path selecting component 160 looks up the transmission cost conversion table stored in the communication path library component and selects the communication path with the smallest transmission cost as the communication path between the current initial logical node and the upstream initial logical node, so as to obtain the topology map 161 with the smallest transmission cost.

Technicians performing complex distributed data processing face the problem that data exchange communication between tensors described by different high-dimensional SBPs (NdSBPs) cannot be completed directly through basic communication primitives. With the automatic construction system and method for a data exchange communication path according to the present disclosure, this data exchange communication problem is solved automatically by the system, so that technicians need not spend effort writing lengthy code by hand and can focus on studying the data to be processed itself. This greatly frees the technicians, lowers the knowledge requirements placed on data processing technicians (they need not become experts in data exchange communication), and removes an obstacle for ordinary data processing technicians entering the data processing field.

While the basic principles of the present disclosure have been described above in connection with specific embodiments, it should be noted that all or any steps or components of the methods and apparatus of the present disclosure can be implemented in hardware, firmware, software, or combinations thereof in any computing device (including processors, storage media, etc.) or network of computing devices, as would be apparent to one of ordinary skill in the art upon reading the present disclosure.

Thus, the objects of the present disclosure may also be achieved by running a program or set of programs on any computing device. The computing device may be a well-known general purpose device. Thus, the objects of the present disclosure may also be achieved by simply providing a program product containing program code for implementing the method or apparatus. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure. It is apparent that the storage medium may be any known storage medium or any storage medium developed in the future.

It should also be noted that, in the apparatus and methods of the present disclosure, the components or steps may evidently be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent to the present disclosure. The steps of the series of processes may naturally be executed in chronological order in the order described, but need not be executed in chronological order. Some steps may be performed in parallel or independently of each other.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A data exchange communication path auto-construction system for distributed data processing, the system comprising:

an initial logical node generating component that receives task configuration data entered by a user, generates an initial logical node topology map for the distributed data processing system, each initial logical node being configured to perform a predetermined data processing operation and being appended with a predetermined node attribute, the node attribute comprising a location tag of a logical data processing device to which the initial logical node belongs, the location tag representing a deployment structure of the logical data processing device to which the initial logical node belongs being appended based on the task configuration data, each SBP distributed signature in the candidate SBP distributed signature set specifying a one-dimensional or multi-dimensional SBP distributed descriptor for each input logical tensor to which it belongs and a one-dimensional or multi-dimensional SBP distributed descriptor for each output logical tensor;

a communication through path determining component that traverses each initial logical node as a current initial logical node and, based on the SBP distributed descriptor of the input logical tensor of each input in each SBP distributed signature of its candidate SBP distributed signature set and the SBP distributed descriptor of the output logical tensor of the output of the upstream initial logical node corresponding to that input, determines, in a case where the SBP distributed descriptor of the output logical tensor and the SBP distributed descriptor of the input logical tensor are different, that a candidate communication through path exists between the current initial logical node and the upstream initial logical node, the candidate communication through path being capable of completing, by only one basic communication primitive of collective communication, a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor;

a communication indirect path obtaining component that, in a case where the communication through path determining component determines that there is no communication through path between the current initial logical node and the upstream initial logical node capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication, obtains, by a shortest path method, one or more candidate communication indirect paths that transform the tensor described by the SBP distributed descriptor of the output logical tensor into the tensor described by the SBP distributed descriptor of the input logical tensor via one or more intermediate tensors described by different SBP distributed descriptors, the transformation between any two adjacent tensors of the candidate communication indirect path being completed by one basic communication primitive of collective communication; and

an intermediate tensor generation node insertion component that, based on a generation result of the communication indirect path obtaining component, inserts, for any current intermediate tensor of the selected candidate communication indirect path, an intermediate tensor generation node having the current intermediate tensor as an output tensor, wherein the intermediate tensor generation node takes the upstream tensor of the current intermediate tensor as an input tensor, and its assigned SBP distributed signature takes the SBP distributed descriptor of the upstream tensor as the SBP distributed descriptor of its input tensor and the SBP distributed descriptor of the current intermediate tensor as the SBP distributed descriptor of its output tensor, thereby obtaining a result logical node topology map.

2. The data exchange communication path automatic construction system according to claim 1, further comprising:

a communication path library component storing a transmission cost conversion table storing corresponding communication transmission costs for converting tensors described by any SBP distribution descriptor into tensor communication indirect paths described by another, different SBP distribution descriptor;

wherein the communication indirect path obtaining component first queries a communication path library component based on the SBP distributed descriptor of the output logical tensor and the SBP distributed descriptor of the input logical tensor to obtain a corresponding communication indirect path and obtains the candidate communication indirect path using a shortest path method without querying to obtain the corresponding communication indirect path, in case that the communication direct path determining component determines that there is no communication direct path between the current initial logical node and the upstream initial logical node capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor through only one basic communication primitive of collective communication.

3. The data exchange communication path automatic construction system according to claim 2, wherein the communication indirect path acquisition component stores the transmission costs of the candidate communication indirect path in the transmission cost conversion table of the communication path library component after generating the candidate communication indirect path.

4. The data exchange communication path automatic construction system according to claim 1 or 2, wherein the case where there is no communication through path capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor through only one basic communication primitive of collective communication is one of the following cases:

the current initial logical node and the upstream initial logical node are each deployed with the same computing device and level division, but the SBP distributed descriptor of the output logical tensor is different from the SBP distributed descriptor of the input logical tensor in that there are at least two or more corresponding dimensions, or the SBP distributed descriptor of the output logical tensor is the same as the SBP distributed descriptor of one of the SBP distributed descriptors of the input logical tensor in that there are two different dimensions and the first dimension of the corresponding two dimensions in the other has a different SBP distributed descriptor and the same SBP distributed descriptor must be a split descriptor;

The current initial logical node and the upstream initial logical node are respectively deployed with the same computing equipment but different in level division; and

the current initial logical node is different from the computing devices each deployed by the upstream initial logical node.

5. The data exchange communication path automatic construction system according to claim 4, further comprising:

a communication path selection component that, for the candidate communication through path or the candidate communication indirect path determined for the current initial logical node and the upstream initial logical node by the communication through path determining component or the communication indirect path obtaining component, searches the transmission cost conversion table stored by the communication path library component and selects the communication path with the minimum transmission cost, or with the minimum number of transmissions and the minimum transmission cost, as the communication path between the current initial logical node and the upstream initial logical node, so as to obtain a result logical node topology map with the minimum transmission cost.

6. A method of automatically constructing a data exchange communication path for distributed data processing, the method comprising:

receiving, by an initial logical node generating component, task configuration data input by a user, generating an initial logical node topology map for the distributed data processing system, each initial logical node being configured to perform a predetermined data processing operation and being appended with a predetermined node attribute, the node attribute comprising a location tag of a logical data processing device to which the initial logical node belongs, the location tag representing a deployment structure of the logical data processing device to which the initial logical node belongs being appended based on the task configuration data, each SBP distributed signature in the candidate SBP distributed signature set specifying a one-dimensional or multi-dimensional SBP distributed descriptor for each input logical tensor for the initial logical node to which it belongs and a one-dimensional or multi-dimensional SBP distributed descriptor for each output logical tensor;

Traversing, by a communication pass-through path determining component, each initial logical node as a current initial logical node, based on the SBP distributed descriptor of the input logical tensor of each input of each SBP distributed signature of its candidate SBP distributed signature set and the SBP distributed descriptor of the output logical tensor of the output corresponding to the input, determining that a candidate communication pass-through path exists between the current initial logical node and the upstream initial logical node capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of the collective communication in case that the SBP distributed descriptor of the output logical tensor and the SBP distributed descriptor of the input logical tensor are different; and

obtaining, by a communication indirect path obtaining component, in a case where no communication through path capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor by only one basic communication primitive of collective communication exists between the current initial logical node and the upstream initial logical node, one or more candidate communication indirect paths by a shortest path method, each candidate communication indirect path transforming the tensor described by the SBP distributed descriptor of the output logical tensor into the tensor described by the SBP distributed descriptor of the input logical tensor, the transformation of the upstream tensor to the downstream tensor in any two adjacent tensors of the candidate communication indirect path being completed by one basic communication primitive of collective communication; and

inserting, by an intermediate tensor generation node insertion component, based on the generation result of the communication indirect path obtaining component, for any current intermediate tensor of the selected candidate communication indirect path, an intermediate tensor generation node having the current intermediate tensor as an output tensor, wherein the intermediate tensor generation node takes the upstream tensor of the current intermediate tensor as an input tensor, and its assigned SBP distributed signature takes the SBP distributed descriptor of the upstream tensor as the SBP distributed descriptor of its input tensor and the SBP distributed descriptor of the current intermediate tensor as the SBP distributed descriptor of its output tensor, thereby obtaining a result logical node topology map.

7. The automatic construction method of a data exchange communication path according to claim 6, further comprising:

storing, by a communication path library component, a transmission cost conversion table, wherein each entry of the transmission cost conversion table stores a corresponding communication transmission cost for the communication indirect path that converts the tensor described by any SBP distributed descriptor into the tensor described by a different SBP distributed descriptor;

8. The automatic construction method of a data exchange communication path according to claim 7, wherein the communication indirect path acquisition component stores the transmission cost of the candidate communication indirect path in the transmission cost conversion table of the communication path library component after acquiring the candidate communication indirect path.

9. The data exchange communication path automatic construction method according to claim 6 or 7, wherein the case where there is no communication through path capable of completing a communication transformation from the tensor described by the SBP distributed descriptor of the output logical tensor to the tensor described by the SBP distributed descriptor of the input logical tensor through only one basic communication primitive of collective communication is one of the following cases:

10. The automatic construction method of a data exchange communication path according to claim 9, further comprising:

searching, by a communication path selection component, a transmission cost conversion table stored by a communication path library component for the candidate communication through path or the candidate communication indirect path determined for the current initial logical node and the upstream initial logical node by the communication through path determining component or the communication indirect path obtaining component, and selecting the communication path with the minimum transmission cost, or with the minimum number of transmissions and the minimum transmission cost, as the communication path between the current initial logical node and the upstream initial logical node, so as to obtain a result logical node topology map with the minimum transmission cost.