CN116578522A

CN116578522A - Data processing method, device, equipment and storage medium based on many-core architecture

Info

Publication number: CN116578522A
Application number: CN202310857584.3A
Authority: CN
Inventors: 章威; 赵蓉; 刘学; 蔡炎松; 裴京; 吴海建
Original assignee: CETHIK Group Ltd
Current assignee: CETHIK Group Ltd
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-08-11
Anticipated expiration: 2043-07-13
Also published as: CN116578522B

Abstract

The present application relates to the field of computer technologies, and in particular, to a data processing method, device, equipment and storage medium based on a many-core architecture, where the method includes: acquiring an original calculation graph corresponding to a target task; carrying out node data alignment processing on producer nodes and consumer nodes in each data dependency relationship group to obtain a target dependency relationship group; generating a target calculation graph based on the target dependency group; performing data processing based on the many-core architecture deployed with the target computational graph to obtain a data processing result corresponding to the target task; the core units in the many-core architecture correspond to operator nodes in the target computational graph; and the routing cost among operator nodes in the target computation graph represents the routing traffic among the core units in the many-core architecture. The application can reduce the routing traffic and routing time between cores in the many-core architecture and improve the routing efficiency.

Description

Data processing method, device, equipment and storage medium based on many-core architecture

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium based on a many-core architecture.

Background

With the development of artificial intelligence, artificial intelligence algorithms such as neural networks are being applied in more and more fields. In the compilation field, the computational graph is a commonly used abstract representation for expressing a neural network. Each node of a computational graph represents a block of data or a computation operation, and each edge represents a data transfer relationship (or data dependency) between nodes. In addition to neural networks, computational graphs can also express high-performance computing, graphics, scientific computing, and simulation workloads.

In recent years, many-core architecture has evolved into an architecture for efficiently executing computational graphs. When deploying the computational graph to the many-core architecture, different nodes of the computational graph may be distributed across different cores of the many-core architecture for execution. On many-core architecture, the cores communicate through routing, and since the amount of routing (data traffic) between the cores may be large, the routing time (data traffic time) between the cores is likely to be a performance bottleneck for the many-core architecture to execute the computational graph.

Disclosure of Invention

The technical problem to be solved by the application is to provide a data processing method, device, equipment and storage medium based on a many-core architecture, which can reduce the routing cost among operator nodes in a computation graph, and further can reduce the routing traffic and routing time among cores when the computation graph is deployed to the many-core architecture, and improve the routing efficiency.

In order to solve the technical problems, in one aspect, the present application provides a data processing method based on a many-core architecture, including:

acquiring an original calculation graph corresponding to a target task; the original calculation graph comprises a plurality of operator nodes, wherein the operator nodes with data dependency relations form at least one data dependency relation group, and each data dependency relation group comprises a consumer node and at least one producer node corresponding to the consumer node;

carrying out node data alignment processing on producer nodes and consumer nodes in each data dependency relationship group to obtain a target dependency relationship group; the routing cost between the producer node and the consumer node in the target dependency group is less than the routing cost between the producer node and the consumer node in each data dependency group;

generating a target calculation graph based on the target dependency group;

performing data processing based on the many-core architecture deployed with the target computational graph to obtain a data processing result corresponding to the target task; the core units in the many-core architecture correspond to operator nodes in the target computational graph; and the routing cost among operator nodes in the target computation graph represents the routing traffic among the core units in the many-core architecture.

In another aspect, the present application provides a data processing apparatus based on a many-core architecture, including:

the original calculation graph acquisition module is used for acquiring an original calculation graph corresponding to the target task; the original calculation graph comprises a plurality of operator nodes, wherein the operator nodes with data dependency relations form at least one data dependency relation group, and each data dependency relation group comprises a consumer node and at least one producer node corresponding to the consumer node;

the data alignment processing module is used for carrying out node data alignment processing on the producer nodes and the consumer nodes in each data dependency relationship group to obtain a target dependency relationship group; the routing cost between the producer node and the consumer node in the target dependency group is less than the routing cost between the producer node and the consumer node in each data dependency group;

the target calculation graph generation module is used for generating a target calculation graph based on the target dependency relationship group;

the data processing module is used for performing data processing based on the many-core architecture deployed with the target calculation graph to obtain a data processing result corresponding to the target task; the core units in the many-core architecture correspond to operator nodes in the target computational graph; and the routing cost among operator nodes in the target computation graph represents the routing traffic among the core units in the many-core architecture.

In another aspect, the present application provides an electronic device, the device including a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement a data processing method based on a many-core architecture as described above.

In another aspect, the present application provides a computer storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the data processing method based on a many-core architecture as described above.

The embodiment of the application has the following beneficial effects:

the application obtains the target dependency relationship group by carrying out node data alignment on the producer node and the consumer node of the data dependency relationship group in the original calculation graph, and the routing cost between the producer node and the consumer node in the target dependency relationship group after the node data alignment is smaller than the routing cost between the producer node and the consumer node in the data dependency relationship group before the node data alignment; the routing cost among the operator nodes in the calculation graph can be reduced based on the node data alignment processing of the operator nodes in the original calculation graph, and when the target calculation graph after the node data alignment processing is deployed on a many-core architecture for data calculation, the routing traffic and the routing time among core units in the many-core architecture can be reduced, and the routing efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of a data processing method based on a many-core architecture according to an embodiment of the present application;

FIG. 3 is a flowchart of a node data alignment processing method according to an embodiment of the present application;

FIG. 4 is a flow chart of a method for updating loops in consumer nodes provided by an embodiment of the present application;

FIG. 5 is a flowchart of another node data alignment processing method according to an embodiment of the present application;

FIG. 6 is a flow chart of a method for updating loops in a producer node according to an embodiment of the present application;

FIG. 7 is a flowchart of another node data alignment processing method according to an embodiment of the present application;

FIG. 8 is an illustration of a dimension dependent graph provided by an embodiment of the present application;

FIG. 9 is a first schematic diagram of a dimension dependent graph provided by an embodiment of the present application;

FIG. 10 is a second schematic diagram of a dimension dependent graph provided by an embodiment of the present application;

FIG. 11 is a third schematic diagram of a dimension dependent graph provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of a data processing apparatus based on a many-core architecture according to an embodiment of the present application;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown, where the implementation environment may include: a data processing cluster 110, a processing terminal 120, and a task submitting end 130; the data processing cluster 110 is implemented based on a many-core architecture, one core unit may refer to a processor, and at least one core unit may be deployed on each node in the data processing cluster 110; the processing terminal 120 and the data processing cluster 110 may communicate data through a network, and the task submitting end 130 and the processing terminal 120 may communicate data through a network.

Specifically, the task submitting end 130 may submit the target task to the processing terminal 120; the processing terminal 120 may receive the target task, and in the case that the target task does not include the original computation graph, the processing terminal 120 may generate the original computation graph corresponding to the target task according to the task content of the target task, and perform node data alignment processing on each operator node in the original computation graph to obtain the target computation graph; in the case that the target task includes the original computation graph, the processing terminal 120 may directly perform the node data alignment process on each operator node in the original computation graph, to obtain the target computation graph.

Further, the processing terminal 120 may deploy the target computation graph into the data processing cluster 110, so as to perform data processing based on the data processing cluster 110, and obtain a data processing result corresponding to the target task.

The data processing cluster 110 may be a server cluster composed of a plurality of servers, where the servers may be physical servers or cloud servers.

The task submitting end 130 may communicate with the processing terminal 120 based on a Browser/Server (B/S) or Client/Server (C/S) mode. The task submitting end 130 may include physical devices such as smart phones, tablet computers, notebook computers, digital assistants, smart wearable devices, and vehicle terminals, and may also include software running on those devices, such as application programs. The operating systems running on the task submitting end 130 in embodiments of the present application may include, but are not limited to, Android, iOS, Linux, and Windows.

The task submitting end 130 and the processing terminal 120 may establish a communication connection through a wire or wirelessly, and the processing terminal 120 may include a server that operates independently, or a distributed server, where the server may be a cloud server or a physical server.

In order to solve the problems of large routing amount and long routing time between cores in a many-core architecture in the prior art, an embodiment of the present application provides a data processing method based on a many-core architecture, where an execution body of the method may be the above-mentioned processing terminal, referring to fig. 2, the method specifically may include:

S210, acquiring an original calculation graph corresponding to a target task; the original calculation graph comprises a plurality of operator nodes, the operator nodes with data dependency relations form at least one data dependency relation group, and each data dependency relation group comprises a consumer node and at least one producer node corresponding to the consumer node.

A computation graph is a graph formed by operator nodes and edges; the operator nodes represent computations, and the edges represent data dependency relations among the computations. Suppose two operator nodes A and B have a data dependency relation in which the data generated by operator node A is used by operator node B; then operator node A can be called a producer node and operator node B a consumer node. Correspondingly, in the computation graph, a connecting edge exists between operator node A and operator node B.
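The grouping of producer nodes under the consumer node that reads their data can be sketched as follows. This is only an illustration: the function name `dependency_groups`, the edge-list representation, and the node names A, B, C, D are assumptions made for the example and do not come from the application.

```python
from collections import defaultdict

def dependency_groups(edges):
    """Group producer nodes by the consumer node that uses their data.

    `edges` is a list of (producer, consumer) pairs; each pair is one
    connecting edge of the computation graph.
    """
    groups = defaultdict(list)
    for producer, consumer in edges:
        groups[consumer].append(producer)
    return dict(groups)

# Hypothetical graph: A and C both feed B, and B feeds D.
edges = [("A", "B"), ("C", "B"), ("B", "D")]
groups = dependency_groups(edges)
print(groups)  # {'B': ['A', 'C'], 'D': ['B']}
```

Here node B is a consumer node in one group and a producer node in another, which is the situation the later steps distinguish by the in-degree of the producer node.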

S220, performing node data alignment processing on the producer nodes and the consumer nodes in each data dependency relationship group to obtain a target dependency relationship group; the routing cost between the producer node and the consumer node in the target dependency group is less than the routing cost between the producer node and the consumer node in each data dependency group.

The route locality cost of each consumer node in the original calculation graph can be calculated. Let the current consumer node C have K producer nodes $P_1, \dots, P_K$. First, for a basic computation of consumer node C with coordinate $c$ and for producer node $P_k$, the data generated by $P_k$ that this basic computation requires is used to obtain the coordinate $p_k(c)$ of the basic computation in $P_k$ that generates the data; the route locality cost of the pair is then $d(c, p_k(c))$, where $d$ denotes some distance function, such as the 1-norm or the 2-norm. If $c$ and $p_k(c)$ have different coordinate lengths, the shorter coordinate is padded with 0 until the two lengths are the same. Next, the route locality cost of the basic computation with coordinate $c$ with respect to all K producer nodes is $\sum_{k=1}^{K} d(c, p_k(c))$. Finally, the route locality cost between all producer nodes and consumer node C is $L(C) = \sum_{c} \sum_{k=1}^{K} d(c, p_k(c))$. The route locality cost of the computation graph G may be defined as $L(G) = \sum_{C \in G} L(C)$.

In example 1, the loop nesting of producer node 1, producer node 2, and consumer node is illustrated as follows:

producer node 1:
    for y: 0 to 2:
        for x: 0 to 2:
            a[y][x] = ...

producer node 2:
    for y: 0 to 2:
        for x: 0 to 2:
            c[y][x] = 2

consumer node:
    for y: 0 to 2:
        for x: 0 to 2:
            b[y][x] = a[y][2 - x] + c[y][x]

In the consumer node of example 1, the data required for the computation at coordinates (0, 0) are a[0][2] and c[0][0]. The computation in producer node 1 that generates a[0][2] has coordinates (0, 2), and the computation in producer node 2 that generates c[0][0] has coordinates (0, 0), so the route locality cost of this computation is $d((0,0),(0,2)) + d((0,0),(0,0)) = 2 + 0 = 2$ (here the distance function is the 1-norm). By analogy, the route locality cost between the consumer node and the two producer nodes is obtained by accumulating this quantity over every coordinate of the consumer node.
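The calculation for example 1 can be reproduced with a short sketch. It rests on stated assumptions rather than on the application's own implementation: the cost is taken as the sum of 1-norm distances over all consumer coordinates and all producers, "0 to 2" is read as the values 0, 1, 2, and the names `l1`, `route_locality_cost`, `from_a`, and `from_c` are illustrative.

```python
from itertools import product

def l1(u, v):
    """1-norm distance; the shorter coordinate is padded with zeros."""
    n = max(len(u), len(v))
    u = tuple(u) + (0,) * (n - len(u))
    v = tuple(v) + (0,) * (n - len(v))
    return sum(abs(a - b) for a, b in zip(u, v))

def route_locality_cost(consumer_coords, producer_maps, dist=l1):
    """Sum dist(c, p_k(c)) over consumer coordinates c and producers k.

    Each entry of `producer_maps` maps a consumer coordinate to the
    coordinate of the producer computation that generates the data it reads.
    """
    return sum(dist(c, pm(c)) for c in consumer_coords for pm in producer_maps)

# Example 1, assuming y and x each take the values 0, 1, 2.
coords = list(product(range(3), range(3)))
from_a = lambda c: (c[0], 2 - c[1])  # b[y][x] reads a[y][2 - x]
from_c = lambda c: (c[0], c[1])      # b[y][x] reads c[y][x]

cost_at_origin = route_locality_cost([(0, 0)], [from_a, from_c])  # 2
total_cost = route_locality_cost(coords, [from_a, from_c])        # 12
```

Under these assumptions only the reads from producer node 1 contribute, because the reads from producer node 2 already use the same coordinates as the consumer node.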

The node data alignment process in this embodiment may be any data processing manner that makes the route locality cost of the computation graph G as small as possible; specifically, it may be a data processing manner that makes the distance between the coordinates of the consumer node and the coordinates of the producer node as small as possible, so that the route locality cost of the computation graph G is as small as possible.

S230, generating a target calculation graph based on the target dependency relation group.

In this embodiment, the node data alignment process may be performed in each data dependency group with each data dependency group as a unit, so as to obtain a corresponding target dependency group, thereby generating the target computation graph based on the target dependency group corresponding to each data dependency group. It should be noted that, in this embodiment, the original computation graph may correspond to a plurality of data dependency groups, and there may be a data dependency group in the plurality of data dependency groups that does not need to perform the node data alignment process; in a data dependency group, if a consumer node corresponds to a plurality of producer nodes, there may be producer node-consumer node pairs that do not need to perform the node data alignment process. Thus, when the target calculation map is generated based on the target dependency group, the target calculation map can be generated based on the target dependency group subjected to the node data alignment processing and the data dependency group not required to be subjected to the node data alignment processing.

S240, performing data processing based on the many-core architecture deployed with the target calculation graph to obtain a data processing result corresponding to the target task; the core units in the many-core architecture correspond to operator nodes in the target computational graph; and the routing cost among operator nodes in the target computation graph represents the routing traffic among the core units in the many-core architecture.

The many-core architecture in this embodiment may be any form or any structure of a cluster of many-core units, and by deploying the target computation graph on the many-core architecture, correspondence between each operator node in the target computation graph and each core unit in the many-core architecture may be achieved, so that routing traffic between each core unit in the many-core architecture may be represented by routing cost between each operator node in the target computation graph, that is, by reducing routing cost between each operator node in the target computation graph, routing traffic between each core unit in the many-core architecture may be reduced.

In particular, once the target computational graph has been generated, it may be mapped onto the many-core architecture: each operator node in the target computational graph may be mapped to one or more core units, so that the computing operations of each operator node are performed in one or more core units. The computing operations of each operator node may be operations on the task data of the target task. The original calculation graph corresponding to the target task represents the original data processing logic for the task data, and the target calculation graph represents the target data processing logic obtained after data alignment processing; the routing cost of data processing based on the original data processing logic is greater than that based on the target data processing logic. The many-core architecture deployed with the target calculation graph processes the task data of the target task based on the target data processing logic. It may receive the task data and determine the data flow direction of the task data among the core units based on the target data processing logic. Each core unit may compute on the task data flowing into it, using the computing operation of the corresponding operator node in the target calculation graph, and output its task data calculation result, so that at least one core unit in the next link of the data flow direction can receive that result. The task data calculation result of the last core unit or core units is determined as the data processing result corresponding to the target task.
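The data flow described above can be sketched as a small simulation in which each operator node stands for the core unit it is mapped to. This is a simplified illustration only: the one-node-per-core mapping, the function name `run_on_cores`, and the three toy operators are assumptions, and real routing between core units is not modeled.

```python
from graphlib import TopologicalSorter

def run_on_cores(ops, deps, task_data):
    """Run each operator once its producers have finished.

    ops:  node name -> callable executed by the core unit for that node
    deps: node name -> list of producer nodes whose results it consumes
    Nodes without producers receive the task data directly.
    """
    results = {}
    for node in TopologicalSorter(deps).static_order():
        inputs = [results[p] for p in deps[node]] or [task_data]
        results[node] = ops[node](*inputs)
    return results

# Hypothetical three-node graph: "scale" and "shift" feed "add".
ops = {
    "scale": lambda xs: [2 * v for v in xs],
    "shift": lambda xs: [v + 1 for v in xs],
    "add": lambda a, b: [p + q for p, q in zip(a, b)],
}
deps = {"scale": [], "shift": [], "add": ["scale", "shift"]}

results = run_on_cores(ops, deps, [1, 2, 3])
final = results["add"]  # result of the last node in the data flow
```

The result of the last node plays the role of the data processing result corresponding to the target task.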

In a specific example, the target task may be an object behavior prediction task, the corresponding input task data may include object feature data of an object to be predicted, the output data processing result is the object behavior prediction data, the original computation graph corresponding to the target task may correspond to the original prediction model, the target computation graph corresponding to the target task may correspond to the target prediction model obtained through data alignment processing, the target prediction model may be target data processing logic for implementing object behavior prediction, and the many-core architecture deployed with the target computation graph may process the object feature data of the object to be predicted based on the target data processing logic of the target prediction model, so as to obtain the corresponding object behavior prediction data.

In another specific example, the target task may be an image recognition task, the corresponding input task data may include image feature data, the output data processing result is image recognition data, the original computing graph corresponding to the target task may correspond to the original image recognition model, the target computing graph corresponding to the target task may correspond to the target image recognition model obtained through data alignment processing, the target image recognition model may be target data processing logic for realizing image recognition, and the many-core architecture deployed with the target computing graph may process the image feature data of the image to be recognized based on the target data processing logic of the target image recognition model, so as to obtain the corresponding image recognition result.

The application obtains the target dependency relationship group by carrying out node data alignment on the producer node and the consumer node of the data dependency relationship group in the original calculation graph, and the routing cost between the producer node and the consumer node in the target dependency relationship group after the node data alignment is smaller than the routing cost between the producer node and the consumer node in the data dependency relationship group before the node data alignment; the routing cost among the operator nodes in the calculation graph can be reduced based on the node data alignment processing of the operator nodes in the original calculation graph, and when the target calculation graph after the node data alignment processing is deployed on a many-core architecture for data calculation, the routing traffic and the routing time between core units can be reduced, and the routing efficiency is improved.

Further, the producer nodes in each data dependency group comprise at least one layer of loops and a first computing unit, and the consumer nodes in each data dependency group comprise at least one layer of loops and a second computing unit. In the computational graph scenario, an operator node may be expressed as a series of nested for loops, referred to as a for loop nest, whose loop body contains the basic computation unit of the operator node; the first computing unit is thus the basic computation unit of the producer node, and the second computing unit is the basic computation unit of the consumer node. For example, producer node 2 and the consumer node in example 1 are for loop nest representations of two operator nodes. A for loop nest represents the calculation process of one operator node: each basic computation can be represented as a coordinate in the loop nest, the first to last dimensions of the coordinate correspond to the loops of the nest from outermost to innermost, and the value in each dimension is the loop order of the current computation in the corresponding loop. For example, in example 1, when the inner loop order and the outer loop order of the consumer node are both 0, the computation b[0][0] = a[0][2] + c[0][0] has coordinates (0, 0). Accordingly, referring to fig. 3, a node data alignment processing method is shown, which may include:

S310, determining a first parameter corresponding relation between the first computing unit and the second computing unit under the condition that the producer nodes comprise a first type of producer nodes; operator nodes with data dependency relations in the original calculation graph are connected through edges, and the in-degree of the first type producer nodes in the original calculation graph is greater than zero.

An in-degree greater than zero means that the first type producer node uses data generated by other operator nodes in its own calculation; that is, the first type producer node depends on the data generated by at least one operator node, and itself has at least one producer node on which it depends.

The first computing unit of the first type producer node may include coordinate dimensions corresponding to the loop layers of the first type producer node; that is, the first to last coordinate dimensions correspond to the loops of the loop nest from outermost to innermost. Likewise, the second computing unit of the consumer node may include coordinate dimensions corresponding to the loop layers of the consumer node. For the original coordinates of the producer node, the coordinate parameters in the producer node are the original parameters. The consumer node performs its data calculation based on the original coordinates, and the coordinate parameters it uses when applying the original coordinates are the application parameters; the original parameters and the application parameters may be the same or different. In example 1, the original coordinate of producer node 1 is a[y][x], with original parameters y and x; the application coordinate used when the consumer node reads the data of producer node 1 is a[y][2-x], with application parameters y and 2-x, so the original parameters differ from the application parameters. The first parameter corresponding relation between the first computing unit and the second computing unit may specifically be the correspondence between the original parameters in the first computing unit and the application parameters in the second computing unit.
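One way to picture the parameter correspondence is to write each index expression as an affine function of its loop variable and compare the producer side with the consumer side. The sketch below does this for example 1; the `Affine` type, the dictionaries `original` and `applied`, and the helper `mismatched_dims` are hypothetical names chosen for illustration, and restricting index expressions to affine form is an assumption.

```python
from typing import NamedTuple

class Affine(NamedTuple):
    """Index expression coef * v + offset in one loop variable v."""
    coef: int
    offset: int

    def __call__(self, v):
        return self.coef * v + self.offset

# Producer node 1 writes a[y][x]: the original parameters.
original = {"y": Affine(1, 0), "x": Affine(1, 0)}
# The consumer node reads a[y][2 - x]: the application parameters.
applied = {"y": Affine(1, 0), "x": Affine(-1, 2)}

def mismatched_dims(original, applied):
    """Loop variables whose application parameter differs from the original."""
    return [d for d in original if original[d] != applied[d]]

dims_to_align = mismatched_dims(original, applied)  # ['x']
```

In this example only the x dimension differs, so only the innermost loop of the consumer node is a candidate for loop variable replacement.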

S320, carrying out variable replacement on the loop variable in the at least one layer of loops of the consumer node based on the first parameter corresponding relation to obtain a first update cycle corresponding to the consumer node; the routing cost between the consumer node after the update cycle and the first type producer node is less than the routing cost between the consumer node before the update cycle and the first type producer node.

When the first parameter corresponding relation indicates that a producer node-consumer node pair has inconsistent original parameters and application parameters, variable replacement can be carried out on the loop variable of the consumer node in the corresponding loop to obtain the corresponding first update cycle. The loop order of the second computing unit in the first update cycle may be consistent with the loop order of the first computing unit, meaning that the loop order of the first computing unit for the original coordinates is consistent with the loop order of the second computing unit for the application coordinates; alternatively, the two loop orders may not be fully consistent.

Denote the coordinates of the current loop of the consumer node by a coordinate vector. A shuffle mapping maps the coordinates of the consumer node to new coordinates, and the new coordinates should make the route locality cost as small as possible. In special cases, the shuffle mapping may degenerate into a linear mapping. After the shuffle mapping (or the linear mapping parameters) is obtained, the mapped coordinates are used to replace the original loop variables of the consumer. In example 1, the route locality between producer node 1 and the consumer node can be optimized on the innermost loop: under a suitable linear mapping of the innermost loop variable x, the route locality cost between producer node 1 and the consumer node is minimal, and after the original loop variable is replaced, the loop of the consumer node shown in example 2 is obtained. In example 2, the loop nesting of producer node 1, producer node 2 and the consumer node is expressed as follows:

producer node 1:

for y: 0 to 2:
    for x: 0 to 2:
        a[y][x] = ...

producer node 2:

for y: 0 to 2:
    for x: 0 to 2:
        c[y][x] = 2;

consumer node:

for y: 0 to 2:
    for x: 2 to 0:
        b[y][x] = a[y][2 - x] + c[y][x];

That is, loop variable replacement is performed on the "for x: 0 to 2" loop of the consumer node in example 1 to obtain the "for x: 2 to 0" loop shown in example 2, so that the order in which a[y][x] is produced by the for x: 0 to 2 loop of producer node 1 is consistent with the order in which a[y][2-x] is applied by the for x: 2 to 0 loop of the consumer node, and the route locality cost between producer node 1 and the consumer node is reduced to 0.
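The effect of this replacement can be checked with a small sketch. It assumes that the consumer body of example 1 is the same as in example 2 and fills a with made-up values; the helper run_consumer and the recorded read order are illustrative names, not part of the described method. The sketch shows that reversing the x loop leaves the computed b unchanged while making the consumer read the columns of a in the same order in which producer node 1 writes them.

```python
N = 3
# Hypothetical data: producer node 1 fills a, producer node 2 fills c with 2.
a = [[10 * y + x for x in range(N)] for y in range(N)]
c = [[2] * N for _ in range(N)]

def run_consumer(x_order):
    """Run the consumer loop nest; return b and the order in which columns of a are read."""
    b = [[None] * N for _ in range(N)]
    read_order = []
    for y in range(N):
        for x in x_order:
            b[y][x] = a[y][2 - x] + c[y][x]
            if y == 0:  # the column order is the same for every row
                read_order.append(2 - x)
    return b, read_order

b_before, reads_before = run_consumer(range(0, 3))      # example 1: for x: 0 to 2
b_after, reads_after = run_consumer(range(2, -1, -1))   # example 2: for x: 2 to 0

producer_order = list(range(N))  # producer node 1 writes columns 0, 1, 2
print(reads_before)  # [2, 1, 0]
print(reads_after)   # [0, 1, 2]
```

Only the traversal order changes; the values stored in b are identical in both runs.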

S330, obtaining the target dependency relationship group based on the first update loop corresponding to the consumer node.

Therefore, when the producer node corresponding to the consumer node is a first-type producer node, the loop variables of the consumer node can be adjusted based on the parameter correspondence between the consumer node and the producer node; the adjusted loops align node data between the producer node and the consumer node, thereby reducing the routing cost between the producer node and the consumer node.

Specifically, the consumer node includes multiple layers of loops, and the second computing unit includes coordinate parameters respectively corresponding to the multiple layers of loops. Thus, when updating the loops of the consumer node, each layer of loops in the consumer node can be traversed to determine whether variable replacement is needed for that loop. Referring to FIG. 4, a method of updating the loops in a consumer node is shown, which may include:

S410, performing parameter relation analysis on the coordinate parameters corresponding to at least one layer of loops in the multiple layers of loops and the coordinate parameters corresponding to the corresponding loops in the first computing unit, to obtain the parameter correspondence corresponding to the at least one layer of loops.

For the original coordinates of the producer node, the coordinate parameters used in the producer node are the original parameters; the consumer node performs its computation depending on the data at the original coordinates, and the coordinate parameters it uses when applying the original coordinates are the application parameters. The original parameters and the application parameters may be the same or different. As shown in example 1, for the for x: 0 to 2 loop in the consumer node, the original coordinates of producer node 1 are a[y][x], and the original parameter corresponding to the for x: 0 to 2 loop is [x]; when the consumer node applies the data of producer node 1, the application coordinates in the consumer node are a[y][2-x], and the application parameter corresponding to the for x: 0 to 2 loop is [2-x]. That is, the original parameter differs from the application parameter, so the parameter correspondence is that the coordinate parameter corresponding to the current loop of the consumer node is inconsistent with the coordinate parameter corresponding to the corresponding loop in the first computing unit, and the parameter mapping information is the mapping between the original parameter x and the application parameter 2 - x. For the for y: 0 to 2 loop, the original coordinates of producer node 1 are a[y][x] and the application coordinates in the consumer node are a[y][2-x]; the original parameter corresponding to the for y: 0 to 2 loop is [y], and the application parameter corresponding to the for y: 0 to 2 loop is also [y]. That is, the original parameter is the same as the application parameter, so the parameter correspondence is that the coordinate parameter corresponding to the current loop of the consumer node is consistent with the coordinate parameter corresponding to the corresponding loop in the first computing unit.

S420, determining the first parameter correspondence based on the parameter correspondence corresponding to the at least one layer of loops.

According to the parameter correspondence of each layer of loops of the consumer node obtained in step S410, the first parameter correspondence corresponding to each data dependency relationship group can be obtained.

S430, traversing the multiple layers of loops of the consumer node, and performing the following operation on each layer of loops: in the case where the parameter correspondence corresponding to the current loop includes first parameter mapping information, performing variable replacement on the loop variable in the current loop based on the first parameter mapping information, to obtain an updated loop corresponding to the current loop; the first parameter mapping information characterizes the mapping relation between the coordinate parameter corresponding to the current loop and the coordinate parameter corresponding to the corresponding loop in the first computing unit.

When the parameter correspondence corresponding to the current loop includes first parameter mapping information, the current loop in the consumer node needs variable replacement; the replacement may be based on the first parameter mapping information, and performing it on the current loop yields the updated loop. The first parameter mapping information may be a mapping relation, for example the mapping between x and 2 - x in example 1; based on this mapping, the loop variable x is replaced, and the for x: 0 to 2 loop in the consumer node may be updated to a for x: 2 to 0 loop. When the parameter correspondence corresponding to the current loop does not include first parameter mapping information, no variable replacement is needed for the current loop.

S440, obtaining the first update loop based on at least one updated loop.

When there is an updated loop, the first update loop may be obtained based on the updated loop; when there are both loops that need to be updated and loops that do not, the first update loop corresponding to the consumer node may be obtained based on the updated loops and the loops that do not need to be updated.

In the case where the consumer node includes multiple layers of loops, parameter relation analysis can be performed, layer by layer, on the coordinate parameters of each layer of loops and the coordinate parameters of the corresponding producer, and loop variable replacement is performed on the loop layers whose parameters have mismatched loop orders. The loop orders of the first computing unit and the second computing unit are thereby made uniform across the multiple layers of loops, node data are aligned between the producer node and the consumer node in each layer of loops, and the routing cost between the producer node and the consumer node is reduced.
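Steps S410 to S440 can be sketched as follows. The sketch is a simplification under stated assumptions: each coordinate parameter is encoded as a hypothetical pair (coefficient, offset) meaning coefficient * variable + offset, and only the reflection case of example 1 (x applied as 2 - x over the range 0 to 2) is handled, by swapping the loop bounds. The names Loop, analyse and update_loops are illustrative and do not come from the described method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loop:
    var: str
    start: int
    stop: int  # inclusive bound; start > stop denotes a descending loop

def analyse(original, application):
    """S410/S420: per-loop correspondence; None means the parameters already agree."""
    mapping = {}
    for var, orig in original.items():
        app = application[var]
        mapping[var] = None if app == orig else app
    return mapping

def update_loops(loops, mapping):
    """S430/S440: replace only the loops that carry mapping information."""
    updated = []
    for loop in loops:
        m = mapping.get(loop.var)
        if m is not None and m[0] == -1:
            # Reflection var -> offset - var: traverse the same range in reverse.
            updated.append(Loop(loop.var, loop.stop, loop.start))
        else:
            updated.append(loop)
    return updated

# Example 1: producer node 1 writes a[y][x]; the consumer applies a[y][2 - x].
original = {"y": (1, 0), "x": (1, 0)}
application = {"y": (1, 0), "x": (-1, 2)}
consumer_loops = [Loop("y", 0, 2), Loop("x", 0, 2)]

mapping = analyse(original, application)
first_update_loop = update_loops(consumer_loops, mapping)
print(first_update_loop)
```

The y loop is left untouched because its original and application parameters agree, while the x loop becomes the descending loop of example 2.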

Further, determining the first parameter correspondence between the first computing unit and the second computing unit, in the case where the producer nodes in each data dependency relationship group include a first-type producer node, includes:

in the case where there are a plurality of first-type producer nodes, determining the parameter correspondence between the first computing unit of each first-type producer node and the second computing unit;

and determining the first parameter correspondence based on the parameter correspondence between the first computing unit of each first-type producer node and the second computing unit.

That is, one consumer node may correspond to a plurality of producer nodes. When there are a plurality of first-type producer nodes, the parameter correspondence between each first-type producer node and the consumer node needs to be analysed separately, so that the loop variables of the consumer node can be adjusted based on the parameter correspondence between each first-type producer node and the consumer node. Data alignment between each first-type producer node and the consumer node is thereby achieved, and the routing cost between the producer nodes and the consumer node is reduced.

In one embodiment, the producer nodes in each data dependency relationship group include at least one layer of loops. Referring to FIG. 5, another node data alignment processing method is shown; it is executed after variable replacement has been performed, based on the first parameter correspondence, on the loop variable in the at least one layer of loops of the consumer node in each data dependency relationship group to obtain the first update loop corresponding to the consumer node, and may include:

S510, determining a second parameter correspondence between the first computing unit and the second computing unit in the case where the producer nodes include a second-type producer node; the in-degree of the second-type producer node in the original computational graph is equal to zero.

The in-degree of a second-type producer node in the original computational graph is zero, that is, the second-type producer node does not use data generated by any other operator node in its computation.
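The distinction between the two producer types depends only on in-degree, which can be sketched as below. The edge list is a hypothetical computational graph (the node name "input" in particular is invented for illustration), and classify_producers is an illustrative helper rather than a function named in the described method.

```python
from collections import defaultdict

def classify_producers(edges, consumer):
    """Split the producers of `consumer` by their in-degree in the whole graph."""
    in_degree = defaultdict(int)
    for _src, dst in edges:
        in_degree[dst] += 1
    producers = [src for src, dst in edges if dst == consumer]
    first_type = [p for p in producers if in_degree[p] > 0]    # depends on other operators
    second_type = [p for p in producers if in_degree[p] == 0]  # depends on no operator
    return first_type, second_type

# Hypothetical graph: producer1 consumes data from "input"; producer2 consumes nothing.
edges = [("input", "producer1"), ("producer1", "consumer"), ("producer2", "consumer")]
first_type, second_type = classify_producers(edges, "consumer")
print(first_type, second_type)  # ['producer1'] ['producer2']
```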

The coordinate dimensions of the first computing unit of the second-type producer node may correspond to the loop layers of the second-type producer node, i.e., the first to the last coordinate dimension correspond respectively to the loops of the loop nest from outermost to innermost; likewise, the coordinate dimensions of the second computing unit of the consumer node may correspond to the loop layers of the consumer node, from outermost to innermost. For the original coordinates of the producer node, the coordinate parameters used in the producer node are the original parameters; the consumer node performs its computation depending on the data at the original coordinates, and the coordinate parameters it uses when applying the original coordinates are the application parameters. The original parameters and the application parameters may be the same or different.

In example 3, the loop nesting of producer node 1, producer node 2, and consumer node is represented as:

producer node 1:

for x: 0 to 2:
    a[x] = ...

producer node 2:

for y: 0 to 2:
    for x: 0 to 2:
        c[y][x] = x + y;

consumer node:

for y: 0 to 2:
    for x: 0 to 2:
        b[y] = a[x] * c[y][2 - x];

In example 3, producer node 2 is a second-type producer node. The original coordinates of producer node 2 are c[y][x] and the original parameters are [y][x]; when the consumer node applies the data of producer node 2, the application coordinates in the consumer node are c[y][2-x] and the application parameters are [y][2-x], that is, the original parameters differ from the application parameters. The second parameter correspondence between the first computing unit and the second computing unit may specifically be the correspondence between the original parameters in the first computing unit and the application parameters in the second computing unit.

S520, performing variable replacement on the loop variable in the at least one layer of loops of the second-type producer node based on the second parameter correspondence, to obtain a second update loop corresponding to the second-type producer node; under the second update loop, the loop order of the first computing unit is consistent with the loop order of the second computing unit.

When the second parameter correspondence indicates that there is a producer node-consumer node pair whose original parameters and application parameters are inconsistent, variable replacement may be performed on the loop variable of the second-type producer node in the corresponding loop to obtain the corresponding second update loop. That the loop order of the first computing unit is consistent with the loop order of the second computing unit under the second update loop may mean that the order in which the first computing unit traverses the original coordinates is consistent with the order in which the second computing unit traverses the application coordinates.

Denote the coordinates of the current loop of the second-type producer node by a coordinate vector. A shuffle mapping maps the coordinates of the second-type producer node to new coordinates, and the new coordinates should make the route locality cost as small as possible. In special cases, the shuffle mapping may degenerate into a linear mapping. After the shuffle mapping (or the linear mapping parameters) is obtained, the mapped coordinates are used to replace the original loop variables of the producer. In example 3, producer node 2 is a second-type producer node; by default, the route locality cost between producer node 2 and the consumer node is 12, and after the inner loop of producer node 2 is adjusted through a linear mapping, the route locality cost between producer node 2 and the consumer node drops to 0.
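The definition of the route locality cost is not restated in this passage. The sketch below therefore assumes one plausible definition: for every element of c that the consumer reads, take the absolute difference between the loop position at which producer node 2 writes it and the loop position at which the consumer reads it, and sum these differences. The function route_locality_cost is hypothetical. Under this assumption the sketch reproduces the values 12 and 0 quoted for example 3.

```python
def route_locality_cost(producer_x_order, consumer_x_order, n_rows=3):
    """Assumed cost: sum of |write position - read position| over all reads of c."""
    # Loop position at which producer node 2 writes each column of c.
    produced_at = {x: pos for pos, x in enumerate(producer_x_order)}
    cost = 0
    for _y in range(n_rows):  # the y loops of both nodes are already aligned
        for pos, x in enumerate(consumer_x_order):
            # The consumer reads c[y][2 - x] at inner-loop position `pos`.
            cost += abs(produced_at[2 - x] - pos)
    return cost

# Default loops of example 3: both nodes run for x: 0 to 2.
before = route_locality_cost([0, 1, 2], [0, 1, 2])
# Inner loop of producer node 2 replaced by for x: 2 to 0.
after = route_locality_cost([2, 1, 0], [0, 1, 2])
print(before, after)  # 12 0
```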

S530, obtaining the target dependency relationship group based on the first update loop and the second update loop.

Because data alignment for the second-type producer nodes is performed after data alignment for the first-type producer nodes, the target dependency relationship group can be obtained based on the first update loop obtained by data alignment for the first-type producer nodes and the second update loop obtained by data alignment for the second-type producer nodes. That is, for each data dependency relationship group, it is possible to perform only the data alignment for first-type producer nodes, only the data alignment for second-type producer nodes, or both; which operations are performed may be determined by the types of producer nodes included in each data dependency relationship group.

On the one hand, when data alignment is performed for a first-type producer node, the loops of the consumer node are adjusted with the producer node as the reference, whereas when data alignment is performed for a second-type producer node, the loops of the producer node are adjusted with the consumer node as the reference. If data alignment were performed for the second-type producer nodes first and for the first-type producer nodes afterwards, the loops of the second-type producer nodes would have to be adjusted repeatedly. Performing data alignment for the second-type producer nodes after the first-type producer nodes avoids this repeated adjustment and improves data alignment efficiency. On the other hand, when the producer node corresponding to the consumer node is a second-type producer node, the loop variables of the second-type producer node can be adjusted based on the parameter correspondence between the consumer node and the producer node; under the adjusted loops, the loop order of the first computing unit is consistent with the loop order of the second computing unit, so that node data are aligned between the producer node and the consumer node and the routing cost between them is reduced.
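The ordering argument can be illustrated with the loops of example 2, where the consumer reads a[y][2 - x] from the first-type producer node 1 and c[y][x] from the second-type producer node 2. The sketch represents each inner loop simply as the list of x values it visits; both helper functions are hypothetical and hard-code the index expressions of that example.

```python
def align_consumer_to_first_type(producer_order):
    # The consumer reads a[y][2 - x], so it visits x = 2 - k when the producer visits k.
    return [2 - k for k in producer_order]

def align_second_type_to_consumer(consumer_order):
    # The consumer reads c[y][x], so the producer follows the consumer's order.
    return list(consumer_order)

producer1_order = [0, 1, 2]  # first-type producer: fixed reference
initial_consumer_order = [0, 1, 2]

# Recommended order: first-type alignment, then second-type alignment.
consumer_order = align_consumer_to_first_type(producer1_order)
producer2_order = align_second_type_to_consumer(consumer_order)

# Reversed order: producer node 2 is aligned to a consumer loop that later changes.
stale_producer2_order = align_second_type_to_consumer(initial_consumer_order)

print(consumer_order, producer2_order)          # [2, 1, 0] [2, 1, 0]
print(stale_producer2_order == consumer_order)  # False
```

In the reversed order, the result for producer node 2 no longer matches the final consumer loop and would have to be recomputed, which is the repeated adjustment the text describes.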

Specifically, the second-type producer node includes multiple layers of loops, and the first computing unit includes coordinate parameters respectively corresponding to the multiple layers of loops. Thus, when updating the loops of the second-type producer node, each layer of loops in the second-type producer node can be traversed to determine whether variable replacement is needed for that loop. Referring to FIG. 6, a method of updating the loops in a producer node is shown, which may include:

S610, performing parameter relation analysis on the coordinate parameters corresponding to at least one layer of loops in the multiple layers of loops and the coordinate parameters corresponding to the corresponding loops in the second computing unit, to obtain the parameter correspondence corresponding to the at least one layer of loops.

For the original coordinates of the producer node, the coordinate parameters used in the producer node are the original parameters; the consumer node performs its computation depending on the data at the original coordinates, and the coordinate parameters it uses when applying the original coordinates are the application parameters. The original parameters and the application parameters may be the same or different. As shown in example 3, for the for x: 0 to 2 loop in producer node 2, the original coordinates of producer node 2 are c[y][x], and the original parameter corresponding to the for x: 0 to 2 loop is [x]; when the consumer node applies the data of producer node 2, the application coordinates in the consumer node are c[y][2-x], and the application parameter corresponding to the for x: 0 to 2 loop is [2-x]. That is, the original parameter differs from the application parameter, so the parameter correspondence is that the coordinate parameter corresponding to the current loop of the producer node is inconsistent with the coordinate parameter corresponding to the corresponding loop in the second computing unit, and the parameter mapping information is the mapping between the original parameter x and the application parameter 2 - x. For the for y: 0 to 2 loop, the original coordinates of producer node 2 are c[y][x] and the application coordinates in the consumer node are c[y][2-x]; the original parameter corresponding to the for y: 0 to 2 loop is [y], and the application parameter corresponding to the for y: 0 to 2 loop is also [y]. That is, the original parameter is the same as the application parameter, so the parameter correspondence is that the coordinate parameter corresponding to the current loop of the producer node is consistent with the coordinate parameter corresponding to the corresponding loop in the second computing unit.

S620, determining the second parameter correspondence based on the parameter correspondence corresponding to the at least one layer of loops.

According to the parameter correspondence of each layer of loops of the second-type producer node obtained in step S610, the second parameter correspondence corresponding to each data dependency relationship group can be obtained.

S630, traversing the multiple layers of loops of the second-type producer node, and performing the following operation on each layer of loops: in the case where the parameter correspondence corresponding to the current loop includes second parameter mapping information, performing variable replacement on the loop variable in the current loop based on the second parameter mapping information, to obtain an updated loop corresponding to the current loop; the second parameter mapping information characterizes the mapping relation between the coordinate parameter corresponding to the current loop and the coordinate parameter corresponding to the corresponding loop in the second computing unit.

When the parameter correspondence corresponding to the current loop includes second parameter mapping information, the current loop in the second-type producer node needs variable replacement; the replacement may be based on the second parameter mapping information, and performing it on the current loop yields the updated loop. The second parameter mapping information may be a mapping relation, for example the mapping between x and 2 - x in example 3; based on this mapping, the loop variable x is replaced, and the for x: 0 to 2 loop in producer node 2 may be updated to a for x: 2 to 0 loop. When the parameter correspondence corresponding to the current loop does not include second parameter mapping information, no variable replacement is needed for the current loop.

S640, obtaining the second update loop based on at least one updated loop.

When there is an updated loop, the second update loop may be obtained based on the updated loop; when there are both loops that need to be updated and loops that do not, the second update loop corresponding to the second-type producer node may be obtained based on the updated loops and the loops that do not need to be updated.

In the case where the second-type producer node includes multiple layers of loops, parameter relation analysis can be performed, layer by layer, on the coordinate parameters of each layer of loops and the coordinate parameters of the corresponding consumer, and loop variable replacement is performed on the loop layers whose parameters have mismatched loop orders. The loop orders of the first computing unit and the second computing unit are thereby made uniform across the multiple layers of loops, node data are aligned between the producer node and the consumer node in each layer of loops, and the routing cost between the producer node and the consumer node is reduced.

Further, determining the second parameter correspondence between the first computing unit and the second computing unit, in the case where the producer nodes in each data dependency relationship group include a second-type producer node, includes:

in the case where there are a plurality of second-type producer nodes, determining the parameter correspondence between the first computing unit of each second-type producer node and the second computing unit;

and determining the second parameter correspondence based on the parameter correspondence between the first computing unit of each second-type producer node and the second computing unit.

When there are a plurality of second-type producer nodes, the parameter correspondence between each second-type producer node and the consumer node needs to be analysed, so that the loop variables of each second-type producer node can be adjusted based on the parameter correspondence between that second-type producer node and the consumer node. The loop order of the first computing unit in each second-type producer node is thereby made consistent with the loop order of the second computing unit in the consumer node, data alignment between each second-type producer node and the consumer node is achieved, and the routing cost between the producer nodes and the consumer node is reduced.

In a specific embodiment, each operator node in the original computational graph includes at least one layer of loops, and data alignment of the operator nodes can be achieved by adjusting the order of the loop layers of an operator node, that is, by reordering the nested loops of the operator node. Referring specifically to FIG. 7, yet another node data alignment processing method is shown, which may include:

S710, generating a dimension dependency graph based on the at least one layer of loops contained in each operator node in the original computational graph.

The dimension dependency graph includes a plurality of operator nodes and a loop dimension node sequence corresponding to each operator node. Each layer of loops of an operator node corresponds to one loop dimension node, and the dimension order of the loop dimension nodes in the sequence corresponding to an operator node is consistent with the loop nesting order of that operator node. Loop dimension nodes that have a data dependency relationship are connected by dimension dependency edges.
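One possible in-memory representation of such a graph is sketched below. The class DimensionDependencyGraph, its method names, and the operator names "P" and "C" are all hypothetical; the sketch only records, for each operator, its loop dimensions from outermost to innermost, together with the dimension dependency edges between operators.

```python
from dataclasses import dataclass, field

@dataclass
class DimensionDependencyGraph:
    # operator name -> loop dimension nodes, outermost loop first
    dims: dict = field(default_factory=dict)
    # dimension dependency edges: ((producer, dim), (consumer, dim))
    edges: list = field(default_factory=list)

    def add_operator(self, name, loop_vars):
        self.dims[name] = list(loop_vars)

    def add_dependency(self, producer, p_dim, consumer, c_dim):
        # Both endpoints must be existing loop dimension nodes.
        assert p_dim in self.dims[producer] and c_dim in self.dims[consumer]
        self.edges.append(((producer, p_dim), (consumer, c_dim)))

    def position(self, op, dim):
        """Dimension order of `dim` inside operator `op` (0 = outermost)."""
        return self.dims[op].index(dim)

# Hypothetical pair: P writes t[y][x] and C reads t[y][x] with the same loop nest.
g = DimensionDependencyGraph()
g.add_operator("P", ["y", "x"])
g.add_operator("C", ["y", "x"])
g.add_dependency("P", "y", "C", "y")
g.add_dependency("P", "x", "C", "x")
print(g.position("C", "x"), len(g.edges))  # 1 2
```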

Referring to FIG. 8, a schematic diagram of a dimension dependency graph is shown, in which Si represents the i-th operator, each dot under an operator represents one loop dimension of that operator's loop nest, and the lines between dots represent the dependencies between dimensions of different operators.

S720, in the case where the producer nodes in each data dependency relationship group include a first-type producer node, adjusting the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency relationship group to obtain an adjusted first node sequence.

The routing cost between the consumer node and the first-type producer node after the dimension order adjustment is smaller than that before the dimension order adjustment. Specifically, in the first node sequence and in the loop dimension node sequence corresponding to the first-type producer node, loop dimension nodes that have a data dependency relationship may have the same dimension order. Operator nodes that have a data dependency relationship in the original computational graph are connected by edges, and the in-degree of the first-type producer node in the original computational graph is greater than zero.

Adjusting the dimension order of an operator node is reflected in the dimension dependency graph as adjusting the order of the dots that represent the different dimensions of that operator. The different dimensions of an operator node are numbered from top to bottom, and a shuffle mapping converts the original sequence of dimension numbers into a new sequence; the shuffle mapping should ensure that the route locality cost between producer and consumer is reduced.

S730, obtaining the target dependency relationship group based on the first node sequence.

As can be seen from the dimension dependency graph, the dimension dependency edges cross before the dimension order of the loop dimension nodes corresponding to the consumer node is adjusted and no longer cross afterwards. Data alignment between the consumer node and the first-type producer node is thereby achieved at the loop level, and the routing cost between the consumer node and the first-type producer node is reduced.

Further, adjusting the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency relationship group to obtain the adjusted first node sequence, in the case where the producer nodes in each data dependency relationship group include a first-type producer node, includes:

in the case where there are a plurality of first-type producer nodes, determining the dimension order correspondence between the loop dimension nodes corresponding to each first-type producer node and the loop dimension nodes corresponding to the consumer node;

and adjusting the dimension order of the loop dimension nodes corresponding to the consumer node based on the dimension order correspondence between the loop dimension nodes corresponding to each first-type producer node and the loop dimension nodes corresponding to the consumer node, to obtain the first node sequence.

When there are a plurality of first-type producer nodes, the dimension order correspondence between each first-type producer node and the consumer node needs to be analysed, and the dimension order of the loop dimension nodes of the consumer node can then be adjusted based on these correspondences. Loop dimension nodes that have a data dependency relationship thereby have the same dimension order in the first node sequence and in the loop dimension node sequence corresponding to each first-type producer node, data alignment between each first-type producer node and the consumer node is achieved, and the routing cost between the producer nodes and the consumer node is reduced.
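The text does not state how the consumer's dimension order is chosen when several first-type producer nodes constrain it. One simple possibility, shown purely as an assumption, is to try every permutation of the consumer's loop dimensions and keep the one with the smallest total position mismatch over all dependent dimension pairs. The producers, their dimension names and the mismatch measure are hypothetical.

```python
from itertools import permutations

def mismatch(consumer_order, producer_order, dep):
    """Sum of |producer position - consumer position| over dependent dimension pairs."""
    return sum(
        abs(producer_order.index(p) - consumer_order.index(c)) for p, c in dep
    )

def best_consumer_order(consumer_dims, producers):
    """producers: list of (producer_order, [(producer_dim, consumer_dim), ...])."""
    return min(
        permutations(consumer_dims),
        key=lambda order: sum(mismatch(order, po, dep) for po, dep in producers),
    )

# Hypothetical consumer with loops (y, x) and two first-type producers.
producers = [
    (["i", "j"], [("i", "x"), ("j", "y")]),  # producer read in transposed order
    (["k"], [("k", "x")]),                   # one-dimensional producer indexed by x
]
first_node_sequence = best_consumer_order(("y", "x"), producers)
print(first_node_sequence)  # ('x', 'y')
```

Exhaustive search is only practical for shallow loop nests; it is used here because it makes the selection criterion explicit.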

In one embodiment, after the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency relationship group has been adjusted to obtain the adjusted first node sequence in the case where the producer nodes in each data dependency relationship group include a first-type producer node, the method further includes:

in the case where the producer nodes include a second-type producer node, adjusting the dimension order of the loop dimension nodes corresponding to the second-type producer node to obtain an adjusted second node sequence; in the second node sequence and in the loop dimension node sequence corresponding to the consumer node, loop dimension nodes that have a data dependency relationship have the same dimension order; the in-degree of the second-type producer node in the original computational graph is equal to zero.

Adjusting the dimension order of an operator node is reflected in the dimension dependency graph as adjusting the order of the dots that represent the different dimensions of that operator. The different dimensions of an operator node are numbered from top to bottom, and a shuffle mapping converts the original sequence of dimension numbers into a new sequence; the shuffle mapping should ensure that the route locality cost between producer and consumer is reduced.

The process of adjusting route locality based on the dimension dependency graph is described below with a specific example. Referring to FIG. 9, a dimension dependency graph to be adjusted is shown, in which the route locality between S0 and S1 is to be adjusted; the for loop nesting of the two operators S0 and S1 is expressed as follows:

S0:

for y: 0 to 1:
    for x: 0 to 1:
        a[y][x] = ...

S1:

for y: 0 to 1:
    for x: 0 to 1:
        b[y][x] = a[x][y];

For the data dependency relationship group of S0 and S1, S0 is the producer node and S1 is the consumer node; producer node S0 is a second-type producer node, and there is no first-type producer node, so the dimension order of producer node S0 can be adjusted directly. After the dots representing the two dimensions of S0 are exchanged, the dimension dependency graph in FIG. 10 is obtained, and if the route locality within each dimension is not considered, the route locality cost between S0 and S1 is reduced to 0.
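The exchange of the two dimension nodes of S0 can be sketched by counting crossing dimension dependency edges. The function count_crossings is a hypothetical helper, and treating the number of crossings as a stand-in for misalignment is an assumption made for illustration. The edges are derived from the access a[x][y] in S1: the x loop of S1 indexes the axis written by the y loop of S0, and the y loop of S1 indexes the axis written by the x loop of S0.

```python
def count_crossings(producer_dims, consumer_dims, edges):
    """Count pairs of dimension dependency edges that cross between two operators."""
    pos = [(producer_dims.index(p), consumer_dims.index(c)) for p, c in edges]
    return sum(
        1
        for i in range(len(pos))
        for j in range(i + 1, len(pos))
        # Two edges cross when their endpoints are ordered oppositely on each side.
        if (pos[i][0] - pos[j][0]) * (pos[i][1] - pos[j][1]) < 0
    )

s0_dims = ["y", "x"]  # S0: for y, for x, writes a[y][x]
s1_dims = ["y", "x"]  # S1: for y, for x, reads a[x][y]
edges = [("y", "x"), ("x", "y")]  # (dimension of S0, dimension of S1)

before = count_crossings(s0_dims, s1_dims, edges)
s0_dims = ["x", "y"]  # exchange the two dimension nodes of S0
after = count_crossings(s0_dims, s1_dims, edges)
print(before, after)  # 1 0
```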

Further, as shown in fig. 10, the route locality between S1 and S2 needs to be adjusted, and for the data dependency group of S1 and S2, which includes both the first type producer node and the second type producer node, the dimension sequence of the consumer node S2 may be adjusted, so that the original dimension sequence in S2 Adjust to->Thereafter, the dimension dependency graph in fig. 11 can be obtained, and if the route locality in the dimension is not considered, the route locality cost between S1 and S2 is reduced to 0.

On the one hand, when data alignment is performed for a first-type producer node, the dimension order of the consumer node is adjusted with the producer node as the reference, whereas when data alignment is performed for a second-type producer node, the dimension order of the producer node is adjusted with the consumer node as the reference. If the second-type producer nodes were aligned first and the first-type producer nodes afterwards, the dimension order of the second-type producer nodes would have to be adjusted repeatedly. The second-type producer nodes are therefore aligned after the first-type producer nodes, which avoids this repeated adjustment and improves data alignment efficiency. On the other hand, when the dimension order of the cyclic dimension nodes corresponding to the second-type producer node is adjusted, the dimension dependency graph shows that the edges cross before the adjustment and no longer cross after it. Data alignment between the consumer node and the second-type producer node at the loop level is thereby achieved, and the routing cost between the consumer node and the producer node is reduced.
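The crossed and uncrossed edges described above can be illustrated with a small sketch. It uses the number of crossing edge pairs as a stand-in for the route locality cost; that proxy, the brute-force search over permutations, and all function names are assumptions made for illustration, not the cost model or search method of the application.

```python
from itertools import combinations, permutations


def count_crossings(producer_order, consumer_order, edges):
    """Count crossing pairs of dimension dependency edges.

    producer_order / consumer_order list dimension names from top to bottom;
    edges are (producer_dim, consumer_dim) pairs.
    """
    p_pos = {d: i for i, d in enumerate(producer_order)}
    c_pos = {d: i for i, d in enumerate(consumer_order)}
    crossings = 0
    for (p1, c1), (p2, c2) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite relative order.
        if (p_pos[p1] - p_pos[p2]) * (c_pos[c1] - c_pos[c2]) < 0:
            crossings += 1
    return crossings


def best_producer_order(producer_order, consumer_order, edges):
    """Brute-force the producer dimension order with the fewest crossings."""
    return min(
        permutations(producer_order),
        key=lambda order: count_crossings(order, consumer_order, edges),
    )


# S0 writes a[y][x] and S1 reads a[x][y], so S0.y feeds S1.x and S0.x feeds S1.y.
edges = [("y", "x"), ("x", "y")]
crossings_before = count_crossings(["y", "x"], ["y", "x"], edges)
new_order = best_producer_order(["y", "x"], ["y", "x"], edges)
crossings_after = count_crossings(new_order, ["y", "x"], edges)
print(crossings_before, new_order, crossings_after)
```

Brute force is only practical for the handful of dimensions a loop nest usually has; it is used here to keep the sketch short.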

Further, when the producer node includes a second type of producer node, adjusting a dimension sequence of a cyclic dimension node corresponding to the second type of producer node to obtain an adjusted second node sequence, including:

in the case that there are a plurality of second-type producer nodes, determining the dimension order correspondence between the cyclic dimension nodes corresponding to each second-type producer node and the cyclic dimension nodes corresponding to the consumer node;

and adjusting the dimension sequence of the cyclic dimension nodes corresponding to the second class producer nodes based on the dimension sequence corresponding relation of the cyclic dimension nodes corresponding to the second class producer nodes and the cyclic dimension nodes corresponding to the consumer nodes to obtain the second node sequence.

When there are a plurality of second-type producer nodes, the dimension order correspondence between each second-type producer node and the consumer node needs to be analyzed. The dimension order of the cyclic dimension nodes of each second-type producer node can then be adjusted based on its correspondence with the consumer node, so that every second-type producer node is data-aligned with the consumer node and the routing cost between the producer nodes and the consumer node is reduced.
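A minimal sketch of this per-producer adjustment follows. It assumes that the dimension order correspondence is available as a mapping from each producer dimension to the consumer dimension it feeds; the dictionary layout and the names `align_producer_order` and `align_all` are hypothetical.

```python
def align_producer_order(consumer_order, dim_map):
    """Reorder one producer's loop dimensions to follow the consumer's order.

    dim_map maps each producer dimension to the consumer dimension it feeds.
    """
    c_pos = {d: i for i, d in enumerate(consumer_order)}
    return sorted(dim_map, key=lambda p_dim: c_pos[dim_map[p_dim]])


def align_all(consumer_order, producers):
    """Apply the per-producer correspondence to every second-type producer."""
    return {
        name: align_producer_order(consumer_order, dim_map)
        for name, dim_map in producers.items()
    }


consumer_order = ["i", "j", "k"]
producers = {
    "P0": {"a": "j", "b": "i"},            # P0.a feeds C.j, P0.b feeds C.i
    "P1": {"u": "k", "v": "i", "w": "j"},  # P1.u feeds C.k, and so on
}
aligned = align_all(consumer_order, producers)
print(aligned)
```

Each producer is handled independently here because only the producer side is reordered and the consumer order stays fixed as the reference.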

The following describes a specific implementation flow of the present application with a specific example. Updating a loop of the consumer node or the producer node corresponds to route locality optimization based on single-layer loop data rearrangement, and adjusting the dimension order of the cyclic dimension nodes of the consumer node or the producer node corresponds to route locality optimization based on loop reordering. The implementation flow of the application can thus comprise:

S1, calculating the route locality cost of each consumer node on the original calculation graph.

S2, performing topological sorting on all operator nodes on the calculation graph.

S3, performing route locality optimization on each producer-consumer relationship based on single-layer loop data rearrangement. The following steps are carried out for each loop of the consumer node in turn:

S3.1, selecting a pair of producer node and consumer node in the current loop, and executing S3.1.1 or S3.1.2 according to whether the producer node satisfies judgment condition 1. S3.1.2 is executed with priority; S3.1.1 is executed only after S3.1.2 has been applied to every producer of the current consumer node that does not satisfy judgment condition 1, so that Constant producers are aligned last.

Judging condition 1: the producer is a Constant operator node, i.e., a second type of producer node. The Constant operator is defined as: if one operator node does not use the data generated by other operator nodes in the calculation process, the operator node is a Constant operator node.

S3.1.1 if the producer node is a Constant operator node, the producer's loop is directly adjusted to optimize the route locality between the producer node and the consumer node.

S3.1.2 if the producer node is not a Constant operator node, the route locality between the producer node and the consumer node is optimized by adjusting the loop of the consumer node.

S3.2, returning to S3.1 and adjusting the route locality between the other producer nodes and the consumer node under the current loop, so that the route locality cost is as small as possible. The application does not limit the specific reduction method; for example, a greedy algorithm, the A* algorithm, or dynamic programming may be employed. To reach the minimum, it may be necessary to traverse all producer-consumer pairs multiple times; S3 ends when judgment condition 2 is met.

Judgment condition 2: the route locality cost reaches a minimum, or the number of iterations reaches an upper limit.

S4, constructing a dimension dependency graph according to the for-loop nested representation of the operator nodes.

S5, for each producer-consumer relationship, performing route locality optimization based on loop reordering according to the dimension dependency graph obtained in S4. The specific steps are as follows:

S5.1, selecting a pair of producer node and consumer node corresponding to the current consumer node, and executing S5.1.1 or S5.1.2 according to whether the producer node satisfies judgment condition 1. S5.1.2 is executed with priority; S5.1.1 is executed only after S5.1.2 has been applied to every producer of the current consumer node that does not satisfy judgment condition 1.

S5.1.1, if the producer node is a Constant operator node, adjusting the dimension sequence of the producer node to optimize the route locality between the producer node and the consumer node.

S5.1.2 if the producer node is not a Constant operator node, the consumer node dimension order is adjusted to optimize route locality between the producer node and the consumer node.

S5.2, returning to S5.1 and adjusting the route locality between the other producer nodes and the current consumer node, so that the route locality cost is as small as possible. The application does not limit the specific reduction method; for example, a greedy algorithm, the A* algorithm, or dynamic programming may be employed. To reach the minimum, it may be necessary to traverse all producer-consumer pairs multiple times; S5 ends when judgment condition 2 is met.
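The bookkeeping around S2 and judgment condition 1 can be sketched as follows: operators are visited in topological order, and for each consumer the producers that are not Constant operator nodes are listed before the Constant ones. The graph layout (each operator mapped to the set of producers it reads from) and the function name are assumptions; the actual loop and dimension adjustments of S3 and S5 are not modeled.

```python
from graphlib import TopologicalSorter


def producer_processing_plan(producers_of):
    """For each consumer in topological order, list its producers so that
    non-Constant producers come before Constant ones."""
    plan = []
    for consumer in TopologicalSorter(producers_of).static_order():
        producers = sorted(producers_of.get(consumer, ()))
        if not producers:
            continue  # no producer nodes: nothing to align for this operator
        # Constant: uses no data generated by other operator nodes.
        non_constant = [p for p in producers if producers_of.get(p)]
        constant = [p for p in producers if not producers_of.get(p)]
        plan.append((consumer, non_constant + constant))
    return plan


# Hypothetical graph: S0 and C0 read nothing, S1 reads S0, S2 reads S1 and C0.
producers_of = {"S0": set(), "C0": set(), "S1": {"S0"}, "S2": {"S1", "C0"}}
plan = producer_processing_plan(producers_of)
for consumer, producers in plan:
    print(consumer, producers)
```

For consumer S2 the non-Constant producer S1 is scheduled ahead of the Constant producer C0, matching the rule that second-type producers are aligned after first-type producers.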

The execution sequence of the steps S3-S5 is described as follows:

in the above process, S3 and S5 are iteratively performed on the calculation map. Possible iterative approaches include:

1. S3 is executed for all producer-consumer relationships, and then S5 is executed for all producer-consumer relationships. S4 must be completed before S5 is executed and needs to be executed only once. This iteration scheme is shown in Algorithm 1.

2. S3 is executed for one group of producer-consumer relationships and S5 is then executed for the same group; this process is repeated while traversing all producer-consumer relationships on the computational graph. S4 must be completed before S5 is executed. This iteration scheme is shown in Algorithm 2.

Algorithm 1

(S4 may be executed here)
for each operator node in the computational graph:
    take the operator node as the consumer node
    if the current consumer has no producer nodes:
        continue (end the current iteration)
    else:
        for each loop level of the consumer:
            while judgment condition 2 is not satisfied:
                for each producer node of the consumer:
                    if judgment condition 1 is satisfied:
                        S3.1.1
                    else:
                        S3.1.2
(S4 may also be executed here)
for each operator node in the computational graph:
    take the operator node as the consumer node
    if the current consumer has no producer nodes:
        continue
    else:
        while judgment condition 2 is not satisfied:
            for each producer node of the consumer:
                if judgment condition 1 is satisfied:
                    S5.1.1
                else:
                    S5.1.2

Algorithm 2

execute S4
for each operator node in the computational graph:
    take the operator node as the consumer node
    if the current consumer has no producer nodes:
        continue
    else:
        for each loop level of the consumer:
            while judgment condition 2 is not satisfied:
                for each producer node of the consumer:
                    if judgment condition 1 is satisfied:
                        S3.1.1
                    else:
                        S3.1.2
        while judgment condition 2 is not satisfied:
            for each producer node of the consumer:
                if judgment condition 1 is satisfied:
                    S5.1.1
                else:
                    S5.1.2
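The difference between the two iteration schemes is only the order in which S3, S4 and S5 are invoked, which the following sketch makes visible. The callbacks `s3`, `s4` and `s5` are hypothetical placeholders that merely record their invocation; each is assumed to hide the inner "while judgment condition 2 is not satisfied" loop, and consumers without producers are assumed to have been filtered out already.

```python
def scheme_1(consumers, s3, s4, s5):
    """Algorithm 1: S3 for every consumer, S4 once, then S5 for every consumer."""
    for c in consumers:
        s3(c)
    s4()  # could equally run before the first loop; it must precede S5
    for c in consumers:
        s5(c)


def scheme_2(consumers, s3, s4, s5):
    """Algorithm 2: S4 first, then S3 followed by S5 for each consumer in turn."""
    s4()
    for c in consumers:
        s3(c)
        s5(c)


def make_recorder():
    """Return a log plus three callbacks that append their step name to it."""
    log = []
    return (
        log,
        lambda c: log.append(f"S3({c})"),
        lambda: log.append("S4"),
        lambda c: log.append(f"S5({c})"),
    )


log1, s3, s4, s5 = make_recorder()
scheme_1(["S1", "S2"], s3, s4, s5)
log2, s3, s4, s5 = make_recorder()
scheme_2(["S1", "S2"], s3, s4, s5)
print(log1)
print(log2)
```

In the first scheme all single-layer loop rearrangements finish before any loop reordering starts; in the second the two optimizations are interleaved per consumer.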

The application defines a route locality cost model on a computational graph; the cost model measures how well the data dependencies among computational graph nodes are aligned. Based on this cost model, the application provides a hardware-independent optimization method for computational graph data rearrangement and loop reordering, a method for optimizing route locality by single-layer loop data rearrangement, and a method for building a dimension dependency graph and performing loop reordering through the dimension dependency graph so as to optimize route locality.

It should be noted that any of the methods described in this embodiment may be combined based on actual implementation conditions, and have corresponding beneficial effects, which are not described herein.

Referring to fig. 12, the present embodiment further provides a data processing apparatus based on a many-core architecture, including:

the original calculation map obtaining module 1210 is configured to obtain an original calculation map corresponding to the target task; the original calculation graph comprises a plurality of operator nodes, wherein the operator nodes with data dependency relations form at least one data dependency relation group, and each data dependency relation group comprises a consumer node and at least one producer node corresponding to the consumer node;

the data alignment processing module 1220 is configured to perform node data alignment processing on the producer node and the consumer node in each data dependency group, so as to obtain a target dependency group; the routing cost between the producer node and the consumer node in the target dependency group is less than the routing cost between the producer node and the consumer node in each data dependency group;

a target computational graph generation module 1230 for generating a target computational graph based on the set of target dependencies;

A data processing module 1240, configured to perform data processing based on the many-core architecture deployed with the target computational graph, to obtain a data processing result corresponding to the target task; the core units in the many-core architecture correspond to operator nodes in the target computational graph; and the routing cost among operator nodes in the target computation graph represents the routing traffic among the core units in the many-core architecture.
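The four modules form a linear pipeline, which can be sketched as below. The class name, field names, and placeholder stages are hypothetical; they show only the order in which modules 1210 to 1240 hand data to one another, not how any module is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ManyCorePipeline:
    """Four stages mirroring modules 1210, 1220, 1230 and 1240."""
    acquire_graph: Callable[[Any], Any]   # 1210: task -> original graph
    align_data: Callable[[Any], Any]      # 1220: graph -> target dependency groups
    generate_graph: Callable[[Any], Any]  # 1230: groups -> target graph
    process: Callable[[Any], Any]         # 1240: target graph -> result

    def run(self, task):
        original = self.acquire_graph(task)
        groups = self.align_data(original)
        target = self.generate_graph(groups)
        return self.process(target)


# Placeholder stages that only tag the value, to show the call order.
pipeline = ManyCorePipeline(
    acquire_graph=lambda task: [task, "original_graph"],
    align_data=lambda g: g + ["aligned"],
    generate_graph=lambda g: g + ["target_graph"],
    process=lambda g: g + ["result"],
)
result = pipeline.run("task")
print(result)
```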

Further, the producer nodes in each data dependency group comprise at least one layer of loops and a first computing unit, and the consumer nodes in each data dependency group comprise at least one layer of loops and a second computing unit; the data alignment processing module 1220 includes:

a first determining module, configured to determine a first parameter correspondence between the first computing unit and the second computing unit when the producer node includes a first type of producer node; operator nodes with data dependency relations in the original calculation graph are connected through edges, and the in-degree of the first type producer nodes in the original calculation graph is greater than zero;

the first replacement module is used for carrying out variable replacement on the circulation variable in the at least one layer of circulation of the consumer node based on the first parameter corresponding relation to obtain a first update circulation corresponding to the consumer node; the routing cost between the consumer node after the update cycle and the first type producer node is smaller than the routing cost between the consumer node before the update cycle and the first type producer node;

and a first generation module, configured to obtain the target dependency relationship group based on the first update cycle corresponding to the consumer node.
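One possible reading of the variable replacement performed by the first replacement module is sketched below for a single loop. It assumes a producer that emits `a[0], a[1], ...` in index order and a consumer that computes `b[i] = a[n - 1 - i]`; the parameter correspondence is then `j = n - 1 - i`, and substituting it into the consumer loop makes the consumer read `a` in the order it was produced. The example computation and function names are illustrative assumptions.

```python
def consumer_original(a):
    """b[i] = a[n - 1 - i]; returns b and the order in which a is read."""
    n = len(a)
    b, reads = [None] * n, []
    for i in range(n):
        reads.append(n - 1 - i)
        b[i] = a[n - 1 - i]
    return b, reads


def consumer_updated(a):
    """Same computation after replacing the loop variable with j = n - 1 - i."""
    n = len(a)
    b, reads = [None] * n, []
    for j in range(n):
        reads.append(j)
        b[n - 1 - j] = a[j]
    return b, reads


a = [10, 20, 30, 40]  # assumed to be produced in index order 0, 1, 2, 3
b_old, reads_old = consumer_original(a)
b_new, reads_new = consumer_updated(a)
print(b_old == b_new, reads_old, reads_new)
```

The result `b` is unchanged, while the read order of the updated loop matches the assumed production order.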

Further, the consumer node comprises a multi-layer loop, and the second computing unit comprises coordinate parameters respectively corresponding to the multi-layer loop;

the first determining module includes:

the first analysis module is used for carrying out parameter relation analysis on the coordinate parameter corresponding to at least one layer of circulation in the multi-layer circulation and the coordinate parameter corresponding to the corresponding circulation in the first calculation unit to obtain a parameter corresponding relation corresponding to the at least one layer of circulation;

the second determining module is used for determining the first parameter corresponding relation based on the parameter corresponding relation corresponding to the at least one layer of circulation;

the first generation module includes:

a second replacement module, configured to traverse the multi-layer loop of the consumer node, and perform the following operations for each layer of loop: under the condition that the parameter corresponding relation corresponding to the current cycle comprises first parameter mapping information, carrying out variable replacement on a cycle variable in the current cycle based on the first parameter mapping information to obtain an updated cycle corresponding to the current cycle; the first parameter mapping information represents the mapping relation between the coordinate parameters corresponding to the current cycle and the coordinate parameters corresponding to the corresponding cycle in the first computing unit;

And the second generation module is used for obtaining the first updating cycle based on at least one updated cycle.

Further, the first determining module includes:

a third determining module, configured to determine, when the number of the first type producer nodes is plural, a parameter correspondence between a first computing unit and the second computing unit of each first type producer node;

and the fourth determining module is used for determining the first parameter corresponding relation based on the parameter corresponding relation between the first computing unit and the second computing unit of each first type producer node.

Further, the apparatus further comprises:

a fifth determining module, configured to determine a second parameter correspondence between the first computing unit and the second computing unit when the producer node includes a second type producer node; the in-degree of the second class producer node in the original calculation graph is equal to zero;

the third replacing module is used for carrying out variable replacement on the circulating variable in the at least one layer of circulation of the second type producer node based on the second parameter corresponding relation to obtain a second updating circulation corresponding to the second type producer node; the cycle order of the first computing unit under the second update cycle is consistent with the cycle order of the second computing unit;

The first generation module includes:

and the third generation module is used for obtaining the target dependency relationship group based on the first updating cycle and the second updating cycle.

Further, the second type producer node comprises a multi-layer loop, and the first computing unit comprises coordinate parameters respectively corresponding to the multi-layer loop;

the fifth determination module includes:

the second analysis module is used for carrying out parameter relation analysis on the coordinate parameter corresponding to at least one layer of circulation in the multi-layer circulation and the coordinate parameter corresponding to the corresponding circulation in the second calculation unit to obtain a parameter corresponding relation corresponding to the at least one layer of circulation;

a sixth determining module, configured to determine the second parameter correspondence based on the parameter correspondence corresponding to the at least one layer of cycle;

the third replacement module includes:

a fourth replacement module, configured to traverse the multi-layer loop of the second type producer node, and perform the following operations on each layer of loop: under the condition that the parameter corresponding relation corresponding to the current cycle comprises second parameter mapping information, carrying out variable replacement on a cycle variable in the current cycle based on the second parameter mapping information to obtain an updated cycle corresponding to the current cycle; the second parameter mapping information represents the mapping relation between the coordinate parameters corresponding to the current cycle and the coordinate parameters corresponding to the corresponding cycle in the second calculation unit;

And a fourth generation module, configured to obtain the second update cycle based on at least one updated cycle.

Further, the fifth determining module includes:

a seventh determining module, configured to determine, when the number of the second type producer nodes is plural, a parameter correspondence between a first computing unit and the second computing unit of each second type producer node;

and an eighth determining module, configured to determine the second parameter correspondence based on the parameter correspondence between the first computing unit and the second computing unit of each second class producer node.

Further, each operator node in the original computational graph comprises at least one layer of loops respectively;

the data alignment processing module includes:

the dimension dependency graph generation module is used for generating a dimension dependency graph based on at least one layer of loops respectively contained in each operator node in the original calculation graph; the dimension dependency graph comprises a plurality of operator nodes and a cyclic dimension node sequence corresponding to each operator node, each layer of cycle of each operator node corresponds to one cyclic dimension node, and the dimension sequence of each cyclic dimension node in the cyclic dimension node sequence corresponding to each operator node is consistent with the cyclic nesting sequence of each operator node; the cyclic dimension nodes with the data dependency relationship are connected through dimension dependency edges;

the first adjustment module is used for adjusting the dimension sequence of the cyclic dimension nodes corresponding to the consumer nodes in each data dependency relationship group to obtain an adjusted first node sequence under the condition that the producer nodes in each data dependency relationship group comprise the first type producer nodes; the routing cost between the consumer node and the first type producer node after the dimension sequence adjustment is smaller than the routing cost between the consumer node and the first type producer node before the dimension sequence adjustment; operator nodes with data dependency relations in the original calculation graph are connected through edges, and the in-degree of the first type producer nodes in the original calculation graph is greater than zero;

and a fifth generating module, configured to obtain the target dependency relationship group based on the first node sequence.
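How a dimension dependency graph might be derived from loop nests can be sketched for the simplest case. The sketch assumes that every array axis is indexed by a single plain loop variable (no affine expressions) and that an edge joins the producer and consumer loop variables that index the same axis; the data layout and the function name are hypothetical.

```python
def build_dimension_edges(producer_write, consumer_read):
    """Link producer loop dimensions to consumer loop dimensions.

    producer_write: loop variables indexing the produced array, per axis,
                    e.g. ["y", "x"] for a[y][x] = ...
    consumer_read:  loop variables the consumer uses on the same axes,
                    e.g. ["x", "y"] for ... = a[x][y]
    An edge joins the two loop variables that index the same array axis.
    """
    if len(producer_write) != len(consumer_read):
        raise ValueError("both accesses must index the same number of axes")
    return list(zip(producer_write, consumer_read))


# Loop nests from outermost to innermost give the dimension node order.
nests = {"S0": ["y", "x"], "S1": ["y", "x"]}
edges = build_dimension_edges(["y", "x"], ["x", "y"])
graph = {"nodes": nests, "edges": {("S0", "S1"): edges}}
print(graph["edges"])
```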

Further, the first adjustment module includes:

a ninth determining module, configured to determine, when the number of the first type producer nodes is plural, a cyclic dimension node corresponding to each first type producer node, and a dimension sequence correspondence relationship of the cyclic dimension node corresponding to the consumer node;

And the second adjustment module is used for adjusting the dimension sequence of the circulating dimension nodes corresponding to the consumer nodes based on the dimension sequence corresponding relation of the circulating dimension nodes corresponding to each first type of producer node and the circulating dimension nodes corresponding to the consumer nodes to obtain the first node sequence.

Further, the apparatus further comprises:

the third adjusting module is used for adjusting the dimension sequence of the cyclic dimension nodes corresponding to the second type of producer nodes to obtain an adjusted second node sequence under the condition that the producer nodes comprise the second type of producer nodes; cyclic dimension nodes that have a data dependency relationship appear in the same dimension order in the second node sequence and in the cyclic dimension node sequence corresponding to the consumer node; the in-degree of the second class producer node in the original computational graph is equal to zero.

Further, the third adjustment module includes:

The device provided in the above embodiment can execute the method provided in any embodiment of the present application, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in the above embodiments may be found in the methods provided by any of the embodiments of the present application.

The present embodiment also provides a computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method according to any of the above embodiments.

According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform any of the methods described above.

Fig. 13 is a block diagram of an electronic device, which may be a server, for data processing based on a many-core architecture, the internal structure of which may be as shown in fig. 13, according to an example embodiment. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a data processing method based on a many-core architecture.

It will be appreciated by those skilled in the art that the structure shown in fig. 13 is merely a block diagram of a portion of the structure associated with the disclosed aspects and is not limiting of the electronic device to which the disclosed aspects apply, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

The present specification provides the method operation steps as described in the embodiments or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only execution order. When an actual system or terminal product executes, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment).

The structures shown in this embodiment are only partial structures related to the present application and do not constitute limitations of the apparatus to which the present application is applied, and a specific apparatus may include more or less components than those shown, or may combine some components, or may have different arrangements of components. It should be understood that the methods, apparatuses, etc. disclosed in the embodiments may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and the division of the modules is merely a division of one logic function, and may be implemented in other manners, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or unit modules.

Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A data processing method based on a many-core architecture, comprising:

Generating a target calculation graph based on the target dependency group;

2. The method of claim 1, wherein the producer nodes in each data dependency group comprise at least one tier of loops and a first computing unit, and the consumer nodes in each data dependency group comprise at least one tier of loops and a second computing unit;

and performing node data alignment processing on the producer node and the consumer node in each data dependency relationship group to obtain a target dependency relationship group, wherein the method comprises the following steps of:

determining a first parameter correspondence of the first computing unit and the second computing unit in the case that the producer nodes include a first type of producer nodes; operator nodes with data dependency relations in the original calculation graph are connected through edges, and the in-degree of the first type producer nodes in the original calculation graph is greater than zero;

Performing variable replacement on the circulation variable in the at least one layer of circulation of the consumer node based on the first parameter corresponding relation to obtain a first update circulation corresponding to the consumer node; the routing cost between the consumer node after the update cycle and the first type producer node is smaller than the routing cost between the consumer node before the update cycle and the first type producer node;

and obtaining the target dependency relationship group based on a first updating cycle corresponding to the consumer node.

3. The method of claim 2, wherein the consumer node comprises a multi-layer loop, and the second computing unit comprises coordinate parameters corresponding to the multi-layer loop, respectively;

the determining the first parameter correspondence between the first computing unit and the second computing unit includes:

carrying out parameter relation analysis on the coordinate parameters corresponding to at least one layer of circulation in the multi-layer circulation and the coordinate parameters corresponding to the corresponding circulation in the first calculation unit to obtain a parameter corresponding relation corresponding to the at least one layer of circulation;

determining the first parameter corresponding relation based on the parameter corresponding relation corresponding to the at least one layer of circulation;

The variable replacement is performed on the circulation variable in the at least one layer of circulation of the consumer node in each data dependency relationship group based on the first parameter correspondence, so as to obtain a first update circulation corresponding to the consumer node, including:

traversing the multi-tier loops of the consumer nodes, performing the following for each tier loop:

under the condition that the parameter corresponding relation corresponding to the current cycle comprises first parameter mapping information, carrying out variable replacement on a cycle variable in the current cycle based on the first parameter mapping information to obtain an updated cycle corresponding to the current cycle; the first parameter mapping information represents the mapping relation between the coordinate parameters corresponding to the current cycle and the coordinate parameters corresponding to the corresponding cycle in the first computing unit;

the first update cycle is obtained based on at least one updated cycle.

4. The method according to claim 2, wherein, in the case that the producer nodes in each data dependency group include a first type of producer nodes, determining the first parameter correspondence of the first computing unit and the second computing unit includes:

5. The method according to claim 2, wherein, after the performing variable replacement on the loop variables in the at least one layer of loops of the consumer node in each data dependency group based on the first parameter correspondence to obtain the first update loop corresponding to the consumer node, the method further comprises:

determining a second parameter correspondence of the first computing unit and the second computing unit in the case that the producer nodes include a second type of producer node; the in-degree of the second type of producer node in the original computational graph is equal to zero;

performing variable replacement on the loop variables in the at least one layer of loops of the second type of producer node based on the second parameter correspondence, to obtain a second update loop corresponding to the second type of producer node; the loop order of the first computing unit under the second update loop is consistent with the loop order of the second computing unit;

the obtaining the target dependency relationship group based on the first update loop corresponding to the consumer node comprises:

obtaining the target dependency relationship group based on the first update loop and the second update loop.

6. The method of claim 5, wherein the second type of producer node comprises multi-layer loops, and the first computing unit comprises coordinate parameters respectively corresponding to the multi-layer loops;

the determining a second parameter correspondence of the first computing unit and the second computing unit in the case that the producer nodes in each data dependency group include a second type of producer node comprises:

performing parameter relation analysis on the coordinate parameters corresponding to at least one layer of loops in the multi-layer loops and the coordinate parameters corresponding to the corresponding loops in the second computing unit, to obtain a parameter correspondence for the at least one layer of loops;

determining the second parameter correspondence based on the parameter correspondence for the at least one layer of loops;

the performing variable replacement on the loop variables in the at least one layer of loops of the second type of producer node based on the second parameter correspondence, to obtain a second update loop corresponding to the second type of producer node, comprises:

traversing the multi-layer loops of the second type of producer node, and performing the following for each layer of loop:

in the case that the parameter correspondence for the current loop comprises second parameter mapping information, performing variable replacement on the loop variable in the current loop based on the second parameter mapping information, to obtain an updated loop corresponding to the current loop; the second parameter mapping information represents the mapping relationship between the coordinate parameters corresponding to the current loop and the coordinate parameters corresponding to the corresponding loop in the second computing unit;

obtaining the second update loop based on at least one updated loop.

7. The method of claim 5, wherein determining the second parameter correspondence of the first computing unit and the second computing unit in the case where the producer nodes in each data dependency group include a second type of producer node comprises:

8. The method of claim 1, wherein each operator node in the original computational graph comprises at least one layer of loops;

generating a dimension dependency graph based on the at least one layer of loops contained in each operator node in the original computational graph; the dimension dependency graph comprises a plurality of operator nodes and a loop dimension node sequence corresponding to each operator node, each layer of loop of each operator node corresponds to one loop dimension node, and the dimension order of the loop dimension nodes in the loop dimension node sequence corresponding to each operator node is consistent with the loop nesting order of that operator node; loop dimension nodes having a data dependency are connected by dimension dependency edges;

in the case that the producer nodes in each data dependency group comprise a first type of producer node, adjusting the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency group to obtain an adjusted first node sequence; the routing cost between the consumer node and the first type of producer node after the dimension order adjustment is smaller than the routing cost between them before the dimension order adjustment; operator nodes having a data dependency in the original computational graph are connected by edges, and the in-degree of the first type of producer node in the original computational graph is greater than zero;

and obtaining the target dependency relationship group based on the first node sequence.

9. The method according to claim 8, wherein, in the case that the producer nodes in each data dependency group include a first type of producer node, the adjusting the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency group to obtain an adjusted first node sequence comprises:

10. The method according to claim 8, wherein, after the adjusting, in the case that the producer nodes in each data dependency group include a first type of producer node, the dimension order of the loop dimension nodes corresponding to the consumer node in each data dependency group to obtain an adjusted first node sequence, the method further comprises:

11. The method according to claim 10, wherein, in the case that the producer nodes include a second type of producer node, the adjusting the dimension order of the loop dimension nodes corresponding to the second type of producer node to obtain an adjusted second node sequence comprises:

12. A many-core architecture-based data processing apparatus, comprising:

13. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method based on a many-core architecture according to any one of claims 1 to 11.

14. A computer storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the data processing method based on a many-core architecture according to any one of claims 1 to 11.
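
By way of illustration only, the per-layer traversal recited in claims 3 and 6 may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the `Loop` type, the function `replace_loop_variables`, the variable names `i1`, `j1`, `i2`, `j2`, `k2`, and the representation of the parameter mapping information as a plain dictionary are all hypothetical and do not appear in the claims. A loop layer whose parameter correspondence contains no mapping information is left unchanged, and the update loop is assembled from the resulting layers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Loop:
    """One layer of a loop nest (hypothetical representation)."""
    var: str      # name of the loop variable / coordinate parameter
    extent: int   # trip count of this layer


def replace_loop_variables(loops: List[Loop], mapping: Dict[str, str]) -> List[Loop]:
    """Traverse every loop layer; where mapping information exists for the
    current layer, replace its loop variable, otherwise keep the layer as is."""
    updated: List[Loop] = []
    for loop in loops:
        target = mapping.get(loop.var)  # None means no mapping for this layer
        if target is None:
            updated.append(loop)
        else:
            updated.append(Loop(var=target, extent=loop.extent))
    return updated


# Hypothetical consumer node with three loop layers (outermost first).
consumer_loops = [Loop("i2", 4), Loop("j2", 8), Loop("k2", 16)]

# Assumed parameter correspondence: only two layers have mapping information
# to coordinate parameters of the producer-side computing unit.
first_parameter_mapping = {"i2": "i1", "j2": "j1"}

first_update_loop = replace_loop_variables(consumer_loops, first_parameter_mapping)
print([loop.var for loop in first_update_loop])
```

In this sketch only the loop variables are renamed; loop extents and nesting order are preserved, so the example does not model the dimension order adjustment of claims 8 to 11.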