CN116909573A - Node fusion method and device for calculation graph, electronic equipment and storage medium


Info

Publication number
CN116909573A
Authority
CN
China
Prior art keywords
fusion
determining
strategy
string
round
Prior art date
Legal status
Pending
Application number
CN202310979973.3A
Other languages
Chinese (zh)
Inventor
贾宏宇
张云飞
陈威行
于佃海
马艳军
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310979973.3A
Publication of CN116909573A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The disclosure provides a node fusion method and apparatus for a computational graph, an electronic device, and a storage medium, relating to the technical field of artificial intelligence and in particular to deep learning. The specific implementation scheme is as follows: in response to receiving a regular expression associated with a computational graph in a deep learning compiler, parsing the regular expression to obtain a target fusion strategy, wherein the regular expression is obtained from a custom configuration of candidate fusion strategies for fusing operator nodes in the computational graph; and fusing the operator nodes in the computational graph based on the target fusion strategy.

Description

Node fusion method and device for calculation graph, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to deep learning, and specifically to a node fusion method and apparatus for a computational graph, an electronic device, and a storage medium.
Background
A deep learning compiler builds a bridge between artificial intelligence algorithm models and machine code. It is one of the core components of a deep learning framework and plays a vital role in a deep learning system. Upstream, the deep learning compiler accepts the constantly evolving model algorithms designed by researchers; downstream, it adapts the emitted machine binary code to multi-hardware scenarios, satisfying the resource constraints of different hardware back ends.
The deep learning model built by a developer is converted into a computational graph by the compiler front end; that is, the computation logic and control flow in the model are abstracted into a graph structure.
Disclosure of Invention
The disclosure provides a node fusion method, a node fusion device, electronic equipment and a storage medium of a computation graph.
According to an aspect of the present disclosure, there is provided a node fusion method of a computational graph, including: in response to receiving a regular expression associated with a computational graph in a deep learning compiler, parsing the regular expression to obtain a target fusion strategy, wherein the regular expression is obtained from a custom configuration of candidate fusion strategies for fusing operator nodes in the computational graph; and fusing operator nodes in the computational graph based on the target fusion strategy.
According to another aspect of the present disclosure, there is provided a node fusion apparatus of a computational graph, including: a parsing module configured to, in response to receiving a regular expression associated with a computational graph in a deep learning compiler, parse the regular expression to obtain a target fusion strategy, wherein the regular expression is obtained from a custom configuration of candidate fusion strategies for fusing operator nodes in the computational graph; and a fusion module configured to fuse the operator nodes in the computational graph based on the target fusion strategy.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the node fusion method of the computational graph of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a node fusion method of a computational graph of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the node fusion method of a computational graph of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the node fusion method and apparatus of computational graphs may be applied, in accordance with embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a node fusion method of a computational graph according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of horizontal fusion and vertical fusion according to an embodiment of the disclosure;
fig. 4 schematically illustrates an AST node class diagram according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a block diagram of an AST syntax tree resulting from regular expression parsing, in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates an execution flowchart of a WhileRegexNode2 node corresponding to the ([HV]I)* policy, according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a policy resolution diagram corresponding to the H*([HV]I)*([RV]I)* fusion policy according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a node fusion apparatus of a computational graph, according to an embodiment of the present disclosure; and
fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
Developers and algorithm researchers place diversified demands on deep learning compilers, such as versatility, flexibility, and high performance. Versatility requires the compiler to support diversified model structures and to adapt to the different operators invoked by upper-layer users. Flexibility requires that the compiler's optimization process give the user sufficient freedom, so that the user can freely control the optimization process and selectively apply optimization strategies. In addition, users have a strong demand for high performance: the deep learning compiler should ensure efficient execution of the algorithm as far as possible, improve runtime performance, and reduce hardware energy consumption.
The optimization strategy of a deep learning compiler plays a key role in the performance of the finally generated code, and computational graph fusion is an important optimization means of the deep learning compiler. Design schemes for graph fusion strategies in deep learning compiler products include MLIR (compiler infrastructure software), TVM (a deep-learning-oriented model compiler), Triton (an inference server), and the like.
MLIR provides the affine-loop-fusion pass for graph fusion. This pass supports producer-consumer fusion and sibling fusion, and MLIR exposes three fusion modes: producer-consumer fusion performed alone; sibling fusion performed alone; or producer-consumer fusion performed first, followed by sibling fusion.
TVM provides the FuseOps pass (an operator-based fusion strategy) for graph fusion. The pass fuses according to the OpPatternKind of each Group, and two Groups are fused if a specific OpPatternKind constraint is satisfied between them. However, the OpPatternKind value of a Group carries fixed semantics that cannot be modified from the user side.
Triton adopts a scoring approach: the fusion constraints and the fusion priorities among nodes are embedded into a scoring function, and the nodes to fuse are selected through that function. The Triton scheme scores any two nodes and then selects the two nodes with the highest score for fusion.
In the process of conceiving the present disclosure, the inventors found that in these schemes the combination of fusion strategies is fixed and embedded in the low-level implementation of the compiler, so it can only be controlled by the compiler's developers. Upper-layer users have no room to customize the fusion mechanism and cannot freely configure fusion strategies; they must first learn how the fusion strategies are implemented and then adapt them inside the optimization algorithm, which brings high learning cost and poor configuration flexibility. Such fusion strategies limit the flexibility of the compiler, constrain the performance-optimization capability of upper-layer users, and also hinder the ecological development of deep learning compilers.
FIG. 1 schematically illustrates an exemplary system architecture to which the node fusion method and apparatus of computational graphs may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the node fusion method and apparatus of the computation graph may be applied may include a terminal device, but the terminal device may implement the node fusion method and apparatus of the computation graph provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages etc. Various communication client applications, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) that supports content browsed by the user with the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103. The background management server may analyze and process received data such as user requests, and feed the processing result (e.g., a web page, information, or data obtained or generated according to the user request) back to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that, the node fusion method of the computation graph provided in the embodiments of the present disclosure may be generally performed by the first terminal device 101, the second terminal device 102, or the third terminal device 103. Accordingly, the node fusion apparatus of the computation graph provided in the embodiments of the present disclosure may also be provided in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
Alternatively, the node fusion method of the computation graph provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the node fusion apparatus of the computation graph provided by the embodiments of the present disclosure may be generally disposed in the server 105. The node fusion method of the computation graph provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the node fusion apparatus of the computation graph provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when nodes of the computation graph need to be fused, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire a regular expression associated with the computation graph in the deep learning compiler, where the regular expression is obtained according to a candidate fusion policy customization setting for fusing operator nodes in the computation graph. And then the obtained regular expression is sent to the server 105, the server 105 responds to the received regular expression, analyzes the regular expression to obtain a target fusion strategy, and fuses operator nodes in the computational graph based on the target fusion strategy. Or the regular expression is parsed by a server or a server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105, and operator nodes in the computation graph are fused based on a target fusion strategy obtained by the parsing.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a node fusion method of a computation graph according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S210 to S220.
In operation S210, in response to receiving a regular expression associated with a computation graph in the deep learning compiler, the regular expression is parsed to obtain a target fusion policy, the regular expression being obtained from a custom configuration of candidate fusion policies for fusing operator nodes in the computation graph.
In operation S220, operator nodes in the computation graph are fused based on the target fusion policy.
According to embodiments of the present disclosure, a computational graph may be composed of one or more connected operator nodes. An operator node may represent at least one of an addition calculation, a multiplication calculation, a division calculation, a reduce (reduction) calculation, a sum (summation) calculation, and the like, without being limited thereto. The node fusion method for a computational graph provided by the disclosure may be a node fusion method defined for a computational graph composed of a plurality of operator nodes, and may be used to select operator nodes for fusion, or to fuse nodes meeting constraint conditions into a subgraph. It should be noted that subgraphs without data dependency relationships can be executed in parallel, which is beneficial to execution efficiency.
In accordance with embodiments of the present disclosure, there are a variety of fusion strategies that may be selected during node fusion, such as horizontal fusion, vertical fusion, recalculation, and the like. Candidate fusion policies may include at least one of horizontal fusion, vertical fusion, recalculation, other defined fusion policies, custom fusion policies, and the like.
According to the embodiments of the disclosure, the execution order, the number of execution rounds, and the fusion form of different fusion strategies have an important influence on the resulting subgraphs. The regular expression may be obtained by formalizing, as a state machine, at least one of the execution order, the execution count, and the fusion form of one or more fusion strategies, and then describing that state machine with a regular expression.
According to embodiments of the present disclosure, graph fusion in a deep learning compiler is realized by two mechanisms: fusion policy definition and fusion mechanism implementation. The fusion policy definition specifies which fusion forms are included, as well as the role and constraints of each fusion form. The fusion mechanism implementation combines fusion policies and specifies details such as their execution order and number of rounds. At the user level, the user may first select the desired candidate fusion policies, and then obtain the regular expression by custom configuration of the fusion form, execution order, and execution count of the selected candidate fusion policies.
According to embodiments of the present disclosure, the above-described process of parsing a regular expression may take the form of parse-then-execute, or of executing while traversing. By traversing or parsing the regular expression, the target fusion strategy obtained from the traversal or parsing can be applied to fuse the operator nodes in the computational graph.
Through the embodiments of the disclosure, since a regular expression is introduced to construct the target fusion strategy for fusing operator nodes in the computational graph, a flexibly configurable graph fusion mechanism for deep learning compilers is realized. Under this mechanism, an upper-layer user can customize a target fusion strategy by entering a simple regular expression and map it to the bottom layer of the deep learning compiler at a development cost close to zero, which reduces the development and debugging cost of fusion strategies, is user-friendly, and improves the flexibility and usability of the deep learning compiler. In addition, each operator node corresponds to a generated Kernel function; after operator nodes are fused, the cost of function calls can be reduced, the consumption of computer memory can be lowered, and the computing performance can be improved. A larger optimization space is also provided for the compiler's subsequent optimization work.
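To make the user-facing side concrete, the minimal sketch below shows how an upper-layer user might hand such a regular expression to a compiler. The class name DeepLearningCompiler, its methods, and the default expression are assumptions for illustration only, not the API of any particular compiler.

```python
# Minimal sketch (hypothetical API): an upper-layer user customizes the graph
# fusion mechanism by supplying a single regular expression; the compiler maps
# it onto its low-level fusion passes. All names here are illustrative.

class DeepLearningCompiler:
    DEFAULT_FUSION_REGEX = "H*([HV]I)*([RV]I)*"   # assumed default policy

    def __init__(self, fusion_regex=None):
        # The user overrides the fusion mechanism with one regex string.
        self.fusion_regex = fusion_regex or self.DEFAULT_FUSION_REGEX

    def compile(self, graph):
        print(f"fusing operator nodes with policy: {self.fusion_regex}")
        return graph  # a real implementation would parse the regex and fuse Groups


compiler = DeepLearningCompiler(fusion_regex="[HV]I")  # custom target policy
compiler.compile(graph={"groups": []})
```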
The method shown in fig. 2 is further described below in connection with the specific examples.
According to the embodiment of the disclosure, the basic unit of graph fusion can be Group, which can be understood as a sub-graph of a calculation graph, and the Group without dependency can be executed in parallel. Before the operator nodes in the computational graph are fused, preprocessing can be performed on each operator node in the computational graph, so that the Op (operator) nodes are packaged into Group operator nodes. A Group may include one or more ops.
According to an embodiment of the present disclosure, before performing the above operations S210 to S220, a candidate fusion policy, a target fusion policy, and a regular expression may be first determined.
According to embodiments of the present disclosure, the candidate fusion policy may include at least one of: vertical fusion, horizontal fusion, recalculation, input fusion, etc., and may not be limited thereto.
According to embodiments of the present disclosure, producer-consumer relationships may be constructed from the upstream-downstream dependencies of Groups. For example, a Group may have one or more producers, and a producer's output serves as input to the current Group. Likewise, a Group may have one or more consumers, and the output of the current Group serves as input to its consumers.
Fig. 3 schematically illustrates a schematic diagram of horizontal fusion and vertical fusion according to an embodiment of the present disclosure.
As shown in fig. 3, the computational graph 300 includes a first Group 310, a second Group 320, a third Group 330, and a fourth Group 340. The first Group 310 may act as a producer; the second Group 320, the third Group 330, and the fourth Group 340 may act as consumers; and the output of the first Group 310 may act as an input to the second Group 320, the third Group 330, and the fourth Group 340.
According to embodiments of the present disclosure, the horizontal fusion strategy may include: taking a Group as an anchor point, such as the first Group 310 in fig. 3, and traversing all consumers of that Group, such as the second Group 320, the third Group 330, and the fourth Group 340 in fig. 3. Whether the consumers can be fused with one another is judged according to the type information among the consumers, the number of tensor data inside each consumer, and the like; a fusion candidate set is then screened out and fused. For example, when the fusion condition is determined to be satisfied, the third Group 330 and the fourth Group 340 in fig. 3 may be horizontally fused.
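The sketch below restates this horizontal fusion step in code under simplified assumptions: a Group is reduced to a name, a pattern kind, and a tensor element count, and can_fuse_horizontally stands in for the real type and tensor constraints.

```python
# Sketch of horizontal fusion around one anchor Group. The Group structure and
# the fusion constraint are simplified placeholders, not the real compiler logic.

class Group:
    def __init__(self, name, kind, numel):
        self.name, self.kind, self.numel = name, kind, numel
        self.consumers = []

def can_fuse_horizontally(a, b):
    # Placeholder constraint: same pattern kind and same tensor element count.
    return a.kind == b.kind and a.numel == b.numel

def horizontal_fusion(anchor):
    """Traverse all consumers of the anchor Group and fuse compatible ones."""
    candidates = list(anchor.consumers)
    fused_groups = []
    while candidates:
        g = candidates.pop(0)
        partners = [c for c in candidates if can_fuse_horizontally(g, c)]
        for p in partners:
            candidates.remove(p)
        fused_groups.append([g] + partners)   # one fused Group per candidate set
    return fused_groups
```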
According to embodiments of the present disclosure, input fusion is a special case of horizontal fusion. Input fusion targets Groups that have no producers and that directly process the tensors input to the model by the user. The input fusion policy may include: acquiring, according to an input tensor, a plurality of initial operator nodes sharing that input tensor; determining, according to the first input-output type information and the first tensor information of each of the plurality of initial operator nodes, a plurality of target operator nodes whose information matches; and fusing the plurality of target operator nodes.
For example, the input fusion may traverse consumers sharing the same input tensor, screen out a fused candidate set according to a type relationship among the consumers, and fuse groups in the candidate set into a large Group.
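A minimal sketch of this input fusion step follows, assuming each producer-less Group records which model input tensors it reads; the dict layout and grouping by a shared input tensor are simplifications of the real candidate screening.

```python
# Sketch of input fusion: producer-less Groups that share the same model input
# tensor are collected into a candidate set and fused. Structures are plain
# dicts for illustration; the real type-relationship checks are omitted.

from collections import defaultdict

def input_fusion(groups):
    by_input = defaultdict(list)
    for g in groups:
        if not g["producers"]:                 # only Groups without producers
            for tensor in g["inputs"]:
                by_input[tensor].append(g["name"])
    fused = []
    for tensor, sharers in by_input.items():
        if len(sharers) > 1:                   # candidate set sharing one input
            fused.append({"shared_input": tensor, "members": sharers})
    return fused

example = [
    {"name": "g1", "producers": [], "inputs": ["x"]},
    {"name": "g2", "producers": [], "inputs": ["x"]},
    {"name": "g3", "producers": ["g1"], "inputs": ["y"]},
]
print(input_fusion(example))   # g1 and g2 share input "x" and form one fused Group
```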
According to embodiments of the present disclosure, the vertical fusion strategy may include: taking a Group as an anchor point, such as the first Group 310 in fig. 3, and traversing all consumers of the anchor Group, such as the second Group 320, the third Group 330, and the fourth Group 340 in fig. 3. Whether the anchor Group can be fused with each consumer is judged according to the type information between the anchor Group and that consumer, the number of tensor data inside the anchor Group and the consumer, and the like; a fusion candidate set is then screened out and fused. For example, when the fusion condition is determined to be satisfied, the first Group 310 and the second Group 320 in fig. 3 may be vertically fused.
According to embodiments of the present disclosure, recalculation (recomputation) is a special case of vertical fusion. The recalculation fusion policy may include: acquiring, for an operator group node, all consumer nodes corresponding to that operator group node; and, upon determining from the second input-output type information and second tensor information of the operator group node and the third input-output type information and third tensor information of the consumer nodes that the operator group node matches all of the consumer nodes, fusing the operator group node with each of the consumer nodes separately.
For example, referring to the computational graph 300 shown in FIG. 3, a first Group 310 may be used as an anchor point to traverse all consumers of the Group: a second Group 320, a third Group 330, and a fourth Group 340. When the first Group 310 and all consumers meet the fusion condition, the second Group 320, the third Group 330 and the fourth Group 340 are respectively fused with the first Group 310, so as to obtain three fused groups.
It should be noted that, the Group fused by the above fusion policy based on the recalculation may perform repeated calculation, but the memory overhead of interaction between groups may be saved, so as to improve the calculation performance.
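A minimal sketch of this recalculation strategy follows, under simplifying assumptions: Groups are plain strings and the matching check is a permissive placeholder for the real type and tensor constraints.

```python
# Sketch of the recalculation (recompute) strategy: the producer Group is fused
# separately with each of its N consumers, yielding N fused Groups. The
# producer's work is duplicated, but cross-Group memory traffic is avoided.

def recompute_fusion(producer, consumers, can_fuse=lambda p, c: True):
    """Return one fused Group per consumer that matches the producer."""
    fused_groups = []
    for consumer in consumers:
        if can_fuse(producer, consumer):       # stands in for type/tensor checks
            fused_groups.append({"members": [producer, consumer]})
    return fused_groups

# With producer Group 310 and consumers Group 320/330/340 (as in FIG. 3),
# three fused Groups are produced, each containing a copy of Group 310.
print(recompute_fusion("Group310", ["Group320", "Group330", "Group340"]))
```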
According to an embodiment of the present disclosure, the above-described type information (including, but not limited to, the first input-output type information, the second input-output type information, the third input-output type information, and the like) may include at least one of the following: ElementWise, Broadcast, Injective, and Reduction. ElementWise may characterize a one-to-one correspondence between the input tensor index and the output tensor index. Broadcast may characterize a one-to-many correspondence between the input tensor index and the output tensor index. Injective may characterize an injective mapping from the axes of the output tensor to the axes of the input tensor. Reduction may characterize a many-to-one correspondence between the input tensor index and the output tensor index.
According to embodiments of the present disclosure, the above information matching may take any of the following forms. In the scenario where the type information of the two Groups to be matched is ElementWise and Broadcast, respectively, all outputs of the former Group are required to be inputs of the latter Group. In the scenario where the type information of the two Groups to be matched is Reduction and Broadcast, respectively, the number of output tensors of the Broadcast Group is required to be consistent with the number of input tensors of the Reduction Group. In the scenario where the type information of both Groups to be matched is ElementWise, the number of input tensors of the two Groups is required to be consistent.
It should be noted that the above forms of information matching are only exemplary embodiments and are not limiting; other forms known in the art, as well as rearrangements and combinations of the forms above, may be used, as long as it is possible to determine whether the information matches.
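The sketch below encodes the type information of the preceding paragraphs and the three example matching rules; the enum, the tuple layout (kind, number of inputs, number of outputs), and the specific comparisons are illustrative assumptions rather than an exhaustive rule set.

```python
# Sketch of the pattern kinds and the example matching rules listed above.
# Only the three example rules are encoded; everything else returns False.

from enum import Enum, auto

class PatternKind(Enum):
    ELEMENTWISE = auto()   # input index <-> output index, one-to-one
    BROADCAST = auto()     # input index -> output index, one-to-many
    INJECTIVE = auto()     # injective map from output axes to input axes
    REDUCTION = auto()     # input index -> output index, many-to-one

def matches(first, second):
    """first/second are (kind, num_inputs, num_outputs) tuples for two Groups."""
    k1, in1, out1 = first
    k2, in2, out2 = second
    if (k1, k2) == (PatternKind.ELEMENTWISE, PatternKind.BROADCAST):
        return out1 <= in2          # all outputs of the former feed the latter
    if (k1, k2) == (PatternKind.REDUCTION, PatternKind.BROADCAST):
        return in1 == out2          # Broadcast outputs match Reduction inputs
    if k1 == k2 == PatternKind.ELEMENTWISE:
        return in1 == in2           # same number of input tensors
    return False
```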
According to the embodiments of the disclosure, the fusion process of the computational graph may be a composite of multiple fusion strategies, including a permutation and combination of the strategies and the execution rounds of the strategies. After the candidate fusion policies are determined, the target fusion policy may be determined according to the candidate fusion policies. The determination method may include: determining the number of execution rounds custom-set for a candidate fusion strategy; in response to determining multiple candidate fusion strategies based on the custom configuration, determining the custom-set permutation and combination of the multiple candidate fusion strategies; and determining the target fusion strategy according to the candidate fusion strategies and at least one of the number of execution rounds and the strategy permutation and combination.
According to an embodiment of the present disclosure, the number of execution rounds may be set as a predefined value, or, based on programming techniques, as a while loop or an until loop governed by a preset constraint condition for one or more candidate fusion processes; it is not limited thereto. The preset constraint condition may be, for example: keep applying a given fusion strategy until the computational graph is no longer updated.
According to embodiments of the present disclosure, there may be one or more candidate fusion strategies. When there is a single candidate fusion strategy, that strategy alone may be determined as the target fusion strategy. When there are multiple candidate fusion strategies, they may be permuted and combined to obtain the target fusion strategy. The determination method of the target fusion policy is not limited to the foregoing embodiments.
According to an embodiment of the disclosure, after determining the target fusion policy, the target fusion policy may be subjected to regular conversion according to a predefined regularization rule, for example, to obtain a regular expression.
According to embodiments of the present disclosure, predefined regularization rules may be used to define symbols in a regular expression, which may include, for example, the following predefined content:
H: horizontal Fusion, horizontal fusion strategy.
V: vertical Fusion strategy.
R: recompute, recalculate the policy.
I: input Fusion, input Fusion strategy.
[]: sequence semantics. The semantics of [ ] differ slightly from those of a conventional regular expression: the H, V, and R policies described above may be placed inside [ ] to indicate that the output of the preceding policy is the input of the following policy.
Further, the outputs of the three strategies H, V, and R are described below:
H: the consumers are fused; a plurality of consumers are fused into one Group, and the output of the H strategy is the fused Group.
V: the producer Group and its consumer are fused into one Group, and the output of the V strategy is the fused Group.
R: the producer Group is fused separately with each of its consumers; if the producer has N consumers, the output of the R strategy is N Groups, each being a fusion of the producer with one of its consumers. In the [ ] sequence-semantics scenario, one output of the R strategy is randomly selected as the input of the subsequent optimization strategy. It should be noted that the random selection may be replaced by a selection determined by a custom rule, for example, always selecting the first output of the R strategy as the input of the subsequent optimization strategy; it is not limited thereto.
It should be noted that the input fusion policy does not satisfy the sequence semantics, because input fusion needs no input Groups: the policy does not use any Group as an anchor point, and the Groups undergoing input fusion share no common producer Group. Furthermore, since the H, V, and R policies visit each Group in the computation graph, a for-loop traversal of the computation graph is required when executing them; no for-loop traversal is required when executing the I policy.
*: while (or until) loop semantics, representing repeated execution until the computation graph is no longer updated, or until another preset constraint condition is satisfied.
According to embodiments of the present disclosure, after determining a target fusion policy, a regular expression corresponding to the target fusion policy may be generated based on the predefined regularization rules described above.
For example, the target fusion policy is to repeatedly apply the horizontal fusion policy until no Group can be horizontally fused. Based on the predefined regularization rule, regularized conversion of this target fusion strategy yields the regular expression: H*.
For another example, the target fusion policy is to apply the horizontal fusion policy and use the output Group as the input of a subsequent vertical fusion policy. Based on the predefined regularization rule, regularized conversion of this target fusion strategy yields the regular expression: [HV].
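To restate these rules compactly, the sketch below maps each symbol to its strategy and lists a few example expressions with their intended readings; the dictionary form is purely illustrative.

```python
# Sketch illustrating the predefined regularization rules: each symbol maps to
# a fusion strategy, "[...]" expresses sequence semantics, and "*" repeats a
# policy until the computation graph stops changing.

SYMBOL_MEANING = {
    "H": "horizontal fusion",
    "V": "vertical fusion",
    "R": "recalculation (recompute)",
    "I": "input fusion",
}

EXAMPLES = {
    "H*": "repeat horizontal fusion until no Group can still be fused horizontally",
    "[HV]": "horizontal fusion once; its output Group feeds a vertical fusion",
    "H*([HV]I)*([RV]I)*": "three while loops composed as described for FIG. 7",
}

for expr, meaning in EXAMPLES.items():
    print(f"{expr:>20}  ->  {meaning}")
```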
Through the above embodiment of the present disclosure, by defining the predefined regularization rule, a mechanism of self-defining a fusion policy can be implemented, and from the user plane, only the semantic concept of each symbol in the regular expression needs to be known, so that an upper user can control the underlying graph fusion policy through the self-defining regular expression. The regular expression can also facilitate the user to freely modify the fusion sequence, and improve the flexibility of the fusion strategy configuration.
According to embodiments of the present disclosure, after the fusion mechanism is abstracted into a regular expression based on the definition of the fusion policy and the general logical abstraction of the fusion mechanism, a process of parsing the regular expression may be performed.
According to an embodiment of the present disclosure, the parsing process may take the parse-then-execute form, and operation S210 described above may include: parsing the regular expression to generate a syntax tree; and determining the target fusion strategy according to the executive program characterized by the syntax tree.
In accordance with embodiments of the present disclosure, in response to receiving a regular expression, the deep learning compiler layer may first parse the regular expression. If the parsing result is correct, the target fusion strategy can be executed normally; otherwise, the user may be prompted and a default fusion strategy adopted.
According to embodiments of the present disclosure, the parsing process described above may be implemented by converting the regular expression into an abstract syntax tree (AST).
Fig. 4 schematically illustrates an AST node class diagram according to an embodiment of the disclosure.
As shown in fig. 4, the definition of the UML (Unified Modeling Language) class diagram related to the abstract syntax tree may include the following:
The root node obtained by parsing a regular expression is FusionRegexRoot 410, and each regular expression corresponds to a unique root node. One or more FusionRegexNodes 420 may exist within the root node. FusionRegexNode is a virtual base class that only provides a Run interface for subsequently executing policies; FusionRegexNode may have multiple derived classes.
SequentialRegexNode 430: represents a sequence node, which may correspond to a single H, V, or R fusion policy or represent sequence semantics. For example, when type_ = [RV], the node corresponds to the recalculation + vertical fusion policy.
WhileRegexNode 440: represents a while (or until) loop node. The node may hold multiple FusionRegexNode member variables and executes the Run functions of those member variables in sequence until the Groups in the computation graph are no longer updated.
InputRegexNode 450: represents the input fusion policy.
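A minimal Python rendering of this class hierarchy is sketched below; the constructor signatures are assumptions, and the method bodies are placeholders standing in for the real fusion logic.

```python
# Minimal sketch of the AST node classes of FIG. 4. Method bodies are
# placeholders; `run` corresponds to the Run interface mentioned above.

class FusionRegexNode:                     # virtual base class
    def run(self, graph):
        raise NotImplementedError

class SequentialRegexNode(FusionRegexNode):
    def __init__(self, type_):            # e.g. "H", "V", "R", or "RV"
        self.type_ = type_
    def run(self, graph):
        for ch in self.type_:              # apply each policy in sequence
            print(f"apply {ch} policy")
        return graph

class InputRegexNode(FusionRegexNode):
    def run(self, graph):
        print("apply input fusion")
        return graph

class WhileRegexNode(FusionRegexNode):
    def __init__(self, children):
        self.children = children           # nested FusionRegexNode members
    def run(self, graph):
        updated = True
        while updated:
            updated = False                # placeholder: the real code sets this
            for child in self.children:    # when the Groups in the graph change
                graph = child.run(graph)
        return graph

class FusionRegexRoot:
    def __init__(self, roots_):
        self.roots_ = roots_               # top-level nodes of one expression
    def run(self, graph):
        for node in self.roots_:
            graph = node.run(graph)
        return graph
```

With these classes, the tree for the regular expression H* would be FusionRegexRoot([WhileRegexNode([SequentialRegexNode("H")])]).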
According to the embodiment of the disclosure, when the regular expression is analyzed, the whole regular expression can be traversed from beginning to end, and then corresponding AST is generated according to the content of the character string. Then, the target fusion policy may be determined according to the executor characterized by the AST.
According to embodiments of the present disclosure, if a V, H, R, or I tag is scanned and there is no symbol nesting around it, each symbol may correspond to a separate node. V, H, and R each correspond to a SequentialRegexNode, whose type_ value is "V", "H", or "R", respectively. I corresponds to an InputRegexNode whose type_ value is "I", representing the input fusion policy.
For example, the regular expression may be H; a node may be determined according to H, and the syntax tree of the regular expression H is obtained by combining it with the FusionRegexRoot node in the UML class diagram. In this embodiment, the target fusion policy may be determined to be executing the horizontal fusion policy once.
Through the embodiment of the disclosure, the regular expression is parsed into the grammar tree, so that the regular expression can be mapped to the deep learning compiler bottom layer, and an automatic process for executing the target fusion strategy is realized.
According to an embodiment of the present disclosure, parsing the regular expression may include: and analyzing the character strings in the regular expression. In response to determining that the current parsed string is a first string that characterizes the first fusion policy, the first string is determined to be a child node. In response to determining that the current parsed string is a second string that characterizes the first fusion policy execution round, the second string is determined to be a parent node of the child node. A syntax tree is determined from the parent node and the child node.
According to embodiments of the present disclosure, if a "*" mark is scanned, it may correspond to a WhileRegexNode and serve as the parent node of the string it modifies.
For example, the regular expression may be H*, the first string may be H, and the second string may be "*"; H is determined as a child node and "*" as the parent node of H, yielding the syntax tree of the regular expression H*.
In accordance with an embodiment of the present disclosure, in response to determining that the currently parsed string is a first string that characterizes a first fusion policy, determining the first string as a child node may include: in response to determining that the currently parsed first string comprises a first sub-string representing a sequence-string start flag and a second sub-string representing a sequence-string end flag, acquiring a target sequence string located between the first sub-string and the second sub-string, wherein sequence-string start flags correspond one-to-one to sequence-string end flags; and determining the target sequence string as a child node.
According to an embodiment of the present disclosure, the first string may be a part of or all of the strings in the regular expression, which is not limited herein. Based on the predefined regularization rules described above, a sequence string start flag may be "[", and a sequence string end flag may be "]".
According to embodiments of the present disclosure, if the "[" and "]" tags are scanned, the string wrapped by "[" and "]" may correspond to one SequentialRegexNode and may act as a child node. The type_ in the node may be the string within "[" and "]", representing the sequence semantics.
For example, the regular expression may be [HV]I, and the first string may be [HV] or I.
For example, when the first string is [HV], the first string may be determined to include the first substring "[" and the second substring "]", and the target sequence string may be determined to be HV. In this embodiment, the target sequence string HV may be determined as a child node as a whole.
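A sketch of this scan is given below: it walks the expression left to right, turns single symbols and [ ] sequences into child nodes, and wraps anything followed by "*" in a parent while-node. The tuple representation and the handling of parentheses are assumptions made for brevity.

```python
# Sketch of parsing a fusion regular expression into nested nodes. Nodes are
# plain tuples here: ("seq", "HV"), ("input",), ("while", [children]),
# ("group", [children]) -- stand-ins for the AST classes above.

def parse_fusion_regex(expr):
    nodes, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch in "HVR":                        # single sequence policy
            node, i = ("seq", ch), i + 1
        elif ch == "I":                        # input fusion policy
            node, i = ("input",), i + 1
        elif ch == "[":                        # sequence semantics [ ... ]
            end = expr.index("]", i)
            node, i = ("seq", expr[i + 1:end]), end + 1
        elif ch == "(":                        # nested sub-expression, e.g. ([HV]I)*
            depth, end = 1, i + 1
            while depth:
                depth += {"(": 1, ")": -1}.get(expr[end], 0)
                end += 1
            node, i = ("group", parse_fusion_regex(expr[i + 1:end - 1])), end
        else:
            raise ValueError(f"unexpected character {ch!r}")
        if i < len(expr) and expr[i] == "*":   # "*" becomes the parent node
            kids = node[1] if node[0] == "group" else [node]
            node, i = ("while", kids), i + 1
        nodes.append(node)
    return nodes

print(parse_fusion_regex("H*([HV]I)*([RV]I)*"))
# -> [('while', [('seq', 'H')]),
#     ('while', [('seq', 'HV'), ('input',)]),
#     ('while', [('seq', 'RV'), ('input',)])]
```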
According to an embodiment of the present disclosure, determining the target fusion policy may include: performing a preorder traversal on the syntax tree to obtain the semantic information represented by each tree node in the syntax tree; determining an executive program according to the semantic information represented by each tree node; and determining the executive program as the target fusion policy.
According to the embodiments of the disclosure, the semantic information may represent at least one of the execution manner of the target fusion policy, the execution rounds of the target fusion policy for a certain child node, and the like, without being limited thereto. Based on the semantic information obtained from the preorder traversal, the executive program can be determined, and thereby the target fusion strategy.
For example, the regular expression is H*([HV]I)*([RV]I)*, and the AST syntax tree structure generated for this regular expression may be as shown in fig. 5.
Fig. 5 schematically illustrates a block diagram of an AST syntax tree resulting from a parsing for a regular expression according to an embodiment of the present disclosure.
As shown in fig. 5, the execution entry of AST 500 may be the FusionRegexRoot node 510, and the Run function in the root node 510 may sequentially traverse and execute its member variable roots_, e.g., it may sequentially traverse WhileRegexNode1 520, WhileRegexNode2 530, and WhileRegexNode3 540. Since a WhileRegexNode is a nested structure, the child nodes inside it are also traversed in turn, in preorder, when the WhileRegexNode is executed. For example, for WhileRegexNode2 530, nodes such as SequentialRegexNode2 531 and InputRegexNode1 532 may be traversed sequentially.
According to embodiments of the present disclosure, the target fusion policy may be determined based on the preorder traversal result of AST 500. Based on the target fusion strategy, the operator nodes in the computational graph can then be fused.
Fig. 6 schematically illustrates the execution flow of the WhileRegexNode2 node corresponding to the ([HV]I)* policy, according to an embodiment of the disclosure.
As shown in fig. 6, the flow may include operations S610 to S670.
In operation S610, all groups in the computation graph are traversed.
In operation S620, it is determined whether the next Group can be traversed. If yes, executing operation S630; if not, operation S660 is performed.
In operation S630, the type in the SequentialRegexNode2 is traversed.
In operation S640, it is determined whether the next type can be traversed. If yes, then execution proceeds to operation S650; if not, operation S620 is performed.
In operation S650, horizontal fusion, vertical fusion, or recalculation is performed on the current Group according to the type.
In operation S660, an input fusion is made to the current computational graph.
In operation S670, it is determined whether an update of the computation graph has not occurred. If yes, ending the flow; if not, operation S610 is performed.
According to an embodiment of the present disclosure, referring to fig. 6, the flowchart shows that ([HV]I)* corresponds to a nested structure: the outer layer is WhileRegexNode2, and the inner layer consists of SequentialRegexNode2 and InputRegexNode1. The termination condition of the outer WhileRegexNode2 may be that the computation graph is no longer updated; otherwise, the two nested inner nodes continue to be executed. Operations S610 to S650 above may be performed based on SequentialRegexNode2, including: traversing all Groups in the computation graph, scanning the type_ parameter stored in SequentialRegexNode2 once for every Group accessed, and executing the corresponding fusion strategy according to the characters in type_. Operation S660 above may be performed based on InputRegexNode1, including: after all Groups in the computation graph have been traversed once, executing its method once to perform input fusion on the current computation graph.
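The loop below sketches the flow of operations S610 to S670 for this node, assuming two stub primitives that report whether they changed the graph; the graph representation and function names are illustrative only.

```python
# Sketch of the FIG. 6 execution flow for the ([HV]I)* node. The stub
# primitives below are assumptions: they return True only if they changed
# the computation graph.

def apply_policy(graph, group, policy):     # H/V (or R) fusion around one Group
    return False

def input_fusion(graph):                    # input fusion over the whole graph
    return False

def run_while_hv_i(graph):
    """Outer while loop: repeat [HV]I until the computation graph stops updating."""
    while True:
        updated = False
        for group in list(graph["groups"]):          # S610/S620: traverse Groups
            for policy in "HV":                      # S630/S640: type_ characters
                updated |= apply_policy(graph, group, policy)   # S650
        updated |= input_fusion(graph)               # S660: once per traversal
        if not updated:                              # S670: graph unchanged
            return graph

run_while_hv_i({"groups": ["g1", "g2"]})
```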
Through the embodiments of the disclosure, by executing the AST syntax tree and automatically applying the fusion strategies according to the semantics of the different tree nodes, the target fusion strategy that the user wants to execute can be mapped to the bottom layer of the deep learning compiler in the form of a regular expression and executed automatically.
According to an embodiment of the present disclosure, the parsing process may also take the execute-while-traversing form. In this form, operation S210 described above may include: traversing the strings in the regular expression; and, in response to determining that the currently traversed string is a third string characterizing a second fusion policy, determining the second fusion policy as the target fusion policy.
For example, the regular expression may be V, the third string may be V, and the target fusion policy may be to perform a one-time vertical fusion policy. Corresponding to this embodiment, the above operation S220 may be expressed as: and executing a vertical fusion strategy on the operator nodes in the computational graph.
According to an embodiment of the present disclosure, the third string may include at least a third sub-string and a fourth sub-string sequentially arranged, where the third sub-string represents the first sub-fusion policy, and the fourth sub-string represents the second sub-fusion policy. The above operation S220 may include: and fusing the operator nodes in the computation graph based on the first sub-fusion strategy to obtain a first fusion computation graph. And fusing the operator nodes in the first fused computation graph based on the second sub-fusion strategy.
For example, the regular expression may be RV, and the third string may be RV. The third substring may be R, and the first sub-fusion policy may be the recalculation policy. The fourth substring may be V, and the second sub-fusion policy may be the vertical fusion policy. Corresponding to this embodiment, operation S220 above may be expressed as: executing the recalculation policy on the operator nodes in the computation graph to obtain a first fused computation graph, and then executing the vertical fusion policy on the first fused computation graph.
According to an embodiment of the present disclosure, based on a first sub-fusion policy, operator nodes in a computation graph are fused to obtain a first fused computation graph, and based on a second sub-fusion policy, a process of fusing operator nodes in the first fused computation graph may be expressed as: executing a first sub-fusion strategy on all operator nodes in the computation graph to obtain a first fusion computation graph, and executing a second sub-fusion strategy on all operator nodes in the first fusion computation graph. It can also be expressed as: determining an anchor point, executing a first sub-fusion strategy on the operator node determined based on the anchor point to obtain a first fusion calculation graph, and executing a second sub-fusion strategy on the first fusion calculation graph. It should be noted that, the first fused computation graph obtained by the former may be a fused computation graph including complete operator nodes, and the first fused computation graph obtained by the latter may be a fused computation graph including only sub-graphs of partial operator nodes determined based on anchor points. In the case where the latter operator nodes determined based on the anchor points include all the operator nodes of the computation graph, the first fused computation graph obtained by the latter may also be a fused computation graph including the complete operator nodes.
According to an embodiment of the present disclosure, when the regular expression is parsed in the execute-while-traversing form, operation S210 described above may further include: in response to determining that the currently traversed string comprises the third string and a fourth string characterizing the execution rounds of the second fusion policy, determining the target fusion policy according to the second fusion policy and the execution rounds of the second fusion policy.
For example, the regular expression may be V*, the third string may be V, and the fourth string may be "*". The target fusion policy may be to repeatedly execute the vertical fusion policy on the computational graph until the computational graph is no longer updated.
According to an embodiment of the present disclosure, based on the above embodiment, operation S220 above may include: based on the second fusion strategy, performing the (i+1)-th round of fusion on the operator nodes in the i-th-round fused computation graph to obtain the (i+1)-th-round fused computation graph, wherein the 1st-round fused computation graph is obtained by performing the 1st round of fusion on the operator nodes in the computation graph based on the second fusion strategy, and i is an integer greater than 1; and, in response to determining that the fusion information of the (i+1)-th round meets a preset constraint condition, taking the (i+1)-th-round fused computation graph as the result of fusing the operator nodes in the computation graph based on the second fusion strategy and the execution rounds of the second fusion strategy.
For example, when the regular expression is V*, the first-round fused computation graph may be obtained by performing the 1st round of the vertical fusion policy on the operator nodes of the computation graph. When the first-round fused computation graph does not meet the preset constraint condition, the 2nd round of the vertical fusion policy is executed on the first-round fused computation graph to obtain the second-round fused computation graph, and so on, until the fusion information of some (i+1)-th round meets the preset constraint condition.
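A sketch of this round-based execution follows, under stated assumptions: a toy graph represented as a list of group names, a toy vertical_round standing in for one round of vertical fusion, and a maximum round count standing in for the preset value.

```python
# Sketch of round-based fusion for a policy such as V*: round i+1 operates on
# the graph produced by round i, and iteration stops once a preset constraint
# is met (here: a maximum round count, or no change between rounds).

def fuse_until_stable(graph, fuse_once, max_rounds=100):
    current = graph
    for _ in range(1, max_rounds + 1):            # round count as one constraint
        nxt = fuse_once(current)
        flag = 0 if nxt == current else 1         # update flag: 0 means no update
        if flag == 0:
            break
        current = nxt
    return current

# Toy stand-in for one vertical fusion round: merge the first two groups, if any.
def vertical_round(graph):
    groups = list(graph)
    if len(groups) >= 2:
        return [groups[0] + "+" + groups[1]] + groups[2:]
    return graph

print(fuse_until_stable(["g1", "g2", "g3"], vertical_round))   # ['g1+g2+g3']
```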
For example, the regular expression may be H*([HV]I)*([RV]I)*, where the three "*" symbols indicate that three while loops are to be performed.
Fig. 7 schematically illustrates a policy resolution diagram corresponding to the H*([HV]I)*([RV]I)* fusion policy according to an embodiment of the present disclosure.
As shown in fig. 7, the fusion strategy 700 may include fusion loop 1, fusion loop 2, and fusion loop 3, corresponding to the three while loops. For loops nested inside each while loop may be used to traverse all Groups in the computational graph.
Fusion loop 1 may traverse each Group in the computational graph and repeatedly perform the horizontal fusion policy on the consumer nodes retrieved with each Group as an anchor point, including: the computational graph is updated once every time the horizontal fusion strategy is executed. If the Group relationships in the computational graph do not change when the graph is updated, the current while loop can be exited; otherwise, if the Group relationships are updated, horizontal fusion may still be possible, and the while loop continues to iterate.
Fusion loop 2 may first execute the [HV] semantics, i.e., traverse each Group in the computational graph, perform horizontal fusion on the Groups first, and then perform vertical fusion on the Groups output by the horizontal fusion. After one traversal of the Groups in the computational graph is completed, input fusion is performed. The while loop is not exited until the computation graph is no longer updated after [HV]I completes one round of execution.
Fusion loop 3 is similar to fusion loop 2: when the [RV] semantics are executed, recalculation is performed first, and then one Group is arbitrarily selected from the multiple outputs of the recalculation as the input of the subsequent vertical fusion. The while loop is not exited until the computation graph is no longer updated after [RV]I completes one round of execution.
It should be noted that the horizontal fusion policy is still executed in fusion loop 2, because the Group relationships of the computation graph may change after other fusion policies are applied, and horizontal fusion between Groups may become possible again.
Through the embodiments of the present disclosure, the user can customize and flexibly control the fusion strategy by entering a regular expression, which effectively improves the usability and flexibility of the deep learning compiler in the node fusion process. In addition, in terms of usability, flexibility, and the like, this is also significant for improving the overall competitiveness of the PaddlePaddle framework.
According to an embodiment of the present disclosure, the fusion information may include at least one of a fusion round and a fusion result. The preset constraints may include at least one of the following: i+1 is equal to a preset value; the (i+1)-th-round fused computation graph is identical to the i-th-round fused computation graph; or the update flag of the (i+1)-th round indicates that the (i+1)-th-round fused computation graph was not updated. The update flag is used to record either first characterization information or second characterization information, wherein the first characterization information indicates that the computation graph of the corresponding round was updated, and the second characterization information indicates that the computation graph of the corresponding round was not updated.
According to an embodiment of the present disclosure, an update mark (Flag) may be predefined for recording whether the current fusion process has updated the computational graph. For example, Flag=0 may be set to indicate no update and Flag=1 to indicate an update. By reading the value of Flag, whether the loop needs to be exited can be determined conveniently and quickly.
Through the above embodiments of the present disclosure, by defining the preset constraint condition in advance, whether the loop needs to be exited can be determined relatively easily, and the loop can be exited in time once it is determined that this is necessary.
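As a minimal sketch of this exit check, assuming hypothetical fuse_once and graph_equal helpers and a placeholder preset_rounds value (none of which are defined by the present disclosure), the round-by-round fusion with preset constraints could be organized as follows:

```python
def fuse_until_constraint(graph, fuse_once, graph_equal, preset_rounds=10):
    """Run one fusion strategy round by round until a preset constraint holds.

    Assumed interfaces (illustrative only): fuse_once(graph) returns
    (new_graph, flag) with flag == 1 if the round updated the graph and 0
    otherwise; graph_equal(a, b) compares two computational graphs;
    preset_rounds stands in for the preset round-count value.
    """
    round_idx = 0
    while True:
        round_idx += 1
        new_graph, flag = fuse_once(graph)
        # Preset constraints: round count reached, graph unchanged between
        # rounds, or the update mark (Flag) reports no update.
        if round_idx >= preset_rounds or flag == 0 or graph_equal(new_graph, graph):
            return new_graph
        graph = new_graph
```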
Fig. 8 schematically illustrates a block diagram of a node fusion apparatus of a computational graph according to an embodiment of the present disclosure.
As shown in fig. 8, the node fusion apparatus 800 of the computation graph includes a parsing module 810 and a fusion module 820.
The parsing module 810 is configured to, in response to receiving a regular expression associated with a computational graph in the deep learning compiler, parse the regular expression to obtain the target fusion strategy, where the regular expression is obtained according to a user-defined setting of a candidate fusion strategy for fusing operator nodes in the computational graph.
And the fusion module 820 is used for fusing the operator nodes in the computation graph based on the target fusion strategy.
According to an embodiment of the present disclosure, the parsing module includes a syntax tree generation sub-module and a first policy determination sub-module.
And the grammar tree generation sub-module is used for analyzing the regular expression and generating a grammar tree.
The first strategy determination submodule is used for determining a target fusion strategy according to the executive program represented by the grammar tree.
According to an embodiment of the present disclosure, a syntax tree generation sub-module includes a parsing unit, a child node determining unit, a parent node determining unit, and a syntax tree determining unit.
And the analysis unit is used for analyzing the character strings in the regular expression.
And the child node determining unit is used for determining the first character string as a child node in response to determining that the character string analyzed currently is the first character string representing the first fusion strategy.
And the parent node determining unit is used for determining the second character string as the parent node of the child node in response to determining that the character string currently analyzed is a second character string representing the first fusion strategy execution round.
And the grammar tree determining unit is used for determining the grammar tree according to the parent node and the child node.
According to an embodiment of the present disclosure, the child node determining unit includes a sequence string obtaining subunit and a child node determining subunit.
And the sequence character string obtaining subunit is used for obtaining, in response to determining that the currently analyzed first character string includes a first sub-character string representing a sequence character string start flag and a second sub-character string representing a sequence character string end flag, a target sequence character string located between the first sub-character string and the second sub-character string, where the sequence character string start flag corresponds one-to-one with the sequence character string end flag.
And the child node determining subunit is used for determining the target sequence character string as a child node.
According to an embodiment of the present disclosure, the first policy determination submodule includes a traversal unit, an execution program determination unit, and a policy determination unit.
The traversing unit is used for performing preorder traversal on the grammar tree to obtain semantic information represented by each tree node in the grammar tree.
And the executive program determining unit is used for determining an executive program according to the semantic information characterized by each tree node.
And the strategy determining unit is used for determining the executing program as a target fusion strategy.
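The parsing path described above can be illustrated with the following sketch, which assumes single-character strategy codes (H, V, R, I), '#' as the execution-round marker, '[' and ']' as the sequence-string start and end flags, and '(' and ')' for grouping; these codes, the TreeNode class and the parser are illustrative assumptions, not the compiler's actual grammar.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

STRATEGY_CHARS = {"H", "V", "R", "I"}   # assumed single-character strategy codes
ROUND_CHAR = "#"                        # assumed execution-round / repetition marker

@dataclass
class TreeNode:
    token: str                          # strategy code, sequence string, round marker, or "seq"
    children: List["TreeNode"] = field(default_factory=list)

def parse(expr: str, i: int = 0) -> Tuple[TreeNode, int]:
    """Build a syntax tree: strategy strings become child nodes and a round
    marker becomes the parent node of the string it follows (illustrative)."""
    node = TreeNode("seq")
    while i < len(expr):
        ch = expr[i]
        if ch == ")":                                        # end of a parenthesized group
            return node, i + 1
        if ch == "(":                                        # recurse into the group
            sub, i = parse(expr, i + 1)
            node.children.append(sub)
        elif ch == "[":                                      # sequence-string start flag
            end = expr.index("]", i)                         # matching end flag
            node.children.append(TreeNode(expr[i + 1:end]))  # target sequence string -> child
            i = end + 1
        elif ch == ROUND_CHAR:                               # execution-round marker
            child = node.children.pop()                      # becomes parent of the last child
            node.children.append(TreeNode(ROUND_CHAR, [child]))
            i += 1
        elif ch in STRATEGY_CHARS:                           # a fusion-strategy string -> child
            node.children.append(TreeNode(ch))
            i += 1
        else:
            i += 1                                           # ignore anything unrecognized
    return node, i

def preorder(node: TreeNode, program: List[str]) -> List[str]:
    """Preorder traversal: collect each tree node's semantics into an execution list."""
    program.append(node.token)
    for child in node.children:
        preorder(child, program)
    return program

# Usage sketch: tree, _ = parse("H#([HV]I)#"); print(preorder(tree, []))
```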
According to an embodiment of the present disclosure, the parsing module includes a traversal sub-module and a second policy determination sub-module.
And the traversing submodule is used for traversing the character strings in the regular expression.
And the second strategy determination submodule is used for determining the second fusion strategy as a target fusion strategy in response to determining that the character string currently traversed is a third character string representing the second fusion strategy.
According to the embodiment of the disclosure, the third character string at least comprises a third sub-character string and a fourth sub-character string which are sequentially arranged, wherein the third sub-character string represents the first sub-fusion strategy, and the fourth sub-character string represents the second sub-fusion strategy; the fusion module comprises a first fusion sub-module and a second fusion sub-module.
And the first fusion sub-module is used for fusing the operator nodes in the computation graph based on the first sub-fusion strategy to obtain a first fusion computation graph.
And the second fusion sub-module is used for fusing the operator nodes in the first fusion calculation graph based on the second sub-fusion strategy.
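As a small illustration of stacking sub-fusion strategies in the order given by the third character string (for example "HV": horizontal fusion first, then vertical fusion on its output), the sketch below assumes each pass is a hypothetical function that takes a computational graph and returns the fused graph:

```python
def apply_sub_fusions(graph, passes):
    """Apply sub-fusion strategies in order; each later pass consumes the
    fused graph produced by the previous one (passes are placeholders)."""
    for fuse_pass in passes:
        graph = fuse_pass(graph)
    return graph

# Usage sketch for an "HV" third string, with hypothetical passes:
# fused = apply_sub_fusions(graph, [horizontal_fuse_pass, vertical_fuse_pass])
```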
According to an embodiment of the present disclosure, the parsing module further includes a third policy determination sub-module.
And the third strategy determination submodule is used for determining the target fusion strategy according to the second fusion strategy and the second fusion strategy execution round in response to determining that the character string currently traversed includes the third character string and a fourth character string representing the second fusion strategy execution round.
According to an embodiment of the present disclosure, the fusion module includes a third fusion sub-module and a definition sub-module.
And the third fusion submodule is used for performing the (i+1)-th round of fusion on the operator nodes in the i-th round fusion calculation graph based on the second fusion strategy to obtain the (i+1)-th round fusion calculation graph, wherein the 1st round fusion calculation graph is obtained by performing the 1st round of fusion on the operator nodes in the calculation graph based on the second fusion strategy, and i is an integer greater than 1.
And the defining sub-module is used for, in response to determining that the fusion information of the (i+1)-th round meets the preset constraint condition, taking the (i+1)-th round fusion calculation graph as the fusion result of fusing operator nodes in the calculation graph based on the second fusion strategy and the second fusion strategy execution round.
According to an embodiment of the present disclosure, the preset constraint condition includes at least one of the following: i+1 is equal to a preset value; the (i+1)-th round fusion calculation graph is the same as the i-th round fusion calculation graph; the update mark of the (i+1)-th round indicates that the (i+1)-th round fusion calculation graph has not been updated. The update mark is used for recording first characterization information or second characterization information, where the first characterization information indicates that the computational graph of the corresponding round has been updated, and the second characterization information indicates that it has not been updated.
According to the embodiment of the disclosure, the node fusion device of the computation graph further comprises an execution round number determining module, a policy permutation and combination mode determining module and a strategy determining module.
And the execution round number determining module is used for determining the execution round number which is custom set for the candidate fusion strategy.
And the policy permutation and combination mode determining module is used for determining the policy permutation and combination mode custom-set for a plurality of candidate fusion policies in response to the fact that the candidate fusion policies based on the custom-set determining process are multiple.
And the strategy determining module is used for determining a target fusion strategy according to the candidate fusion strategy and at least one of the execution round number and the policy permutation and combination mode.
According to an embodiment of the disclosure, the node fusion device of the computation graph further comprises a regular conversion module.
And the regular conversion module is used for carrying out regular conversion on the target fusion strategy according to the predefined regularization rule to obtain a regular expression.
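A minimal sketch of such a regular conversion is given below; the single-character strategy codes, the '#' repetition marker and the (strategy, repeat) plan format are assumptions for illustration and do not stand for the predefined regularization rule itself:

```python
from typing import List, Tuple

def to_regular_expression(plan: List[Tuple[str, bool]]) -> str:
    """Convert a custom fusion-strategy setting into a regular expression.

    `plan` lists (strategy_code, repeat) pairs in the chosen execution order,
    e.g. [("H", True), ("[HV]I", True), ("[RV]I", True)]; the codes and the
    '#' repetition marker are illustrative assumptions.
    """
    parts = []
    for code, repeat in plan:
        needs_group = len(code) > 1                  # multi-strategy codes get parentheses
        token = f"({code})" if needs_group else code
        parts.append(token + "#" if repeat else token)
    return "".join(parts)

# e.g. to_regular_expression([("H", True), ("[HV]I", True), ("[RV]I", True)])
# would yield "H#([HV]I)#([RV]I)#"
```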
According to an embodiment of the present disclosure, the candidate fusion strategy includes at least one of: vertical fusion, horizontal fusion, recalculation, input fusion.
According to an embodiment of the present disclosure, the fusion strategy for input fusion includes: acquiring, according to an input tensor, a plurality of initial operator nodes sharing the input tensor; determining, according to the first input/output type information and the first tensor information of each of the plurality of initial operator nodes, a plurality of target operator nodes whose information matches; and fusing the plurality of target operator nodes.
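For illustration, a sketch of this input-fusion selection might look as follows, assuming each operator node exposes hypothetical input_tensors, io_type_info and tensor_info attributes:

```python
from collections import defaultdict

def input_fusion_candidates(nodes):
    """Group operator nodes that share an input tensor, then keep those whose
    input/output type info and tensor info match (attribute names are assumed)."""
    by_input = defaultdict(list)
    for node in nodes:
        for tensor in node.input_tensors:                  # hypothetical attribute
            by_input[tensor].append(node)

    fusable_sets = []
    for initial_nodes in by_input.values():
        if len(initial_nodes) < 2:
            continue                                       # nothing to fuse
        anchor = initial_nodes[0]
        targets = [n for n in initial_nodes
                   if n.io_type_info == anchor.io_type_info    # first IO type info
                   and n.tensor_info == anchor.tensor_info]    # first tensor info
        if len(targets) > 1:
            fusable_sets.append(targets)                   # these target nodes get fused
    return fusable_sets
```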
According to an embodiment of the present disclosure, the redirected fusion strategy includes: acquiring, according to an operator group node, all consumer nodes corresponding to the operator group node; and, upon determining, according to the second input/output type information and the second tensor information of the operator group node and the third input/output type information and the third tensor information of the consumer nodes, that the information of the operator group node matches that of all the consumer nodes, fusing the operator group node with each consumer node among all the consumer nodes respectively.
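Similarly, a sketch of this strategy might look as follows, assuming hypothetical consumers_of and fuse helpers on the graph and the same illustrative node attributes:

```python
def fuse_group_into_consumers(graph, group_node):
    """Fuse an operator group node into each of its consumer nodes when the
    group's IO type info and tensor info match every consumer's (all helper
    and attribute names here are assumptions)."""
    consumers = graph.consumers_of(group_node)            # hypothetical helper
    if not consumers:
        return False
    if not all(group_node.io_type_info == c.io_type_info
               and group_node.tensor_info == c.tensor_info
               for c in consumers):
        return False                                      # information does not match
    for consumer in consumers:
        graph.fuse(group_node, consumer)                  # fuse with each consumer separately
    return True
```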
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the node fusion method of the computational graph of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a node fusion method of a computational graph of the present disclosure.
According to an embodiment of the present disclosure, a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, the computer program, when executed by a processor, implements a node fusion method of a computational graph of the present disclosure.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to an input/output (I/O) interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a node fusion method of a computation graph. For example, in some embodiments, the node fusion method of the computational graph may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into RAM 903 and executed by the computing unit 901, one or more steps of the node fusion method of the computation graph described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the node fusion method of the computation graph in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (33)

1. A node fusion method of a computational graph, comprising:
in response to receiving a regular expression associated with a computational graph in a deep learning compiler, resolving the regular expression to obtain a target fusion strategy, wherein the regular expression is obtained according to a candidate fusion strategy user-defined setting for fusing operator nodes in the computational graph; and
and fusing operator nodes in the computational graph based on the target fusion strategy.
2. The method of claim 1, wherein the parsing the regular expression to obtain a target fusion policy comprises:
analyzing the regular expression to generate a grammar tree; and
and determining the target fusion strategy according to the executive program characterized by the grammar tree.
3. The method of claim 2, wherein the parsing the regular expression to generate a syntax tree comprises:
analyzing the character strings in the regular expression;
in response to determining that a current parsed string is a first string that characterizes a first fusion policy, determining the first string as a child node;
in response to determining that the current parsed string is a second string that characterizes a round of execution of the first fusion policy, determining the second string as a parent node of the child node; and
and determining the grammar tree according to the father node and the child node.
4. The method of claim 3, wherein the determining the first string as a child node in response to determining that the currently parsed string is the first string that characterizes the first fusion policy comprises:
in response to determining that the currently parsed first character string includes a first sub-string representing a sequence string start flag and a second sub-string representing a sequence string end flag, obtaining a target sequence string located between the first sub-string and the second sub-string, the sequence string start flag corresponding one-to-one to the sequence string end flag; and
And determining the target sequence character string as the child node.
5. The method of any of claims 2-4, wherein the determining the target fusion policy according to the executive characterized by the syntax tree comprises:
performing preorder traversal on the grammar tree to obtain semantic information represented by each tree node in the grammar tree;
determining the executive program according to semantic information represented by each tree node; and
and determining the executive program as the target fusion strategy.
6. The method of claim 1, wherein the parsing the regular expression to obtain a target fusion policy comprises:
traversing the character strings in the regular expression; and
and in response to determining that the character string currently traversed is a third character string representing a second fusion strategy, determining the second fusion strategy as the target fusion strategy.
7. The method of claim 6, wherein the third string includes at least a third sub-string and a fourth sub-string arranged in sequence, the third sub-string representing a first sub-fusion policy and the fourth sub-string representing a second sub-fusion policy; the fusing the operator nodes in the computational graph based on the target fusion strategy comprises the following steps:
Based on the first sub-fusion strategy, operator nodes in the computation graph are fused to obtain a first fusion computation graph; and
and fusing operator nodes in the first fused computation graph based on the second sub-fusion strategy.
8. The method of claim 6 or 7, wherein the parsing the regular expression to obtain a target fusion policy further comprises:
and in response to determining that the character string currently traversed comprises the third character string and a fourth character string representing a second fusion strategy execution round, determining the target fusion strategy according to the second fusion strategy and the second fusion strategy execution round.
9. The method of claim 8, wherein the fusing operator nodes in the computational graph based on the target fusion policy comprises:
performing the fusion of the operator nodes in the ith round of fusion calculation graph by the (i+1) th round based on the second fusion strategy to obtain the (i+1) th round of fusion calculation graph, wherein the 1 st round of fusion calculation graph is obtained by performing the fusion of the (1) th round on the operator nodes in the calculation graph based on the second fusion strategy, and i is an integer greater than 1; and
And in response to determining that the fusion information of the (i+1) -th round meets a preset constraint condition, taking the (i+1) -th round fusion calculation graph as a fusion result of fusion of operator nodes in the calculation graph based on the second fusion strategy and the second fusion strategy execution round.
10. The method of claim 9, wherein the preset constraints include at least one of: the i+1 is equal to a preset value, the i+1 round fusion calculation graph is the same as the i round fusion calculation graph, and the update mark of the i+1 round represents that the i+1 round fusion calculation graph has no update; the update mark is used for recording first characterization information or second characterization information, the first characterization information characterizes that the computational graph of the corresponding round is updated, and the second characterization information characterizes that the computational graph of the corresponding round is not updated.
11. The method of any of claims 1-10, further comprising: before the regular expression associated with the computational graph in the deep learning compiler is parsed in response to receiving the regular expression, resulting in a target fusion policy,
determining the number of execution rounds which are custom set for the candidate fusion strategy;
Determining a policy arrangement combination mode for the plurality of the candidate fusion policies in response to determining that the candidate fusion policies based on which the process of the custom setting is based are a plurality of; and
and determining the target fusion strategy according to at least one of the execution round times and the strategy arrangement combination mode and the candidate fusion strategy.
12. The method of claim 11, further comprising:
and carrying out regular conversion on the target fusion strategy according to a predefined regularization rule to obtain the regular expression.
13. The method of any of claims 1-12, wherein the candidate fusion strategy comprises at least one of: vertical fusion, horizontal fusion, recalculation, input fusion.
14. The method of claim 13, wherein the fusion policy of the input fusion comprises:
acquiring a plurality of initial operator nodes sharing an input tensor according to the input tensor;
determining a plurality of target operator nodes with matched information according to the first input and output type information and the first tensor information of each of the plurality of initial operator nodes; and
and fusing the plurality of target operator nodes.
15. The method of claim 13, wherein the redirected fusion policy comprises:
acquiring all consumer nodes corresponding to operator group nodes according to the operator group nodes; and
and determining that the operator group node is matched with all the consumer nodes in information according to the second input and output type information and the second tensor information of the operator group node and the third input and output type information and the third tensor information of the consumer nodes, and respectively fusing each consumer node in the operator group node and all the consumer nodes.
16. A node fusion apparatus of a computational graph, comprising:
the analysis module is used for responding to the received regular expression associated with the calculation graph in the deep learning compiler, analyzing the regular expression to obtain a target fusion strategy, wherein the regular expression is obtained according to the self-defined setting of a candidate fusion strategy for fusing operator nodes in the calculation graph; and
and the fusion module is used for fusing the operator nodes in the calculation graph based on the target fusion strategy.
17. The apparatus of claim 16, wherein the parsing module comprises:
The grammar tree generation sub-module is used for analyzing the regular expression to generate a grammar tree; and
and the first strategy determination submodule is used for determining the target fusion strategy according to the executive program characterized by the grammar tree.
18. The apparatus of claim 17, wherein the syntax tree generation submodule comprises:
the analysis unit is used for analyzing the character strings in the regular expression;
the child node determining unit is used for determining the first character string as a child node in response to determining that the character string analyzed currently is the first character string representing the first fusion strategy;
a parent node determining unit, configured to determine, in response to determining that a current parsed string is a second string that characterizes a round of execution of a first fusion policy, the second string as a parent node of the child node; and
and the grammar tree determining unit is used for determining the grammar tree according to the father node and the child node.
19. The apparatus of claim 18, wherein the child node determination unit comprises:
a sequence character string obtaining subunit, configured to obtain, in response to determining that a first character string that is currently parsed includes a first sub-character string that characterizes a sequence character string start flag and a second sub-character string that characterizes a sequence character string end flag, a target sequence character string that is located between the first sub-character string and the second sub-character string, where the sequence character string start flag corresponds to the sequence character string end flag one to one; and
And the child node determining subunit is used for determining the target sequence character string as the child node.
20. The apparatus of any of claims 17-19, wherein the first policy determination submodule comprises:
the traversing unit is used for performing preorder traversal on the grammar tree to obtain semantic information represented by each tree node in the grammar tree;
an executable program determining unit, configured to determine the executable program according to semantic information represented by each tree node; and
and the strategy determining unit is used for determining the executing program as the target fusion strategy.
21. The apparatus of claim 16, wherein the parsing module comprises:
the traversing submodule is used for traversing the character strings in the regular expression; and
and the second strategy determination submodule is used for determining the second fusion strategy as the target fusion strategy in response to determining that the character string currently traversed is a third character string representing the second fusion strategy.
22. The apparatus of claim 21, wherein the third string comprises at least a third substring and a fourth substring arranged in sequence, the third substring representing a first sub-fusion policy and the fourth substring representing a second sub-fusion policy; the fusion module comprises:
The first fusion sub-module is used for fusing operator nodes in the computation graph based on the first sub-fusion strategy to obtain a first fusion computation graph; and
and the second fusion sub-module is used for fusing the operator nodes in the first fusion calculation graph based on the second sub-fusion strategy.
23. The apparatus of claim 21 or 22, wherein the parsing module further comprises:
and the third strategy determination submodule is used for determining the target fusion strategy according to the second fusion strategy and the execution turn of the second fusion strategy in response to the fact that the character string currently traversed comprises the third character string and a fourth character string representing the execution turn of the second fusion strategy.
24. The apparatus of claim 23, wherein the fusion module comprises:
the third fusion submodule is used for performing the (i+1)-th round of fusion on the operator nodes in the i-th round fusion calculation graph based on the second fusion strategy to obtain the (i+1)-th round fusion calculation graph, wherein the 1st round fusion calculation graph is obtained by performing the 1st round of fusion on the operator nodes in the calculation graph based on the second fusion strategy, and i is an integer greater than 1; and
the definition submodule is used for, in response to determining that the fusion information of the (i+1)-th round meets the preset constraint condition, taking the (i+1)-th round fusion calculation graph as a fusion result of fusing operator nodes in the calculation graph based on the second fusion strategy and the second fusion strategy execution round.
25. The apparatus of claim 24, wherein the preset constraints comprise at least one of: the i+1 is equal to a preset value, the i+1 round fusion calculation graph is the same as the i round fusion calculation graph, and the update mark of the i+1 round represents that the i+1 round fusion calculation graph has no update; the update mark is used for recording first characterization information or second characterization information, the first characterization information characterizes that the computational graph of the corresponding round is updated, and the second characterization information characterizes that the computational graph of the corresponding round is not updated.
26. The apparatus of any of claims 16-25, further comprising:
the execution round number determining module is used for determining the execution round number which is custom set for the candidate fusion strategy;
the policy permutation and combination mode determining module is used for determining a policy permutation and combination mode which is custom-set for a plurality of candidate fusion policies in response to the fact that the candidate fusion policies based on the custom-set process are multiple; and
And the strategy determining module is used for determining the target fusion strategy according to the candidate fusion strategy and at least one of the execution round number and the policy permutation and combination mode.
27. The apparatus of claim 26, further comprising:
and the regular conversion module is used for carrying out regular conversion on the target fusion strategy according to a predefined regularization rule to obtain the regular expression.
28. The apparatus of any of claims 16-27, wherein the candidate fusion policy comprises at least one of: vertical fusion, horizontal fusion, recalculation, input fusion.
29. The apparatus of claim 28, wherein the fusion policy of input fusion comprises:
acquiring a plurality of initial operator nodes sharing an input tensor according to the input tensor;
determining a plurality of target operator nodes with matched information according to the first input and output type information and the first tensor information of each of the plurality of initial operator nodes; and
and fusing the plurality of target operator nodes.
30. The apparatus of claim 28, wherein the redirected fusion policy comprises:
acquiring all consumer nodes corresponding to operator group nodes according to the operator group nodes; and
And determining that the operator group node is matched with all the consumer nodes in information according to the second input and output type information and the second tensor information of the operator group node and the third input and output type information and the third tensor information of the consumer nodes, and respectively fusing each consumer node in the operator group node and all the consumer nodes.
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
32. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-15.
33. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the method according to any one of claims 1-15.
CN202310979973.3A 2023-08-04 2023-08-04 Node fusion method and device for calculation graph, electronic equipment and storage medium Pending CN116909573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310979973.3A CN116909573A (en) 2023-08-04 2023-08-04 Node fusion method and device for calculation graph, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310979973.3A CN116909573A (en) 2023-08-04 2023-08-04 Node fusion method and device for calculation graph, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116909573A true CN116909573A (en) 2023-10-20

Family

ID=88358228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310979973.3A Pending CN116909573A (en) 2023-08-04 2023-08-04 Node fusion method and device for calculation graph, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116909573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271101A (en) * 2023-11-22 2023-12-22 上海燧原科技有限公司 Operator fusion method and device, electronic equipment and storage medium
CN117271101B (en) * 2023-11-22 2024-03-01 上海燧原科技股份有限公司 Operator fusion method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10824691B2 (en) Page rendering method, device, and data storage medium
JP6922538B2 (en) API learning
CN113342345A (en) Operator fusion method and device of deep learning framework
CN110825488A (en) Business processing method and device, electronic equipment and storage medium
US11531914B2 (en) Artificial intelligence (AI) based automatic rule generation
US10635662B2 (en) Signature detection
US9043264B2 (en) Scanning data streams in real-time against large pattern collections
CN112214210A (en) Logistics business rule engine and configuration method, device, equipment and storage medium thereof
CN116909573A (en) Node fusion method and device for calculation graph, electronic equipment and storage medium
CN112507102B (en) Predictive deployment system, method, apparatus and medium based on pre-training paradigm model
JP7170094B2 (en) Operator merging method, apparatus, electronic device, storage medium and computer program
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
WO2023221416A1 (en) Information generation method and apparatus, and device and storage medium
CN111177541A (en) Data analysis method and device based on user tag generation time, server and storage medium
CN112965710A (en) Processing method, device and system of computation graph
CN114201156B (en) Access method, device, electronic equipment and computer storage medium
CN112328225A (en) Page operation method and operation system thereof
CN111324344A (en) Code statement generation method, device, equipment and readable storage medium
CN114995719B (en) List rendering method, device, equipment and storage medium
CN114756211B (en) Model training method and device, electronic equipment and storage medium
CN112052152A (en) Simulation test method and device
CN114780800A (en) Multilink routing management method and device
CN113691403A (en) Topological node configuration method, related device and computer program product
CN114897664A (en) Graph model deployment method and device, GPU (graphics processing Unit) and storage medium
EP3921731A1 (en) Method and system for efficient multi agent computer simulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination