CN114791808A

CN114791808A - Data flow graph generation method and device

Info

Publication number: CN114791808A
Application number: CN202210116285.XA
Authority: CN
Inventors: 张振; 刘宝森; 欧阳鹏
Original assignee: Beijing Qingwei Intelligent Information Technology Co ltd
Current assignee: Beijing Qingwei Intelligent Information Technology Co ltd
Priority date: 2022-02-07
Filing date: 2022-02-07
Publication date: 2022-07-26

Abstract

The invention provides a data flow graph generation method and device. The method comprises: when a parent loop exists in a code block, analyzing the dependency information of the data in the code; traversing each process block in the code block to generate a block data flow graph corresponding to each process block; analyzing the data flow graph corresponding to each process block and adding the dependency information to generate a code data flow graph corresponding to the code block; and eliminating repeated variables in the code data flow graph, simplifying the process blocks, establishing the connection relationships among the process blocks, and generating a first data flow graph file, so that the data flow graph of the loop code can be displayed through the first data flow graph file. The method uses the LLVM front end, its intermediate representation (IR), and some of its existing analysis steps. LLVM is an open-source compiler framework that makes it convenient for users to add compilation steps (passes) or modify the compilation flow according to the architecture of their own processor. The invention is intended for the front end of a reconfigurable compiler, but is not limited to reconfigurable compilers.

Description

Data flow graph generation method and device

Technical Field

The present invention relates to the field of heterogeneous processor-oriented compilation, and in particular, to a method and an apparatus for generating a dataflow graph during code compilation for loop acceleration.

Background

A data flow graph (DFG) is a powerful graphical tool for describing data processing in software systems. It depicts how data moves and is transformed from input to output, from the perspective of data transfer and processing. Because it clearly reflects how program code executes, it is often an important component of a compiler for a heterogeneous processor.

Some software capable of automatically generating a data flow graph from code (such as an autoflowchart) is currently available, but it is intended for programmers to analyze or display code logic. It reflects business logic and analyzes only the literal, code-level meaning without considering the hardware environment, or it focuses mainly on how the data flow graph is presented to the user (such as layout and routing) rather than on the application background. Because it lacks details such as operators and operands, its output cannot be used by a compiler.

Disclosure of Invention

The invention aims to provide a data flow graph generation method that uses the LLVM front end, its intermediate representation (IR), and some of its existing analysis flows. LLVM is an open-source compiler framework that makes it easy for users to add or modify compilation passes for their own processor architecture. The method is intended for the front end of a reconfigurable compiler but is not limited to reconfigurable compilers; it can be used for any architecture that requires instructions to be executed in parallel.

In a first aspect of the present invention, a method for generating a dataflow graph is provided, including: analyzing dependency information among child code blocks in a code block under the situation that a parent cyclic code exists in the code block; traversing each process block in the code blocks to generate a block dataflow graph corresponding to each process block; adding the dependency information to the data flow graph corresponding to each process block to generate a code data flow graph corresponding to the code block; and eliminating repeated variables in the code data flow graph, establishing a connection relation among the process blocks, and generating a first data flow graph file so as to display the data flow graph of the code block through the first data flow graph file.

Optionally, after eliminating the repeated variables in the code data flow graph, establishing the connection relationships among the process blocks, and generating the first data flow graph file so that the data flow graph of the code block is displayed through the first data flow graph file, the method further includes: counting the connection relationships of each process block in the code block; extracting the process blocks that meet preset conditions and simplifying the process blocks in the code block; acquiring the association or nesting relationships among the process blocks; and completing the dependence distances between the data in each process block to generate a second data flow graph file, wherein the dependence distance represents the number of loop iterations between successive accesses when data in memory is accessed multiple times in the code block.

Optionally, after completing the dependence distances between the data in each process block and generating the second data flow graph file, the method further includes: generating loading information for the code block.

Optionally, the dependency relationship includes a precedence order and/or an inclusion relationship.

Optionally, after acquiring the association or nesting relationships among the process blocks, the method further includes: deleting useless process blocks among the process blocks, or converting them into loop information.

Optionally, traversing each process block in the code block to generate a block data flow graph corresponding to each process block includes: with the process blocks presented in the intermediate representation (IR) in a flat structure, parsing the IR statements one by one and filling the operator and operands of each IR statement into the corresponding data structure, to generate the block data flow graph corresponding to each process block.

Optionally, when a parent loop code exists in the code block, before analyzing the dependency information among the sub-code blocks in the code block, the method further includes: converting code block files of different formats into intermediate representation (IR) format files.

In a second aspect of the present invention, there is provided an apparatus for generating a dataflow graph, the apparatus including: an analysis unit configured to analyze dependency information between child code blocks in a code block in a scenario where a parent loop code exists in the code block; the first generating unit is used for traversing each process block in the code blocks and generating a block data flow diagram corresponding to each process block; a second generating unit, configured to add the dependency information to the data flow graph corresponding to each process block, and generate a code data flow graph corresponding to the code block; a third generating unit, configured to eliminate a repeated variable in the code data flow graph, establish a connection relationship between the process blocks, and generate a first data flow graph file, so that the data flow graph of the code block is displayed by the first data flow graph file.

Optionally, the apparatus further comprises: a counting unit, configured to count the connection relationships of each process block in the code block after the repeated variables in the code data flow graph are eliminated, the connection relationships among the process blocks are established, and the first data flow graph file is generated so that the data flow graph of the code block is displayed through the first data flow graph file; a simplification unit, configured to extract the process blocks that meet preset conditions and simplify the process blocks in the code block; an acquisition unit, configured to acquire the association or nesting relationships among the process blocks; and a fourth generating unit, configured to complete the dependence distances between the data in each process block and generate a second data flow graph file, wherein the dependence distance represents the number of loop iterations between successive accesses when data in memory is accessed multiple times in the code block.

Optionally, the apparatus further comprises: a fifth generating unit, configured to generate the code block loading information after the dependence distances between the data in each process block are completed and the second data flow graph file is generated.

Optionally, the apparatus further comprises: a deleting and converting unit, configured to delete useless process blocks among the process blocks, or convert them into loop information, after the association or nesting relationships among the process blocks are acquired.

Optionally, the first generating unit includes: a generating module, configured to, with the process blocks presented in the intermediate representation (IR) in a flat structure, parse the IR statements one by one and fill the operator and operands of each IR statement into the corresponding data structure, to generate the block data flow graph corresponding to each process block.

Optionally, the apparatus further comprises: a conversion unit, configured to convert code block files of different formats into intermediate representation (IR) format files before the dependency information among the sub-code blocks in the code block is analyzed when a parent loop code exists in the code block.

The following will further describe characteristics, technical features, advantages and implementation manners of the data flow diagram generation method and system in a clear and understandable manner by combining the accompanying drawings.

Drawings

FIG. 1 is a flow diagram illustrating a method for dataflow graph generation in one embodiment of the present invention;

FIG. 2 is a general flow diagram illustrating a data flow diagram in one embodiment of the invention;

FIG. 3 is a diagram illustrating the relationship of data structures in another embodiment of the present invention;

FIG. 4 is a diagram illustrating a for loop structure (one) in another embodiment of the present invention;

FIG. 5 is a view illustrating a while loop structure according to an embodiment of the present invention;

FIG. 6 is a schematic diagram for explaining the dependent distance calculation in another embodiment of the present invention;

FIG. 7 is a flow chart for explaining the hierarchy computation in another embodiment of the present invention;

FIG. 8 is a simplified for-loop structure for illustrating another embodiment of the present invention;

FIG. 9 is a view for explaining a loop structure with branching in another embodiment of the present invention;

FIG. 10 is a diagram illustrating a load information data structure in one embodiment of the invention;

FIG. 11 is a diagram illustrating an example dot diagram in another embodiment of the present invention;

FIG. 12 is a schematic view for explaining a batch display effect in another embodiment of the present invention;

FIG. 13 is a schematic diagram for explaining a data flow graph generating apparatus in another embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The task of a compiler is to translate code logic into instructions of the target architecture. For a traditional architecture (one that executes instructions sequentially), the compiler performs lexical, syntactic and semantic analysis on the code and then generates an intermediate representation (IR) that corresponds one-to-one with the instruction set; once the code has been translated into instructions, they only need to be loaded and executed in order at run time. However, as requirements on processor performance and power consumption keep rising, the IR sometimes cannot meet the needs of a parallel computing architecture (which does not load and execute instructions sequentially). A more macroscopic logic representation is therefore required, and a data flow graph can represent exactly this kind of parallel computation. Clearly, the more closely the composition of the data flow graph is oriented to the architecture, the easier it is to translate into machine instructions.

One aspect of the present invention provides a data flow graph generation method. As shown in FIG. 1, the method includes:

Step S101, when a parent loop code exists in a code block, analyzing the dependency information of the data in the code block.

Step S103, traversing each process block in the code block, and generating a block data flow graph corresponding to each process block.

Step S105, analyzing the data flow graph corresponding to each process block, adding the dependency information, and generating a code data flow graph corresponding to the code block.

Step S107, eliminating repeated variables in the code data flow graph, simplifying the process blocks, establishing the connection relationships among the process blocks, and generating a first data flow graph file, so that the data flow graph of the code block is displayed through the first data flow graph file.

In this embodiment, the loop-level pass flow runOnLoop is inherited (so the flow is entered only if the code contains a loop), and the names of the process blocks in the loop and the parent loop are obtained. The standard (existing) loop range analysis, loop information analysis, and dependency analysis are performed. The IR is then parsed statement by statement, the information is recorded in the corresponding data structures (operators and operands), and the associations between these data structures are established. These data structures are then assigned to the current process block, yielding the data flow graph (DFG) structure of that block. Next, based on the earlier dependency analysis, dependency relationships are added between the corresponding operators (completing the DFG). After all process blocks have been obtained, the logical associations (precedence or inclusion relationships) between them are established, and data variables shared by (repeated across) multiple process blocks are eliminated to form a complete DFG. At this point, the DFG data structure may be saved as an original DFG file for comparison with the optimized DFG and for error checking.

This is followed by a simplification of the process blocks, which removes the parts that are of no interest to the compiler and leaves only the loop body itself and the necessary logic blocks. The simplification includes: simplifying memory access calculations; inferring the relationships (order or nesting) among process blocks from their connections; deleting useless process blocks or converting them into loop information; completing information such as the dependence distance; generating the simplified DFG; and finally generating the data loading information.

According to the embodiment provided by the application, when a parent loop code exists in the code block, the dependency information of the data in the code block is analyzed; each process block in the code block is traversed to generate a block data flow graph corresponding to each process block; the data flow graph corresponding to each process block is analyzed and the dependency information is added to generate a code data flow graph corresponding to the code block; and repeated variables in the code data flow graph are eliminated, the process blocks are simplified, the connection relationships among the process blocks are established, and a first data flow graph file is generated, so that the data flow graph of the code block can be displayed through the first data flow graph file. The method uses the LLVM front end, its intermediate representation (IR), and some of its existing analysis steps. LLVM is an open-source compiler framework that makes it convenient for users to add compilation steps (passes) or modify the compilation flow according to the architecture of their own processor. The method is intended for the front end of a reconfigurable compiler but is not limited to reconfigurable compilers.

Optionally, after eliminating the repeated variables in the code data flow graph, establishing the connection relationships among the process blocks, and generating the first data flow graph file so that the data flow graph of the code block is displayed through the first data flow graph file, the method may further include: counting the connection relationships of each process block in the code block; extracting the process blocks that meet preset conditions and simplifying the process blocks in the loop code; acquiring the association or nesting relationships among the process blocks; and completing the dependence distances between the data in each process block to generate a second data flow graph file, wherein the dependence distance represents the number of loop iterations between successive accesses when data in memory is accessed multiple times in the code block.

Optionally, after completing the dependence distances between the data in each process block and generating the second data flow graph file, the method may further include: generating the code block loading information.

In this embodiment, the code block loading information includes, but is not limited to, the amount of data to be loaded, the amount of data to be generated, the number of calculation units to be used, the calculation order hierarchy of the calculation units, and whether there is an operator of an unknown type (see fig. 10 Demand).

Optionally, after obtaining the association or nesting relationship between the process blocks, the method may further include: and deleting useless process blocks in each process block or converting the useless process blocks into loop information.

In this embodiment, the loop information includes, but is not limited to, a loop start value, an end value, a loop step size, a loop direction, a loop variable name, a nesting depth of the loop, a loop name, a parent loop name, a child loop name, and an initialization list < variable name: value > (see FIG. 10 Scalar).

Optionally, traversing each process block in the code block and generating a block data flow graph corresponding to each process block includes: with the process blocks presented in the intermediate representation (IR) in a flat structure, parsing the IR statements one by one and filling the operator and operands of each IR statement into the corresponding data structure, to generate the block data flow graph corresponding to each process block.

Optionally, when a parent loop code exists in the code block, before analyzing the dependency information among the sub-code blocks in the code block, the method may further include: converting code block files of different formats into intermediate representation (IR) format files.

As an alternative embodiment, the present application further provides a method by which a compiler automatically generates a data flow graph, oriented to loop code acceleration. FIG. 2 shows the flow of this method. The specific process is as follows.

1. The C/C++ code is converted into a .ll (intermediate representation, IR) format file using the clang/clang++ command.

For example: clang xxx.c -emit-llvm -g -c -S -o xxx.ll -Xclang -disable-O0-optnone

Here xxx.c is the C code file whose loop code is to be optimized, xxx.ll is the IR-format file, and the remaining items are the necessary compilation options.

2. The loop-level pass flow runOnLoop is inherited (so the flow is entered only if the code contains a loop), and the names of the process blocks in the loop and the parent loop are obtained. The standard (existing) loop range analysis, loop information analysis, and dependency analysis are performed. The IR is then parsed statement by statement, the information is recorded in the corresponding data structures (operators and operands), and the associations between these data structures are established. These data structures are then assigned to the current process block, yielding the DFG structure of that block. Next, based on the earlier dependency analysis, dependency relationships are added between the corresponding operators (completing the DFG). After all process blocks have been obtained, the logical associations (precedence or inclusion relationships) between them are established, and data variables shared by (repeated across) the process blocks are eliminated to form a complete DFG. At this point, the DFG data structure can be saved as an original DFG file for comparison with the optimized DFG and for error checking.

This is followed by a simplification of the process blocks, which removes the parts that are of no interest to the compiler and leaves only the loop body itself and the necessary logic blocks. The simplification includes: simplifying memory access calculations; inferring the relationships (order or nesting) among process blocks from their connections; deleting useless process blocks or converting them into loop information; completing information such as the dependence distance; generating the simplified DFG; and finally generating the data loading information.

In this embodiment, data structures are defined, including the operator node OpNode, the data node VarNode, the process block ProcBlock, the dependency information depInfo, the branch information branch, and so on. FIG. 3 shows the relationships among these data structures.

Here, Node is the base class of OpNode, VarNode and ProcBlock and contains basic information such as the name and ID. The operand VarNode contains the type and size of the data, and records which operators it serves as an input or output of. The operator OpNode contains its inputs and outputs (operands), its computation level, its preceding and succeeding branches, and its dependency information. The process block ProcBlock contains all the operators and operands in the block, together with its association information with other process blocks (including the specific operator nodes involved).
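The relationships among these data structures can be sketched in code. The following Python sketch is only an illustration under assumptions: the class names Node, VarNode, OpNode and ProcBlock come from the text, while the field names (dtype, input_of, output_of, level, deps, successors) and the helper add_op are hypothetical and are not taken from the patent or from LLVM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Base class: basic information shared by all nodes."""
    name: str
    node_id: int


@dataclass(eq=False)
class VarNode(Node):
    """Operand: data type, size, and the operators that read or write it."""
    dtype: str = "i32"  # assumed default, for illustration only
    size: int = 4
    input_of: List["OpNode"] = field(default_factory=list)
    output_of: List["OpNode"] = field(default_factory=list)


@dataclass(eq=False)
class OpNode(Node):
    """Operator: inputs, output, computation level and dependencies."""
    opcode: str = ""
    inputs: List[VarNode] = field(default_factory=list)
    output: Optional[VarNode] = None
    level: int = 0
    deps: List["OpNode"] = field(default_factory=list)


@dataclass(eq=False)
class ProcBlock(Node):
    """Process block: its operators, operands and links to other blocks."""
    ops: List[OpNode] = field(default_factory=list)
    variables: List[VarNode] = field(default_factory=list)
    successors: List["ProcBlock"] = field(default_factory=list)


def add_op(block: ProcBlock, op: OpNode,
           inputs: List[VarNode], output: VarNode) -> None:
    """Hypothetical helper: wire an operator to its operands inside a block."""
    op.inputs = list(inputs)
    op.output = output
    output.output_of.append(op)
    for var in inputs:
        var.input_of.append(op)
    for var in [*inputs, output]:
        if var not in block.variables:  # identity comparison (eq=False)
            block.variables.append(var)
    block.ops.append(op)
```

The classes compare by identity (eq=False), so two distinct nodes that happen to share a name are never confused and the cyclic references between operators and operands cause no trouble during comparison.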

In this embodiment, the blocks within a loop are constructed and mapped onto ProcBlock. A loop consists of several blocks and may also contain sub-loops. In runOnLoop of an LLVM loop pass, when loops are nested the loop blocks appear in child-then-parent order. If there are several sibling loops, they appear in order. Complex loops are traversed depth first. A for loop contains four parts: cond, body, inc and end. There may also be branch blocks (if.then, if.else) among the for blocks.

When the parent loop is presented, it includes the sub-loops presented before it, so only the outermost loop needs to be analyzed, which saves time. FIG. 4 shows the for loop structure (one).

for (i = 0; i < 10; ++i)            // for.cond + for.inc
{
    …                               // for.body: may contain no computation, only a jump
    for (j = 0; j < 20; j += 2)     // for.cond1 + for.inc1
    {
        A[i][j] = …;                // for.body1
    }
    …                               // for.end
}

A while loop has no inc block; the rest is the same.

FIG. 5 shows the while loop structure. All blocks (including branches and labels) are laid out flat in the IR, and transfers between blocks are implemented with br instructions, similar to assembly style.

In this embodiment, each loop block is traversed; that is, the IR statements are parsed one by one, and the operator and operands of each IR statement are filled into the corresponding data structure. The IR has the following form:

for.cond: ; preds = %for.inc, %entry

%0 = load i32, i32* %i, align 4

%cmp = icmp slt i32 %0, 10

br i1 %cmp, label %for.body, label %for.end

for.body: ; preds = %for.cond

%1 = load i32, i32* %i, align 4

%2 = load i32, i32* %i, align 4

%mul = mul nsw i32 %1, %2

%3 = load i32, i32* %i, align 4

Here, the text before the colon (for.cond, for.body, etc.) is the loop block name; a name beginning with % is either a variable name or a block name; the "=" is immediately followed by an operator; … Because LLVM has a mechanism that guarantees variable names are unique, there is no need to worry about ambiguity. This makes it easy to map the IR onto the custom data structures.
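As a rough illustration of this mapping, the following Python sketch parses textual IR of the form shown above into per-block lists of operators and operands. It is a minimal sketch under assumptions: it works on the text of a .ll fragment with regular expressions rather than on LLVM's in-memory IR, and the function name parse_ir and the dictionary keys are hypothetical.

```python
import re

LABEL_RE = re.compile(r"^([\w.]+):")                    # e.g. "for.cond:"
ASSIGN_RE = re.compile(r"^(%[\w.]+)\s*=\s*(\w+)\s+(.*)$")  # "%x = op ..."
NAME_RE = re.compile(r"%[\w.]+")                        # any %name


def parse_ir(text):
    """Return {block name: [{"result", "op", "operands"}, ...]}."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop "; preds = ..." comments
        if not line:
            continue
        label = LABEL_RE.match(line)
        if label:
            current = label.group(1)
            blocks[current] = []
            continue
        if current is None:
            continue  # statement outside any labelled block
        assign = ASSIGN_RE.match(line)
        if assign:
            result, opcode, rest = assign.groups()
        else:  # statements without a result, such as br
            opcode, _, rest = line.partition(" ")
            result = None
        blocks[current].append(
            {"result": result, "op": opcode, "operands": NAME_RE.findall(rest)}
        )
    return blocks


SAMPLE_IR = """
for.cond:                                 ; preds = %for.inc, %entry
  %0 = load i32, i32* %i, align 4
  %cmp = icmp slt i32 %0, 10
  br i1 %cmp, label %for.body, label %for.end
for.body:                                 ; preds = %for.cond
  %1 = load i32, i32* %i, align 4
  %2 = load i32, i32* %i, align 4
  %mul = mul nsw i32 %1, %2
"""

blocks = parse_ir(SAMPLE_IR)
```

Each resulting entry carries the result name, the operator and the %-prefixed operands, which is the information an operator node and its operand nodes would be filled from. Constant operands (such as 10) are ignored in this sketch.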

In this embodiment, dependency information is added to the operator.

Using the dependency analysis flow that ships with LLVM, which was executed earlier, memory access dependencies can be obtained. This dependency information tells the compiler that the order of the memory reads and writes performed by certain operators (load, store, etc.) must not be disturbed. This analysis is mostly correct, but errors were later found in its calculation of the dependence distance.

The dependence distance is, simply put, the number of loop iterations between repeated accesses to the same memory location within a loop.

Dependency information sits between two OpNodes and indicates that the two operations cannot be reordered or run in parallel. The operation that is depended upon must execute first, and the operation that depends on it executes afterwards.

For example:

for (i = 10; i < 100; ++i)
{
    a[i] = a[i-2];
}

Here the element written as a[i] is accessed again two iterations later.

However, the address computation may be complex: besides linear expressions (the four basic arithmetic operations), second-order or multidimensional expressions are also possible.

Setting aside the specific computation of the subscript, write the access in the form a[f(i)] = a[g(i)] (for a multidimensional array, consider only one dimension at a time). The dependency is then determined by when f(i) equals g(i), where i is the loop variable and f(i) and g(i) are subscripts. The loop variable ranges over the (discrete) interval [b, t] given by the lower and upper loop limits. Assuming that f and g are both monotonic, f(i) and g(i) each have a value range whose extrema occur at the boundaries of that interval, for example [f(b), f(t)] and [g(b), g(t)] when both are monotonically increasing. FIG. 6 is a schematic of the dependence distance calculation.

Let the value range of f be denoted [fmin, fmax] and the value range of g be denoted [gmin, gmax]. Clearly, a dependency exists when fmin < gmax < fmax or fmin < gmin < fmax.

For example, in the case shown in FIG. 6, the dependency first occurs when g(d) = f(b), where d is the dependence distance.

The specific method comprises the following steps:

A function that evaluates an expression is written, and the addressed values are calculated with it (for use in the following steps).

The loop variable i in the subscript expressions f and g is replaced with the lower limit b and the upper limit t respectively, giving fmin, fmax, gmin and gmax as well as the monotonicity of f and g.

From the relationship between the value ranges, it is determined whether a dependency exists and whether the first dependent value is f(b) or f(t). Call this value f1.

The distance is then calculated by bisection.

For example, let d = (t-b)/2 and compute g(d); call the result g2.

if ((g2 > f1 && g is increasing) || (g2 < f1 && g is decreasing))
    g3 = g(d/2)
else if ((g2 > f1 && g is decreasing) || (g2 < f1 && g is increasing))
    g3 = g(d*3/2)
else if (g2 == f1)
    return d;

The calculation can then proceed recursively.
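The bisection idea can be made concrete with a small runnable sketch. This Python version is an illustration under assumptions, not the patent's implementation: it searches the iteration index directly over [b, t], always takes f(b) as the first dependent value f1, and reports the distance as the found index minus b. The function names find_index and dependence_distance are hypothetical.

```python
def find_index(g, target, lo, hi):
    """Bisect integer indices lo..hi for i with g(i) == target.

    g is assumed to be monotonic on [lo, hi]; returns None if no index hits.
    """
    increasing = g(hi) >= g(lo)
    while lo <= hi:
        mid = (lo + hi) // 2
        val = g(mid)
        if val == target:
            return mid
        # Move towards smaller indices when the value overshoots in the
        # direction g is heading; otherwise move towards larger indices.
        if (val > target) == increasing:
            hi = mid - 1
        else:
            lo = mid + 1
    return None


def dependence_distance(f, g, b, t):
    """Distance in iterations for a[f(i)] = a[g(i)], with i in [b, t]."""
    fmin, fmax = sorted((f(b), f(t)))
    gmin, gmax = sorted((g(b), g(t)))
    if gmax < fmin or gmin > fmax:
        return None  # value ranges do not overlap: no dependency
    f1 = f(b)  # assumption: the first dependent value is f(b)
    i = find_index(g, f1, b, t)
    return None if i is None else i - b


# The loop from the text: for (i = 10; i < 100; ++i) a[i] = a[i-2];
distance = dependence_distance(lambda i: i, lambda i: i - 2, 10, 99)
```

For the loop in the text this yields a distance of 2. The sketch only covers the case where g reaches f(b) inside the loop range; the case where the first dependent value is f(t) would need the symmetric search.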

In this embodiment, operator levels are computed in order to support parallel computation. Since parallel computing aims to obtain all (or as many as possible) intermediate results at the same time, we need to know which steps should be performed simultaneously. The operator levels provide this information. For example:

x = a² + b² + c²

We want to place the squaring operations of the three numbers a, b and c in level 1, place t = a² + b² in level 2, and place t + c² in level 3, so that the advantages of parallel computing can be exploited.

FIG. 7 shows the flow chart of the level computation method for operator nodes.

If an operator node has no input, or has only pure inputs (inputs that do not come from the output of another operator), its node level is set to 0; if it has an input that comes from the output of a preceding operation, its node level is the level of that preceding node plus 1.

During this process the levels of the operator node's upstream nodes are computed as well. At the same time, it is checked whether the upstream node is in the same process block as the node being computed, or in a block that contains it; if not, the upward recursion stops.
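The level rule can be sketched as a short recursive function. This Python sketch rests on assumptions that the text does not state: when an operator has several producing operators it takes the maximum of their levels plus 1, and the block check is reduced to "same block" only (the "containing block" case is omitted). The names compute_levels, producers and block_of are hypothetical.

```python
def compute_levels(producers, block_of=None):
    """Compute a level for every operator.

    producers: {op name: [names of operators whose output feeds this op]}
    block_of:  optional {op name: block name}; producers located in a
               different block are treated as pure inputs (recursion stops).
    """
    levels = {}

    def level(op):
        if op in levels:
            return levels[op]
        preds = [
            p for p in producers.get(op, [])
            if block_of is None or block_of.get(p) == block_of.get(op)
        ]
        # No producing operator: level 0; otherwise one above the deepest one.
        levels[op] = 0 if not preds else 1 + max(level(p) for p in preds)
        return levels[op]

    for op in producers:
        level(op)
    return levels


# x = a*a + b*b + c*c, written as operator nodes
graph = {
    "sq_a": [], "sq_b": [], "sq_c": [],
    "add_t": ["sq_a", "sq_b"],   # t = a*a + b*b
    "add_x": ["add_t", "sq_c"],  # x = t + c*c
}
levels = compute_levels(graph)
```

With the 0-based rule, the three squarings land on level 0, add_t on level 1 and add_x on level 2; operators on the same level have no mutual data dependency and are candidates for parallel execution.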

In this embodiment, the nodes within a single process block are simplified as follows:

Of identical operations with identical input parameters, only one is retained. Because the same variable appears in multiple blocks, a variable is expected to occupy memory only once and should not be allocated repeatedly.

Operations that are of no interest to the architecture, such as sext (sign extension), zext (zero extension) and bitcast (type conversion), are deleted.

Debug-related instructions are deleted.

Using expression evaluation, the multiple loads of a multidimensional array are reduced to one (for extracting the loading information). For example, a getelementptr in the original IR denotes an addressing operation, which takes three parameters: the variable start address, the variable offset address, and the structure member number. For example, in

%tmp6 = getelementptr %struct.munger_struct* %P, i32 2, i32 1

the operands denote member No. 1 of the element at array index 2 of the struct (all numbering starts from 0). Sometimes the array index and member number are themselves computed values, so the computation needs to be expanded. Finally, the result is represented in the graph as an addressing expression and recorded in a custom .pre file. For example:

ID: Datum618, detail: vla1[k.2*10*reg2][j.0*reg2][(i.0-1)]: pointer integer, size: 4, addr: 16, writen: 0;

Its format is: ID (ID name), detail (the variable addressing expression, including any computation), pointer type, length of each data element in the array, offset address, and whether it is an output.

Dependencies that disappear as a result of the simplification are released.
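The first three simplification rules (keeping one of several identical operations, dropping sext/zext/bitcast, and dropping debug instructions) can be sketched as a single pass over a block's operator list. This Python sketch is illustrative only: the tuple layout (result, opcode, operands), the way debug calls are recognised (a call whose first operand starts with "@llvm.dbg."), and the choice to forget remembered loads after a store are all assumptions introduced here, not details from the patent.

```python
IGNORED_OPS = {"sext", "zext", "bitcast"}


def simplify_block(ops):
    """ops: list of (result, opcode, operands) tuples for one process block."""
    alias = {}  # removed result name -> surviving name
    seen = {}   # (opcode, operands) -> result name already kept
    kept = []
    for result, opcode, operands in ops:
        # Rewrite operands that referred to removed results.
        operands = tuple(alias.get(o, o) for o in operands)
        if opcode == "call" and operands and operands[0].startswith("@llvm.dbg."):
            continue  # debug-related instruction
        if opcode in IGNORED_OPS:
            alias[result] = operands[0]  # forward the unconverted value
            continue
        if opcode == "store":
            # Assumption: a store may change memory, so earlier loads
            # must not be merged with later ones.
            seen = {k: v for k, v in seen.items() if k[0] != "load"}
        key = (opcode, operands)
        if result is not None and key in seen:
            alias[result] = seen[key]  # same operation, same inputs
            continue
        if result is not None:
            seen[key] = result
        kept.append((result, opcode, operands))
    return kept


body = [
    ("%1", "load", ("%i",)),
    ("%2", "load", ("%i",)),
    ("%conv", "sext", ("%1",)),
    (None, "call", ("@llvm.dbg.value", "%conv")),
    ("%mul", "mul", ("%conv", "%2")),
]
simplified = simplify_block(body)
```

In this example the second load, the sext and the debug call disappear, and the multiplication ends up reading %1 twice. Any dependency edges that pointed at a removed operator would then have to be released or redirected, which this sketch does not model.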

Reasoning about and establishing connections between blocks

A for loop contains four parts: cond, body, inc and end. The following figure shows a nested loop structure.

FIG. 4 schematically illustrates the for loop structure. cond contains one or more phi (branch selection) nodes, in which one or more variables are initialized or incremented (or accumulated); the variables are then compared, and a branch jump (br) is taken based on the result of the comparison.

inc contains an add and a br.

The body may contain operations, or may contain only a br (a jump to another loop). If there are sub-loops, there are several cases:

1. Only a sub-loop: the body of the parent loop contains only a br, which jumps to the cond of the child loop.

2. General computation precedes the sub-loop: the body of the parent loop performs the ordinary computation, and then a br jumps to the child loop.

3. General computation follows the sub-loop: the body of the parent loop contains only a br, which jumps to the cond of the child loop; after the child loop has executed, control jumps to for.

4. There are several sub-loops in the parent loop: the body of the parent loop contains only one br, which jumps to the cond of the first sub-loop; after one sub-loop finishes, control jumps to the cond of the next sub-loop. If there is non-loop code between two loops, the non-loop code is considered to be for. The execution sequence is as in case 3.

end may or may not contain computation (it does when the loop nest is not perfect). Unlike body, end is executed only once, at the end of the sub-loop, and belongs to the sub-loop.

In simplification, cond and inc, which contain little computation, are reduced to a subgraph containing only the additional information plus br, and the computation of the end part (if any) is taken as another subgraph. FIG. 8 shows the simplified for loop structure.

Here, cond + inc is reduced to loop information attached to the body graph and is not a graph connection. The loop information format is as follows:

loop information name (same as the body name), loop variable name (such as i), loop start value, loop end value, loop increment value, increment direction (+, -), nesting depth of the loop, parent loop name.
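As a non-limiting illustration, the loop information listed above may be held in a record such as the following Python sketch. The class name LoopInfo, the field names, and the trip_count helper are hypothetical; the sketch further assumes an exclusive end value and a positive increment magnitude, neither of which is stated in the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopInfo:
    name: str               # same as the name of the loop body block
    var: str                # loop variable name, e.g. "i"
    start: int              # loop start value
    end: int                # loop end value (assumed exclusive)
    step: int               # loop increment value (assumed positive magnitude)
    direction: str          # "+" or "-"
    depth: int              # nesting depth of the loop
    parent: Optional[str]   # parent loop name, None for an outermost loop

    def trip_count(self):
        """Number of iterations implied by start, end, step and direction."""
        span = self.end - self.start if self.direction == "+" else self.start - self.end
        return max(0, -(-span // self.step))  # ceiling division, never negative
```

Because cond and inc are folded into such a record, the body graph needs no explicit compare, add, or branch nodes to express the iteration space.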

A while loop has no inc module, so its loop information does not need to be simplified.

FIG. 9 shows an overall view of a loop structure that contains an if branch.

The parts between if_then or if_else and if_end are nested, and the parts before and after them are executed in order.

The required reconstruction is as follows:

1. All blocks except cond and inc are retained.

2. cond and inc are converted into br (the jump node of the original cond).

3. The jump form of the sub-loop is converted into a nested form.

4. The variables of the end part are also added to the symbol table.
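As a non-limiting illustration, the conversion of the jump form into a nested form (item 3 above) may be sketched in Python as follows, using the parent loop name carried by each set of loop information. The function name nest_loops and the dictionary layout are hypothetical, and the sketch assumes that every named parent also appears in the input list.

```python
def nest_loops(loops):
    """Build a nested structure from (loop name, parent name or None) pairs.

    The pairs are given in program order, so the children of each loop keep
    their execution order. Returns the list of outermost loops, each a dict
    of the form {"name": ..., "children": [...]}.
    """
    nodes = {name: {"name": name, "children": []} for name, _ in loops}
    roots = []
    for name, parent in loops:
        if parent is None:
            roots.append(nodes[name])
        else:
            nodes[parent]["children"].append(nodes[name])
    return roots
```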

In the present embodiment, load information is generated. The purpose of generating the dfg is not only to produce a visual graph structure but, more importantly, to pass as much of the code analysis information as possible to the back end of the compiler, so that the back end is easier to implement. The load information is generated for this purpose.

FIG. 10 shows the data structure of the load information. flowelements represents the whole load information of a process block. It is similar to a traditional process structure and comprises 3 segments: an uninitialized data segment bss, an initialized data segment data, and a code segment code (the content of the code segment is left for the back-end compiler to fill in), each segment including a start address and size information. The symbol table is composed of a series of Symbolinfo structures, each of which records information such as the name, offset address, and length of a variable and whether the variable is an output. demand represents the resources that the process block needs to occupy, including the number of operators (PE_num), the number of memory banks that must be loaded from at the same time (load_num), the number of memory banks that must be stored to at the same time (store_num), the maximum level number (level), and whether the block contains an operator flagged as unknown (hasUnknownOp). scalar represents the loop information and corresponds to the description above, with the addition of a sub-loop name and an initial value table initTable.
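As a non-limiting illustration, the load information may be modelled with the following Python records. The class and field names are adapted from the names mentioned above, while the helper add_symbol and its policy of placing each symbol at the end of the data segment are assumptions made only for this sketch. The loop information (scalar) is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int = 0  # start address of the segment
    size: int = 0   # size of the segment


@dataclass
class SymbolInfo:
    name: str
    offset: int
    length: int
    is_output: bool = False


@dataclass
class Demand:
    pe_num: int = 0               # number of operators
    load_num: int = 0             # banks loaded from at the same time
    store_num: int = 0            # banks stored to at the same time
    level: int = 0                # maximum level number
    has_unknown_op: bool = False


@dataclass
class FlowElements:
    bss: Segment = field(default_factory=Segment)
    data: Segment = field(default_factory=Segment)
    code: Segment = field(default_factory=Segment)  # left for the back end to fill in
    symbols: list = field(default_factory=list)
    demand: Demand = field(default_factory=Demand)


def add_symbol(elements, name, length, is_output=False):
    """Append a symbol at the end of the data segment and grow the segment."""
    sym = SymbolInfo(name, elements.data.size, length, is_output)
    elements.symbols.append(sym)
    elements.data.size += length
    return sym
```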

To record the generated graph for later manual review or reuse of the data structure, the graph is converted to a dot file in the standard format. Given the foregoing data structure, producing the dot file is a matter of arranging the statements. FIG. 11 shows an example of the dot rendering.

The dfg file conforms to the dot format standard. Its elements include operators, data, blocks, loop information, and the connections between them.

A black rectangular box represents data; its 4 cells give the data name, type, size, and number of data elements.

Ellipses represent operators and contain the operator name and priority (separated by ':'). The priority is calculated from the structure of the graph; dependences are not currently taken into account.

An operator may have inputs and outputs, which are connected by edges, and an output may become the input of another operator. The graph is formed naturally by such connections. Each input edge carries a number indicating which parameter of the operator the input data supplies.

The arrowed edges labeled a represent memory dependences and point to the depended-upon instructions (operators). A memory dependence carries 3 parameters separated by spaces. The first parameter represents the node type, which may be 's' (SingleInstruction), 'm' (MultiInstruction), 'pi' (PiBlock), 'r' (Root) or 'u' (Unknown). The second parameter represents the dependence type, which may be 'c' (confused), 'f' (flow), 'o' (output), 'a' (anti) or 'i' (input).
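As a non-limiting illustration, a memory dependence label of this kind may be decoded as sketched below in Python. The function name parse_dep_label is hypothetical, and the long names in the two tables are assumed to correspond to the node kinds and dependence kinds used by the llvm dependence analysis. The third parameter is not described here, so the sketch returns it unchanged.

```python
NODE_KINDS = {"s": "SingleInstruction", "m": "MultiInstruction",
              "pi": "PiBlock", "r": "Root", "u": "Unknown"}
DEP_KINDS = {"c": "confused", "f": "flow", "o": "output", "a": "anti", "i": "input"}


def parse_dep_label(label):
    """Split 'node dep extra' into (node kind, dependence kind, raw third field)."""
    parts = label.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated fields, got {len(parts)}")
    node, dep, extra = parts
    if node not in NODE_KINDS or dep not in DEP_KINDS:
        raise ValueError(f"unknown code in label: {label!r}")
    return NODE_KINDS[node], DEP_KINDS[dep], extra
```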

The large boxes c and d, which contain data and operations, represent the boundaries of blocks such as loops and branches.

The dotted arrows b connecting different blocks or operators indicate what is to be executed next (blocks at the same level must be ordered, and some operators have no input or output). Nested blocks are executed in sub-loop order according to the loop information of the parent loop.

The loop condition and loop increment blocks are replaced by a set of loop information. The loop information includes the loop variable name, start, end, step value, increment direction, depth of the loop and sub-loop name.
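As a non-limiting illustration, the dot elements described above (record-shaped data boxes, elliptical operators, a block boundary, and numbered edges) may be emitted as sketched below in Python. The function name to_dot and the argument layout are hypothetical; the sketch does not escape special characters in labels and omits the dependence edges, the execution-order arrows, and the loop information.

```python
def to_dot(block_name, data, ops, edges):
    """Return dot text for one block.

    data:  {node id: tuple of cell values shown in the record box}
    ops:   {node id: (operator name, priority)}
    edges: list of (source id, destination id, label)
    """
    lines = ["digraph dfg {",
             f'  subgraph "cluster_{block_name}" {{',
             f'    label="{block_name}";']
    for ident, cells in data.items():
        label = "|".join(str(c) for c in cells)
        lines.append(f'    "{ident}" [shape=record, label="{label}"];')
    for ident, (opname, prio) in ops.items():
        lines.append(f'    "{ident}" [shape=ellipse, label="{opname}:{prio}"];')
    lines.append("  }")
    for src, dst, label in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)
```

The subgraph name begins with "cluster" so that Graphviz draws the block as an enclosing box.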

In this embodiment, the llvm optimizer command opt is used to invoke the pass of the present invention to generate the dot file.

For example: opt -load dfg.so -scalar-evolution -dfg-analyze pathTo/xxx.ll

Here, -load dfg.so is the command option that loads the pass; -dfg-analyze is the option that runs the present procedure; pathTo/xxx.ll is the path of the previously generated IR file.

HTML files are generated (to facilitate batch inspection).

Confirming the correctness of the whole project requires a large amount of checking and verification, and comparing files one by one consumes a great deal of time. Batch inspection is therefore provided. All dot files to be checked are placed under the same path; all dot files under that path are traversed and converted into png or jpg files by the xdot tool. A script is written to generate an html file, each section of which contains one record to be checked; the information in a record comprises the file name, the C/C++ source code (link), the IR code (link), the dfg graph before simplification (png link) and the dfg graph after simplification (png format). FIG. 12 shows the batch display.
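As a non-limiting illustration, such a batch script may be sketched in Python as follows. The function names render_command and build_index are hypothetical, as is the convention that the source, IR and image files share the stem of the dot file. The sketch assumes that the Graphviz dot executable performs the conversion to png, and shows only one image per record.

```python
import html
from pathlib import Path


def render_command(dot_file):
    """Graphviz command line that renders one dot file to a png next to it."""
    dot_file = Path(dot_file)
    return ["dot", "-Tpng", str(dot_file), "-o", str(dot_file.with_suffix(".png"))]


def build_index(directory):
    """Return an html page with one section per dot file found in `directory`."""
    sections = []
    for dot_file in sorted(Path(directory).glob("*.dot")):
        stem = html.escape(dot_file.stem)
        sections.append(
            f"<h2>{stem}</h2>\n"
            f'<p><a href="{stem}.c">source</a> | <a href="{stem}.ll">IR</a></p>\n'
            f'<img src="{stem}.png" alt="{stem}">'
        )
    body = "\n".join(sections)
    return f"<html><body>\n{body}\n</body></html>"
```

Each command returned by render_command can be passed to subprocess.run, and the page returned by build_index can be written to a file in the same directory and opened in a browser.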

In this embodiment, the IR of llvm can be converted to a custom dfg data structure. The loop condition and loop end of this dfg data structure are removed from the graph and converted into loop information. Multi-step addressing calculations are simplified into a one-step addressing operation. Whether the code can be compiled is judged according to the architecture instruction set and the constraint conditions. The memory dependence distance is recalculated on the basis of the existing dependence analysis flow. The custom dfg data structure is saved into a dot format file. The dot file is converted into a graphic file and displayed in batches in a browser, which facilitates debugging.

In this embodiment, a data flow graph is generated automatically for a compiler by using the llvm front end, the intermediate representation (IR), and part of the existing analysis flows. The flow by which the compiler automatically generates the data flow graph produces a data structure with a relational graph at its core. This data structure can be checked manually and can guide the compiler's subsequent automatic instruction set mapping (mapping) and data deployment (banking); it is an important component of a reconfigurable compiler.

The present embodiment may be used in the front end of a reconfigurable compiler, but is not limited to a reconfigurable compiler. It may be used with any architecture that requires instructions to be executed in parallel.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method according to the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

In this embodiment, a data flow graph generating apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and details of which have been already described are omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.

Fig. 13 is a block diagram of a data flow graph generating apparatus according to an embodiment of the present invention, and as shown in fig. 13, the data flow graph generating apparatus includes:

an analysis unit 1401, configured to analyze dependency information of data in the code block in a scenario where parent loop code exists in the code block;

a first generating unit 1403, configured to traverse each process block in the code block and generate a block data flow graph corresponding to each process block;

a second generating unit 1405, configured to analyze the data flow graph corresponding to each process block, add the dependency information, and generate a code data flow graph corresponding to the code block; and

a third generating unit 1407, configured to eliminate repeated variables in the code data flow graph, simplify the process blocks, establish connection relationships among the process blocks, and generate a first data flow graph file, so that the data flow graph of the code block is displayed through the first data flow graph file.

With the embodiments provided in the present application, the analysis unit 1401 analyzes the dependency information of the data in the code block in the scenario where parent loop code exists in the code block; the first generating unit 1403 traverses each process block in the code block to generate a block data flow graph corresponding to each process block; the second generating unit 1405 analyzes the data flow graph corresponding to each process block and adds the dependency information to generate a code data flow graph corresponding to the code block; and the third generating unit 1407 eliminates repeated variables in the code data flow graph, simplifies the process blocks, establishes connection relationships among the process blocks, and generates the first data flow graph file, so that the data flow graph of the code block is displayed through the first data flow graph file. The above method utilizes the llvm front end, the intermediate representation (IR), and part of its existing analysis steps. llvm is an open-source compiler framework that makes it convenient for users to add compilation steps (passes) or modify the compilation flow according to the architecture of their own processor. The method is used for the front end of a reconfigurable compiler, but is not limited to a reconfigurable compiler.

Optionally, the above apparatus may further include: a statistical unit, configured to count the connection relationships of the process blocks in the code block after the repeated variables in the code data flow graph have been eliminated, the connection relationships among the process blocks have been established, and the first data flow graph file has been generated so that the data flow graph of the code block is displayed through the first data flow graph file; a simplifying unit, configured to extract process blocks meeting preset conditions and simplify the process blocks in the code block; an acquisition unit, configured to acquire the association or nesting relationships among the process blocks; and a fourth generating unit, configured to complete the dependence distances between data in the process blocks and generate a second data flow graph file, where a dependence distance represents, when data in memory is accessed multiple times in the code block, the number of loop iterations between those accesses.
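As a non-limiting illustration of the dependence distance, consider a loop over i in which each iteration writes a[i + write_offset] and reads a[i + read_offset]. The Python sketch below computes the number of iterations separating the write of an element from its later read. The function name dependence_distance is hypothetical, and the sketch covers only this simple one-dimensional case with a positive step; it does not reproduce the dependence analysis flow of llvm.

```python
def dependence_distance(write_offset, read_offset, step=1):
    """Iterations between writing a[i + write_offset] and later reading that element.

    Returns None when no later iteration reads an element written by an
    earlier one (no loop-carried flow dependence) for a loop `i += step`.
    """
    diff = write_offset - read_offset
    if diff <= 0 or diff % step != 0:
        return None
    return diff // step
```

For the statement a[i] = a[i - 2] + 1 with a step of 1, the element written in one iteration is read two iterations later, so the distance is 2.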

Optionally, the apparatus may further include: a fifth generating unit, configured to generate the load information of the code block after the dependence distances between the data in the process blocks have been completed and the second data flow graph file has been generated.

Optionally, the apparatus may further include: a deletion and conversion unit, configured to delete useless process blocks among the process blocks, or convert them into loop information, after the association or nesting relationships among the process blocks have been acquired.

Optionally, the first generating unit 1403 may include: a generating module, configured to present each process block in the intermediate representation (IR) in a tiled structure, parse the IR statements one by one, fill the operator and operands of each IR statement into the corresponding data structure, and generate the block data flow graph corresponding to each process block.
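As a non-limiting illustration, filling the operator and operands of one IR statement into a data structure may be sketched in Python as follows. The function name parse_ir_line and the dictionary layout are hypothetical. The sketch operates on textual IR with regular expressions, handles only statements that produce a result, records only operands named with a leading %, and skips named pointer types such as %struct.x*; an actual pass would work on the in-memory IR instead.

```python
import re

# '%result = opcode rest-of-statement'
_ASSIGN = re.compile(r"^\s*(%[\w.]+)\s*=\s*(\w+)\s+(.*)$")
# %-prefixed names that are not immediately followed by '*' (i.e. not pointer types)
_VALUE = re.compile(r"%[\w.]+(?![\w.*])")


def parse_ir_line(line):
    """Parse one value-producing IR statement; return None for anything else."""
    match = _ASSIGN.match(line)
    if not match:
        return None
    result, opcode, rest = match.groups()
    return {"result": result, "op": opcode, "operands": _VALUE.findall(rest)}
```

Applying parse_ir_line to each statement of a process block in order yields the flat list of operators from which the block data flow graph can be connected.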

Optionally, the above apparatus may further include: a conversion unit, configured to convert code block files in different formats into intermediate representation (IR) format files before the dependency information among the sub code blocks in the code block is analyzed in the scenario where parent loop code exists in the code block.

It should be noted that the above modules may be implemented by software or hardware, and the latter may be implemented in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.

An embodiment of the present invention further provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps in any of the method embodiments described above when executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

S1, analyzing the dependency information of the data in the code block in the scenario where parent loop code exists in the code block;

S2, traversing each process block in the code block, and generating a block data flow graph corresponding to each process block;

S3, analyzing the data flow graph corresponding to each process block and adding the dependency information to generate a code data flow graph corresponding to the code block;

S4, eliminating repeated variables in the code data flow graph, simplifying the process blocks, establishing connection relationships among the process blocks, and generating a first data flow graph file, so that the data flow graph of the code block is displayed through the first data flow graph file.

Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

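S1, analyzing the dependency information of the data in the code block in the scenario where parent loop code exists in the code block;

S2, traversing each process block in the code block to generate a block data flow graph corresponding to each process block;

S3, analyzing the data flow graph corresponding to each process block and adding the dependency information to generate a code data flow graph corresponding to the code block;

S4, eliminating repeated variables in the code data flow graph, simplifying the process blocks, establishing connection relationships among the process blocks, and generating a first data flow graph file, so that the data flow graph of the code block is displayed through the first data flow graph file.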

Optionally, for a specific example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for generating a dataflow graph, the method including:

analyzing the dependency information of the data in the code block under the condition that a parent loop code exists in the code block;

traversing each process block in the code block to generate a block data flow diagram corresponding to each process block;

analyzing the data flow graph corresponding to each process block, adding the dependency information, and generating a code data flow graph corresponding to the code block;

and eliminating repeated variables in the code data flow graph, simplifying the process blocks, establishing a connection relation among the process blocks, and generating a first data flow graph file so as to display the data flow graph of the code blocks through the first data flow graph file.

2. The method of claim 1, wherein after eliminating the repeated variables in the code data flow graph, simplifying the process blocks, establishing the connection relationships among the process blocks, and generating the first data flow graph file so that the data flow graph of the code block is displayed through the first data flow graph file, the method further comprises:

counting the connection relation of each process block in the code block;

extracting process blocks meeting preset conditions, and simplifying the process blocks in the code blocks;

acquiring the association or nesting relation among the process blocks;

and completing the dependence distances between the data in each process block to generate a second data flow graph file, wherein a dependence distance represents the number of loop iterations between accesses when data in memory is accessed multiple times in the code block.

3. The method of generating a dataflow graph according to claim 2, wherein after completing the dependent distances between the data in the respective process blocks and generating a second dataflow graph file, the method further includes:

and generating the code block loading information.

4. The method of generating a dataflow graph according to any one of claims 1 to 3, wherein the dependency information includes a precedence order and/or an inclusion relationship.

5. The method of generating a dataflow graph as set forth in claim 2, wherein after obtaining the association or nesting relationship between the process blocks, the method further includes:

and deleting useless process blocks among the process blocks or converting the useless process blocks into loop information.

6. The method for generating a data flow graph according to claim 1, wherein traversing each process block in the code blocks to generate a block data flow graph corresponding to each process block includes:

presenting each process block in the intermediate representation (IR) in a tiled structure, parsing the IR statements one by one, filling the operator and operands of each IR statement into a corresponding data structure, and generating a block data flow graph corresponding to each process block.

7. The method of generating a dataflow graph according to claim 1, wherein before analyzing dependency information between sub-code blocks in a code block in a scenario in which a parent loop code exists in the code block, the method further includes:

converting code block files in different formats into IR format files.

8. An apparatus for data flow graph generation, the apparatus comprising:

an analysis unit configured to analyze dependency information of data in a code block in a scenario where a parent loop code exists in the code block;

the first generating unit is used for traversing each process block in the code blocks and generating a block dataflow graph corresponding to each process block;

a second generating unit, configured to analyze the data flow graph corresponding to each process block, add the dependency information, and generate a code data flow graph corresponding to the code block;

a third generating unit, configured to eliminate a repeated variable in the code data flow graph, simplify the process blocks, establish a connection relationship between the process blocks, and generate a first data flow graph file, so that the data flow graph of the code block is displayed by the first data flow graph file.

9. The apparatus of claim 8, further comprising:

a counting unit, configured to count the connection relationships of the process blocks in the code block after the repeated variables in the code data flow graph have been eliminated, the connection relationships among the process blocks have been established, and the first data flow graph file has been generated so that the data flow graph of the code block is displayed through the first data flow graph file;

the simplification unit is used for extracting the process blocks meeting the preset conditions and simplifying the process blocks in the code blocks;

the acquisition unit is used for acquiring the association or nesting relation among the process blocks;

and a fourth generating unit, configured to complement a dependency distance between data in each process block, and generate a second data flow graph file, where the dependency distance represents a cycle number of intervals between multiple accesses when the data in the memory is accessed multiple times in the code block.

10. The apparatus of claim 9, further comprising:

a fifth generating unit, configured to generate the code block loading information after the dependence distances between the data in each process block have been completed and the second data flow graph file has been generated.