CN113157917B - OpenCL-based optimized classification model establishing and optimized classification method and system
- Publication number
- CN113157917B (granted); application CN202110277166.8A
- Authority
- CN
- China
- Prior art keywords
- source code
- ast
- edge
- node
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/35 Information retrieval of unstructured textual data: clustering; classification
- G06F40/253 Handling natural language data: grammatical analysis; style critique
- G06F40/30 Handling natural language data: semantic analysis
- G06N3/045 Neural networks: combinations of networks
Abstract
The invention belongs to the technical field of source code mapping and feature learning, and discloses an OpenCL-based method and system for establishing an optimized classification model and performing optimized classification. The modeling method comprises the following steps: preprocessing an OpenCL Kernel program, performing the Cl2Graph source-code-to-graph conversion on the preprocessed program, learning an effective representation of the code graph with a novel Relational Graph Neural Network (RGNN), and finally feeding the representation into a decision network to complete the heuristic optimization tasks of heterogeneous device mapping and thread-coarsening-factor prediction. On the one hand, the method enriches the syntactic and semantic relations of the code during the graph conversion and preserves the code's characteristics; on the other hand, it learns effective code features by modeling edges explicitly with a neural network. These features finally guide the heuristic optimization of the OpenCL program, greatly improving running efficiency and achieving a higher speedup.
Description
Technical Field
The invention belongs to the technical field of source code mapping and feature learning, and particularly relates to a method and a system for establishing and optimizing a classification model based on OpenCL.
Background
In recent years, with intensive research on technologies such as machine learning, image processing, and autonomous driving, demands on machine computation speed have become more and more stringent. OpenCL is an industry-standard framework for programming computers composed of a combination of CPUs, GPUs, and other processors. OpenCL is a common, cross-platform heterogeneous computing standard for multi-CPU/GPU/other-chip systems, initiated by multiple companies and organizations. It aims to fully utilize the powerful parallel computing capability of the GPU and its cooperation with the CPU, using hardware more efficiently to complete large-scale computation.
The OpenCL framework enables programs written with it to run transparently on heterogeneous devices, such as a GPU or a CPU. This liberates computing power and promotes the development of big data. However, as the operating environment and devices change, the best heterogeneous device mapping changes as well, and determining the best mapping device has become a research hotspot in recent years. In addition, once the optimal device is determined, concurrent execution of the program can effectively improve running efficiency, but over-allocating threads incurs context-switching and resource overhead. How to optimally allocate the number of threads for a given program to improve its execution efficiency is therefore also a major research focus.
Traditional machine learning methods extract code features manually, but this process is time-consuming and labor-intensive and cannot achieve end-to-end heuristic optimization. For example, Grewe et al. manually extract parameters such as a program's number of global memory accesses and number of local memory accesses as program features, and use a decision tree algorithm to complete the heterogeneous mapping task. To complete the thread-coarsening-factor prediction task, Magni et al. manually extract various kinds of program information, such as branch divergence and instruction mix, and use principal component analysis to obtain the first 7 principal components as program features; such statistical methods consume enormous manpower and involve cumbersome processes. In recent years, deep learning methods have mainly treated code as text, first serializing and encoding the text and then learning program features with an LSTM network.
Disclosure of Invention
The invention aims to provide a method and a system for establishing an OpenCL-based optimized classification model and for optimized classification, which solve the prior-art problems that OpenCL programs are unreasonably mapped in the heterogeneous device mapping task and that threads are unreasonably allocated in the thread-coarsening-factor prediction task.
In order to realize the task, the invention adopts the following technical scheme:
the method for establishing the optimized classification model based on the OpenCL comprises the following steps:
step 1: obtaining a source code data set and a source code category set, wherein the source code data set comprises a plurality of source codes and the source code category set comprises the category of each source code in the source code data set, and establishing an AST (abstract syntax tree) graph set and an IR (intermediate representation) graph set according to the source code data set, wherein the generation of a plurality of AST graphs and a plurality of IR graphs from any source code in the source code data set satisfies the following sub-steps:
step 1.1: preprocessing a source code to obtain a standardized source code;
step 1.2: acquiring a main function in a standardized source code, performing AST analysis on the main function in the standardized source code to acquire an AST tree of the standardized source code, and performing IR conversion on the main function in the standardized source code to acquire an intermediate instruction set;
step 1.3: traversing the AST tree obtained in step 1.2 depth-first and adding AST edges to generate a plurality of AST graphs, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, and the kinds of AST edges comprise: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
step 1.4: adding IR edges to the intermediate instruction set obtained in step 1.2 to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, and the kinds of IR edges comprise: a sequential flow edge, a data flow edge, and a control flow edge;
step 2: establishing an RGNN model, wherein the RGNN model comprises an embedding layer, a batch normalization layer, and a fully connected layer; taking the AST graph set and the IR graph set as the training set, taking the source code category set as the label set, training the RGNN model, and taking the trained RGNN model as the optimized classification model;
the embedding layer is used for obtaining the feature matrix of each source code, the adjacency matrices of all AST graphs of each source code, and the adjacency matrices of all IR graphs of each source code, and for generating the embedding vector of each source code;
the batch normalization layer is used for obtaining the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
the fully connected layer is used for obtaining the normalized vector of each source code, performing classification, and outputting the category of the source code.
Further, the category of the source code is either a running-device category or an allocated-thread-number category; there are 2 running-device categories and 6 allocated-thread-number categories.
Further, the fully connected layer comprises two layers, wherein the first layer contains 16 neurons and the second layer contains N neurons, N being a positive integer greater than or equal to 2; N=2 if the category of the source code is a running-device category, and N=6 if it is an allocated-thread-number category.
Further, the preprocessing comprises segmentation, comment removal, macro definition replacement, function name replacement, and variable name replacement, performed in sequence; the function name replacement is: different functions are uniformly named {A, B, ..., Z} in order, and the variable name replacement is: different variable names are uniformly named {a, b, ..., z} in order.
Further, the batch normalization layer is further configured to obtain the configuration parameters of each source code, concatenate them with the embedding vector, and normalize the result to obtain the normalized vector of each source code.
The OpenCL-based optimization classification method comprises the following steps:
the method comprises the following steps: acquiring a source code to be classified;
step two: generating all AST graphs and all IR graphs of the source code to be classified by adopting the method for establishing an OpenCL-based optimized classification model according to any one of claims 1 to 5;
step three: inputting all AST graphs and all IR graphs of the source code to be classified into the optimized classification model obtained by the OpenCL-based optimized classification model establishing method in any one of claims 1-5, and obtaining the category of the source code to be classified.
An OpenCL-based optimized classification system comprising a processor and a memory for storing a plurality of functional modules operable on the processor, the functional modules comprising: the system comprises a data acquisition module, a graph generation module, a model training module and a classification module;
the data acquisition module is used for acquiring a source code data set and a source code category set, the source code data set comprises a plurality of source codes, the source code category set comprises categories of the source codes in the source code data set,
the graph generation module is used for establishing an AST graph set and an IR graph set according to the source code data set, wherein the generation of a plurality of AST graphs and a plurality of IR graphs from any source code in the source code data set satisfies the following sub-modules:
the first submodule is used for preprocessing the source code to obtain a standardized source code;
the second sub-module is used for acquiring a main function in the standardized source code, performing AST analysis on the main function in the standardized source code to acquire an AST tree of the standardized source code, and performing IR conversion on the main function in the standardized source code to acquire an intermediate instruction set;
the third sub-module is used for traversing the AST tree obtained by the second sub-module depth-first and adding AST edges to generate a plurality of AST graphs, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, and the kinds of AST edges comprise: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
the fourth sub-module is used for adding IR edges to the intermediate instruction set obtained by the second sub-module to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, and the kinds of IR edges comprise: a sequential flow edge, a data flow edge, and a control flow edge;
the model training module is used for establishing an RGNN model, wherein the RGNN model comprises an embedding layer, a batch normalization layer, and a fully connected layer; the AST graph set and the IR graph set are used as the training set and the source code category set as the label set to train the RGNN model, and the trained RGNN model is used as the optimized classification model;
the embedding layer is used for obtaining the feature matrix of each source code, the adjacency matrices of all AST graphs of each source code, and the adjacency matrices of all IR graphs of each source code, and for generating the embedding vector of each source code;
the batch normalization layer is used for obtaining the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
the fully connected layer is used for obtaining the normalized vector of each source code, performing classification, and outputting the category of the source code;
the classification module is used for obtaining a source code to be classified, generating all AST graphs and all IR graphs of the source code to be classified, inputting them into the optimized classification model obtained by the model training module, and obtaining the category of the source code to be classified.
Further, the category of the source code is either a running-device category or an allocated-thread-number category; there are 2 running-device categories and 6 allocated-thread-number categories.
Further, the fully connected layer comprises two layers, wherein the first layer contains 16 neurons and the second layer contains N neurons, N being a positive integer greater than or equal to 2; N=2 if the category of the source code is a running-device category, and N=6 if it is an allocated-thread-number category.
Further, the preprocessing comprises segmentation, comment removal, macro definition replacement, function name replacement, and variable name replacement, performed in sequence; the function name replacement is: different functions are uniformly named {A, B, ..., Z} in order, and the variable name replacement is: different variable names are uniformly named {a, b, ..., z} in order.
Compared with the prior art, the invention has the following technical characteristics:
(1) The method can effectively complete the OpenCL Kernel heterogeneous mapping task and the thread-coarsening-factor prediction task: the heterogeneous mapping prediction accuracy reaches 88.7%, and the obtained speedup outperforms the most advanced method by 11.1%. In the thread-coarsening-factor prediction task the method shows excellent speedup improvements across multiple platforms, with a speedup 5.2% better than the most advanced method.
(2) The invention fully retains the structural information of the source code and enriches its syntactic and semantic information during the conversion of source code into graphs. First, in the Cl2Graph process an abstract syntax tree is parsed: the code content and structural information stored in the tree data structure exceed the mere sequential information of text. Second, the structural information of the code is further enriched by the edge-adding algorithms, while graph conversion at the instruction (IR) level records features from multiple stages of the code's life cycle, providing effective technical support for the subsequent graph-network learning.
(3) The relational graph neural network can effectively model the relations between nodes: it considers not only the information of the nodes but also the information of the connections between them, learns each node's embedding by propagating information from neighborhood nodes, and finally learns the embedding vector of the whole graph, providing a new idea for graph neural networks.
(4) The invention has good flexibility and extensibility: it is suitable for other code-related tasks, and since it uses the Clang compiler to extract the AST it is naturally compatible with high-level languages such as C-like languages and Java. Using transfer learning, the invention also facilitates tasks with small datasets; for example, it extends to defective-code detection.
(5) The invention provides an end-to-end heuristic optimization method: from an input source code it directly outputs the optimization result. Compared with traditional machine learning this is more efficient and convenient; traditional methods require manual extraction of code features, a time-consuming and labor-intensive process that needs domain-expert support, whereas the present method achieves one-stop, end-to-end heuristic optimization through a well-designed graph conversion method and graph-network deep learning, eliminating the manual feature-extraction step.
(6) The method achieves good speedups on the heterogeneous device mapping and thread-coarsening-factor prediction tasks: 11.1% better performance than the most advanced method in heterogeneous device mapping, and 5.2% better in thread coarsening.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of AST-based construction of the Cl2Graph process of the present invention;
FIG. 3 is a diagram of a procedure for IR-based construction in the Cl2Graph process of the present invention;
fig. 4 is a process of operation of the relational graph neural network RGNN.
Detailed Description
The technical terms appearing in the present invention are explained first:
ASTChild edge: the ASTChild edge is the form the parent-child relations of the AST tree take after the tree is traversed with a depth-first search algorithm. The depth-first traversal uses a non-recursive auxiliary stack, which avoids the inefficiency of recursion.
NextToken edge: NextToken edges record the sequential relationship of the code. The AST tree is traversed with a depth-first search algorithm, and during traversal each syntax token is connected with its sibling leaf nodes, as shown in FIG. 2: first it is determined whether the current node is a leaf node, i.e., has no child nodes; then, if a sibling of its parent node exists and is a leaf node, the two nodes are connected.
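As an illustration only (not the patent's implementation), the following Python sketch performs the non-recursive depth-first traversal with an auxiliary stack and links consecutive leaf tokens with NextToken edges; the Node class and the edge-list representation are assumptions.

```python
# Illustrative sketch: DFS with an auxiliary stack (no recursion) that
# connects consecutive leaf tokens with NextToken edges. The Node class
# and edge representation are assumptions, not the patent's data model.
class Node:
    def __init__(self, attr, children=None):
        self.attr = attr
        self.children = children or []

def add_next_token_edges(root):
    """Return (leaf, leaf) pairs linking each leaf token to the next one."""
    edges, prev_leaf = [], None
    stack = [root]                                 # auxiliary stack
    while stack:
        node = stack.pop()
        if not node.children:                      # leaf node = syntax token
            if prev_leaf is not None:
                edges.append((prev_leaf, node))    # NextToken edge
            prev_leaf = node
        else:
            stack.extend(reversed(node.children))  # keep left-to-right order
    return edges
```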
GuardedBy edge: GuardedBy edges connect a conditional-statement variable with the expressions in the corresponding scope. For example, for the code if (a > b) {... a = 5; ...} else {... b = 4; ...}, node (a > b) is connected with a = 5 and node (a > b) is connected with b = 4, because the condition of the conditional statement is a > b, the values of a and b cause jumps in the program, and a and b appear in the scopes of if and else. The edge thus records control-flow information of the program.
Jump edge: Jump edges enrich the control-flow information of the code and mainly cover three uses: jumps of if-else statements, jumps of switch-case statements, and jumps of for loops.
ComputedFrom edge: the ComputedFrom edge records the data-flow information of the code. For example, in the expression a = b + c, the value of a is controlled by b and c, so connecting the nodes of the assignment statement expresses the direction of the data flow: node b and node c are each connected with node a.
Sequential flow edge: sequential flow edges connect IR instructions according to their order. The sequential flow indicates the order of IR instruction operations; instructions generally execute in the order in which they appear, but when a special case such as a br jump instruction is met, execution jumps to the basic block designated by the instruction and the instructions in the target basic block then execute in order. During edge adding, the operation type of each instruction is judged and its opcode is connected with the next opcode to be executed.
Data flow edge: data flow edges record the assignment process and flow of variables through registers. For example, in the instruction "%4 = add i64 %3, 1", the 64-bit integer variable %3 is added to 1 and the result is assigned to the variable %4; for this instruction, %3 is the incoming data flow and, after the opcode's computation, %4 is the outgoing data flow. The IR data flow records the flow direction of variables and enriches syntactic and semantic information at the intermediate-instruction level.
Control flow edge: control flow edges record the jump logic of intermediate instructions. The main jump instruction is br, which transfers control flow to different basic blocks and includes conditional jumps and unconditional jumps. A conditional jump transfers control to a basic block at a specified address within the current function; an unconditional jump transfers the control flow to one of several basic blocks. When a br instruction is encountered during execution, the jump instruction is connected with the first instruction of the basic block to be jumped to, according to the jump condition.
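The three IR edge types can be illustrated with a short Python sketch over LLVM-IR-like text; the instruction format and regular expressions here are simplifying assumptions rather than the patent's actual parser.

```python
import re

def build_ir_edges(ir_lines):
    """Sketch: sequential-flow, data-flow (def -> use), and control-flow
    (br -> target block head) edges over LLVM-IR-like text lines."""
    seq, data, ctrl = [], [], []
    defs, labels = {}, {}               # "%reg" -> def line, label -> block head
    for i, line in enumerate(ir_lines):
        if line.rstrip().endswith(":"):
            labels[line.strip()[:-1]] = i + 1   # block body starts on next line
    for i, line in enumerate(ir_lines):
        if i + 1 < len(ir_lines) and not line.rstrip().endswith(":"):
            seq.append((i, i + 1))              # sequential flow edge
        m = re.match(r"\s*(%\w+)\s*=", line)
        if m:
            defs[m.group(1)] = i                # record the defining line
        for reg in re.findall(r"%\w+", line):
            if reg in defs and defs[reg] != i:
                data.append((defs[reg], i))     # data flow edge: def -> use
        for lbl in re.findall(r"label\s+%(\w+)", line):
            if lbl in labels:
                ctrl.append((i, labels[lbl]))   # control flow edge: br -> block
    return seq, data, ctrl
```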
The embodiment discloses a method for establishing an OpenCL-based optimization classification model, which comprises the following steps:
step 1: acquiring a source code data set, wherein the source code data set comprises a plurality of source codes, and an AST graph set and an IR graph set are established according to the source code data set, wherein the sub-steps of generating a plurality of AST graphs and a plurality of IR graphs according to any source code in the source code data set are as follows:
step 1.1: preprocessing a source code to obtain a standardized source code;
step 1.2: acquiring a main function in a standardized source code, performing AST analysis on the main function in the standardized source code to acquire an AST tree of the standardized source code, and performing IR conversion on the main function in the standardized source code to acquire an intermediate instruction set;
step 1.3: traversing the AST tree obtained in step 1.2 depth-first and adding AST edges to generate a plurality of AST graphs, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, and the kinds of AST edges comprise: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
step 1.4: adding IR edges to the intermediate instruction set obtained in step 1.2 to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, and the kinds of IR edges comprise: a sequential flow edge, a data flow edge, and a control flow edge;
step 2: establishing an RGNN model, wherein the RGNN model comprises an embedding layer, a batch normalization layer, and a fully connected layer; the AST graph set and the IR graph set are used as training sets to train the RGNN model, and the trained RGNN model is used as the optimized classification model;
the embedding layer is used for obtaining the feature matrix corresponding to each source code, the adjacency matrix of each AST graph, and the adjacency matrix of each IR graph, and for generating the embedding vector of each source code;
the batch normalization layer is used for obtaining the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
and the fully connected layer is used for obtaining the normalized vector of each source code, performing classification, and outputting the running-device category or the allocated-thread-number category.
Specifically, the preprocessing includes segmentation, comment removal, macro definition replacement, function name replacement, and variable name replacement, performed in sequence.
Specifically, the segmentation is: lexical analysis (the scanner) in the Clang compiler reads the source code and combines it into individual tokens according to preset rules, so that the whole code is finally divided into a token list.
Specifically, the comment removal is: syntactic analysis (the parser) in the Clang compiler traverses the tokens from the beginning, examining every token in the OpenCL code, and deletes source code comments according to the type and content of each token during traversal.
The source code comments include strings beginning with "//" or "/*". If a comment starts with "//", all tokens after it on the same line are removed; if it starts with "/*", all subsequent tokens are removed until the closing "*/" is encountered.
Specifically, the macro definition replacement is: while traversing the tokens, macro instructions handled by the compile-time preprocessor, for example identifiers beginning with "#define", are replaced uniformly; a hash table data structure stores the definitions and their values, and when a token matching a stored definition is met during traversal, the corresponding value is looked up and substituted.
Specifically, the function name replacement is: different functions are uniformly named {A, B, ..., Z} in order, and the variable name replacement is: different variable names are uniformly named {a, b, ..., z} in order. The abstract syntax tree represents the syntactic structure of the programming language in tree form, each node on the tree representing a construct in the source code. The AST generated by Clang is equivalent to an abstract recombination of the tokens: it converts the previously serialized tokens into tree form, where every node has not only its corresponding token but also node attributes; as shown in Table 1, the attributes describe different node roles, such as VarDecl, FunctionDecl, and DeclRefExpr, where VarDecl denotes a variable declaration and FunctionDecl a function declaration. Using these attributes it is judged whether a node is a variable declaration or a function declaration, and the system uniformly names different variable names {a, b, ..., z} in order and different function names {A, B, ..., Z} in order.
TABLE 1
To save memory, the implementation records the counts of function names and variable names with two custom integer variables, converts them to letters via their ASCII codes, and replaces the original variable or function names accordingly. In this way no new array or hash table of names needs to be built, reducing memory overhead. The OpenCL Kernel before and after normalization is compared in Table 2.
TABLE 2
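A minimal sketch of this renaming, assuming a token stream annotated with Clang-style declaration kinds; it uses the two integer counters and ASCII conversion described above instead of prebuilt name tables.

```python
def normalize_names(tokens, kinds):
    """Rename functions to A, B, ... and variables to a, b, ... in order of
    first appearance; the (token, kind) pairing is an assumed lexer output."""
    fn_count, var_count = 0, 0                  # the two integer counters
    mapping, out = {}, []
    for tok, kind in zip(tokens, kinds):
        if tok not in mapping:
            if kind == "FunctionDecl":          # function declaration
                mapping[tok] = chr(ord("A") + fn_count)
                fn_count += 1
            elif kind == "VarDecl":             # variable declaration
                mapping[tok] = chr(ord("a") + var_count)
                var_count += 1
        out.append(mapping.get(tok, tok))       # later uses reuse the mapping
    return out
```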
Specifically, the construction method of the GuardedBy edge is: first judge whether the attribute of the current node is IfStmt; then judge whether a binary operator (BinaryOperator) exists within the condition parentheses, recording the variable names on either side of the binary operation; while traversing the scopes corresponding to if and else, judge whether a previously stored variable appears, and if so add a GuardedBy edge.
Specifically, the Jump edge construction method is: (1) For an if branch, if the attribute of the current node is IfStmt, its leftmost child node is the conditional statement. First judge whether the current node has 3 or more children; if so, connect the IfStmt with all children except the leftmost one, i.e., connect the current node with the first real node of each if-else scope. If the current node has 2 children, no else scope follows; the current node must then be connected with its second child and also with its next sibling node, i.e., with both the node inside the if body and the first node after the if scope. (2) For a switch-case branch, if the current node's type is SwitchStmt, connect it with the next sibling nodes whose attributes are CaseStmt and SwitchStmt, i.e., the current node is connected with the content of each case and with the next statement node outside the switch scope. (3) For a for loop, if the attribute of the current node is ForStmt, connect the node with its last child from left to right, i.e., connect the current node with the first node of the loop body, and also connect it with its second child, i.e., with the first node outside the loop body.
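As an illustration of rule (1), the following sketch applies the if-branch Jump-edge rule to a generic AST node with .attr and .children fields (an assumed structure, not Clang's actual API); rules (2) and (3) can be written analogously.

```python
def add_if_jump_edges(node, next_sibling, edges):
    """Sketch of rule (1): Jump edges for an IfStmt node."""
    if node.attr != "IfStmt":
        return
    if len(node.children) >= 3:
        # if-else present: connect IfStmt with every child except the
        # leftmost condition, i.e. the first real node of each scope
        for child in node.children[1:]:
            edges.append((node, child))
    elif len(node.children) == 2:
        # no else scope: connect the node inside the if body and also
        # the first node after the if scope (the next sibling)
        edges.append((node, node.children[1]))
        if next_sibling is not None:
            edges.append((node, next_sibling))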
Specifically, after the AST graphs and IR graphs are constructed, an adjacency-list-like data structure is used instead of adjacency matrices to reduce storage space, e.g., [[1], [2, 3], ...]: each inner list records the connections of the corresponding node; position 0 being [1] means node 0 is connected with node 1, and position 1 being [2, 3] means node 1 is connected with nodes 2 and 3.
With the custom AST- and IR-based edge-adding algorithms complete, this embodiment generates eight graphs for each source code, each recording the connection information of one edge type, stored in the adjacency-list-like manner; when data subsequently enter the network for training, they are converted in batch into adjacency matrices. During the conversion only the relation between nodes needs to be judged: if a relation exists, the corresponding position of the adjacency matrix is set to 1.
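The batch conversion from the adjacency-list-like storage to adjacency matrices then reduces to a few lines; NumPy is an assumed choice of tensor library.

```python
import numpy as np

def to_adjacency_matrix(adj_list):
    """Convert [[1], [2, 3], ...] storage into a dense adjacency matrix."""
    n = len(adj_list)
    A = np.zeros((n, n), dtype=np.float32)
    for src, neighbors in enumerate(adj_list):
        for dst in neighbors:
            A[src, dst] = 1.0       # a relation exists: set the position to 1
    return A

# e.g. to_adjacency_matrix([[1], [2, 3], []]) marks edges 0->1, 1->2, 1->3
```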
Specifically, the construction method of the feature matrix is as follows:
(1) Learn features for the nodes in the AST tree to obtain a feature vector for each attribute; each attribute's feature vector is a one-dimensional vector of size 1 × 100. Every AST tree node has a corresponding attribute, a character string that indicates the role of the node, as shown in Table 1. The attributes of the visited nodes in each AST tree are printed and stored, finally yielding the node-attribute sets of all graphs; the Skip-Gram algorithm of Word2Vec then learns the relationship between node types and node contexts, this embedding method mapping tokens from a sparse integer-coded vocabulary into a lower-dimensional vector space. The context length is set to 1 and the embedding length to 100, in order to retain as much information as possible; if the embedding length were too small, node content information could be lost.
(2) Recombine the node vectors: features are extracted and recombined according to node type. If the graph converted from a Kernel has 50 AST nodes, the corresponding vectors are looked up by node type and concatenated to generate a 50 × 100 feature matrix. This completes the whole Cl2Graph flow.
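A sketch of the two-step feature-matrix construction, assuming gensim (4.x API) supplies the Word2Vec Skip-Gram implementation and that node-attribute sequences were dumped during the AST traversal:

```python
from gensim.models import Word2Vec
import numpy as np

def build_feature_matrix(attr_sequences, graph_node_attrs):
    """Step (1): Skip-Gram embeddings of node attributes (context length 1,
    embedding length 100); step (2): stack per-node vectors by node type."""
    w2v = Word2Vec(sentences=attr_sequences, vector_size=100,
                   window=1, sg=1, min_count=1)   # sg=1 selects Skip-Gram
    # e.g. 50 AST nodes -> a 50 x 100 feature matrix
    return np.stack([w2v.wv[attr] for attr in graph_node_attrs])
```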
Specifically, in the embedding layer, the adjacency matrices of the AST graphs can be expressed as:
G1 = <G_ASTChild, G_NextToken, G_GuardedBy, G_Jump, G_ComputedFrom>, where any G = <V, E>, V representing the node set and E the edge set;
features are extracted per edge type to obtain the AST-graph feature μ1 = G_ASTChild*X + G_NextToken*X + G_GuardedBy*X + G_Jump*X + G_ComputedFrom*X, i.e., the feature matrix X is convolved with the graph built from each edge type to generate new node features unique to that edge, and finally the five feature matrices are aggregated to finish the edge-type feature extraction. Similarly, the same operation on the IR graphs yields the IR adjacency matrices G2 and the IR-graph feature μ2, and μ1 and μ2 are concatenated to obtain the node feature vector μ;
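In code, the per-edge-type feature extraction is a sum of graph convolutions; the following NumPy lines are a reconstruction of the formula above, not the patent's implementation.

```python
import numpy as np

def edge_type_features(adjacency_mats, X):
    """mu = sum over edge types of G_e @ X for one representation."""
    return sum(G @ X for G in adjacency_mats)

# mu1 from the five AST adjacency matrices, mu2 from the three IR ones,
# then the node feature vector is their concatenation:
# mu = np.concatenate([mu1, mu2], axis=1)
```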
specifically, the vector generation process of the embedding layer in the embedding layer is shown in fig. 3, and includes the following steps:
firstly, the node feature vectors mu, G1 and G2 are input into the embedding layer, and the feature vector of the current node v is assumed to be X v And the node u is the neighborhood of the current node v, and in order to learn the embedded vector of each node, the process is as follows: matrix W 0 Is X v Configure weights for X v Initialization of (1), the updated neighborhood node set of the ith round is aggregated intoThen enter the embedding process of n layers, W 1 Matrix is ^ er>Configuring learned weight, performing relu activation once each time a graph convolution process is completed for filtering noise, performing n layers of operations, using tanh activation, overlapping current node features, performing relu activation, obtaining node update of the (i + 1) th round of the current node, obtaining features of each node after updating, finally aggregating into a vector of 1 × 32 according to rows to serve as an embedded vector, and completing an embedded vector generation process.
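The round structure can be condensed into a minimal NumPy sketch; W_0, W_1 and the exact order of aggregation, tanh, and residual superposition are reconstructions from the description above and should be read as assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgnn_embed(A, X, W0, W1, n_layers):
    """One reconstruction of the embedding rounds: A is the adjacency
    matrix, X the node features, W0/W1 the initialization and per-layer
    weights; the result is aggregated into a 1 x 32 embedding vector."""
    H = X @ W0                            # initialize node features
    for _ in range(n_layers):
        H = relu(A @ H @ W1)              # aggregate neighbors, then ReLU
    H = relu(np.tanh(H) + X @ W0)         # tanh, superimpose node features
    return H.sum(axis=0)                  # aggregate rows -> 1 x 32 vector
```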
Specifically, the batch normalization operation implemented by the batch normalization layer includes the following steps:
The configuration parameters of the source code are input into the batch normalization layer. The configuration parameters comprise the work-item size and the data-transfer size: the work-item size represents the allocated space and the data-transfer size represents the volume of input data to be computed; both are key features of Kernel execution. The auxiliary input corresponding to each Kernel is a 1 × 2 vector. A batch is set to 64 items, so that, for example, 64 × 32 represents the embedding vectors of 64 Kernels, and concatenating the auxiliary inputs row-wise generates a batch of 64 × 34 feature vectors. This avoids the vanishing-gradient problem while improving the convergence speed of the algorithm and effectively improving the classification result.
Specifically, the batch normalization layer is thus further configured to obtain the configuration parameters of each source code, concatenate them with the embedding vector, and normalize the result to obtain the normalized vector of each source code.
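A sketch of the concatenate-then-normalize step; PyTorch is an assumed framework, and the shapes follow the 64 × 32 and 64 × 34 figures above.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(34)                   # normalizes the 34 features
embeddings = torch.randn(64, 32)          # embedding vectors of 64 Kernels
aux = torch.randn(64, 2)                  # work-item size, data-transfer size
normalized = bn(torch.cat([embeddings, aux], dim=1))   # shape: 64 x 34
```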
Specifically, a two-layer fully connected network serves as the decision network for heuristic optimization prediction, the first layer having 16 neurons and the second layer N neurons;
in the heterogeneous device mapping task, the second fully connected layer has 2 neurons corresponding to the predicted running device, GPU or CPU. In the thread-coarsening-factor prediction task, the second fully connected layer comprises 6 neurons corresponding respectively to the predicted coarsening factors {1, 2, 4, 8, 16, 32}. The structure of the decision network can subsequently be adjusted for different prediction tasks.
The RGNN model proposed here integrates the ideas of the RGCN and GNN models. A graph neural network extracts features by learning the structure of the graph but does not consider the relations carried by edges; a relational graph convolutional network (RGCN) can model relations, but uses only the original graph-convolution algorithm in feature learning and cannot effectively learn complex structures. The RGNN model therefore models the edges while learning the global features of the graph through message passing with node-aggregation updates. Finally, the learned feature matrix is aggregated into a one-dimensional embedding vector representing the whole code graph.
The embodiment also discloses an OpenCL-based optimization classification method, which comprises the following steps:
the method comprises the following steps: acquiring a source code to be classified;
step two: generating all AST graphs and all IR graphs of the source code to be classified by adopting the above method for establishing an OpenCL-based optimized classification model;
step three: inputting all AST graphs and all IR graphs of the source code to be classified into the optimized classification model obtained by the method for establishing an OpenCL-based optimized classification model, and obtaining the category of the source code to be classified.
In this embodiment, an OpenCL-based optimized classification system is further disclosed, where the system includes a processor and a memory for storing a plurality of functional modules capable of running on the processor, where the functional modules include: the system comprises a data acquisition module, a graph generation module, a model training module and a classification module;
the data acquisition module is used for acquiring a source code data set and a source code category set, the source code data set comprises a plurality of source codes, the source code category set comprises categories of the source codes in the source code data set,
the graph generation module is used for establishing an AST graph set and an IR graph set according to the source code data set, wherein the generation of a plurality of AST graphs and a plurality of IR graphs from any source code in the source code data set satisfies the following sub-modules:
the first submodule is used for preprocessing the source code to obtain a standardized source code;
the second sub-module is used for acquiring a main function in the standardized source code, performing AST analysis on the main function in the standardized source code to acquire an AST tree of the standardized source code, and performing IR conversion on the main function in the standardized source code to acquire an intermediate instruction set;
the third sub-module is configured to traverse the AST tree obtained by the second sub-module depth-first and add AST edges to generate a plurality of AST graphs, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, and the kinds of AST edges comprise: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
the fourth sub-module is configured to add IR edges to the intermediate instruction set obtained by the second sub-module to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, and the kinds of IR edges comprise: a sequential flow edge, a data flow edge, and a control flow edge;
the model training module is used for establishing an RGNN model, wherein the RGNN model comprises an embedding layer, a batch normalization layer, and a fully connected layer; the AST graph set and the IR graph set are used as the training set and the source code category set as the label set to train the RGNN model, and the trained RGNN model is used as the optimized classification model;
the embedding layer is used for obtaining the feature matrix of each source code, the adjacency matrices of all AST graphs of each source code, and the adjacency matrices of all IR graphs of each source code, and for generating the embedding vector of each source code;
the batch normalization layer is used for obtaining the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
the fully connected layer is used for obtaining the normalized vector of each source code, performing classification, and outputting the category of the source code;
the classification module is used for obtaining a source code to be classified, generating all AST graphs and all IR graphs of the source code to be classified, inputting them into the optimized classification model obtained by the model training module, and obtaining the category of the source code to be classified.
Specifically, the category of the source code is either a running-device category or an allocated-thread-number category; there are 2 running-device categories and 6 allocated-thread-number categories.
Further, the fully connected layer comprises two layers, wherein the first layer contains 16 neurons and the second layer contains N neurons, N being a positive integer greater than or equal to 2; N=2 if the category of the source code is a running-device category, and N=6 if it is an allocated-thread-number category.
Specifically, the preprocessing comprises segmentation, comment removal, macro definition replacement, function name replacement, and variable name replacement, performed in sequence; the function name replacement is: different functions are uniformly named {A, B, ..., Z} in order, and the variable name replacement is: different variable names are uniformly named {a, b, ..., z} in order.
Example 1
In this example, the following heuristic optimization experiment was performed.
(1) Heterogeneous device parallel tasks
The heterogeneous device parallel task aims to predict whether a Kernel runs more efficiently on the CPU or the GPU. For this prediction task, the experiment uses 680 parallel kernels drawn from the Parboil, SHOC, AMD, NVIDIA SDK, NPB, Rodinia, and PolyBench benchmark suites, with ten thousand crawled samples additionally used for the training set. The experiments are run on a CPU (Intel Xeon E5-2667) and a GPU (NVIDIA TITAN Xp); the experimental platform information is shown in Table 3. The method is compared with static mapping, the method of Grewe et al., and the DeepTune acceleration tool, with static mapping taken as the baseline for computing the corresponding speedups. The experimental results show that Grewe et al. and DeepTune obtain speedups of 3.17 and 3.25 respectively, while the present method reaches a speedup of up to 3.61, an 11.1% improvement in speedup performance.
TABLE 3
(2) Thread coarsening factor task
The thread coarsening factor task aims to predict how the same program should be allocated threads to obtain the best running efficiency. The experimental data set consists of 17 OpenCL Kernels extracted from Parboil, the NVIDIA SDK, and the AMD SDK; the optimal thread coarsening factor is predicted among {1, 2, 4, 8, 16, 32}, and the model is evaluated with the leave-one-out method. Notably, to demonstrate the extensibility of the method, transfer learning is adopted to reuse the multi-relational graph-network weights trained on the large-data-volume heterogeneous device parallel task, which addresses the small size of this data set while shortening training time. The method is compared with Magni et al., DeepTune, GNN-AST, GNN-CDFG, and others; the experimental results show speedups of 1.25×, 1.02×, and 1.03× on the respective platforms, with an average speedup of 1.08, a 5.2% average performance improvement over the most advanced method.
Claims (10)
1. A method for establishing an OpenCL-based optimized classification model, characterized by comprising the following steps:
step 1: obtaining a source code data set and a source code category set, wherein the source code data set comprises a plurality of source codes and the source code category set comprises the category of each source code in the source code data set, and establishing an AST (abstract syntax tree) graph set and an IR (intermediate representation) graph set according to the source code data set, wherein the generation of a plurality of AST graphs and a plurality of IR graphs from any source code in the source code data set satisfies the following sub-steps:
step 1.1: preprocessing a source code to obtain a standardized source code;
step 1.2: acquiring a main function in a standardized source code, performing AST analysis on the main function in the standardized source code to acquire an AST tree of the standardized source code, and performing IR conversion on the main function in the standardized source code to acquire an intermediate instruction set;
step 1.3: traversing the AST tree obtained in step 1.2 depth-first and adding AST edges to generate a plurality of AST graphs, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, and the kinds of AST edges comprise: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
step 1.4: adding IR edges to the intermediate instruction set obtained in step 1.2 to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, and the kinds of IR edges comprise: a sequential flow edge, a data flow edge, and a control flow edge;
step 2: establishing an RGNN model, wherein the RGNN model comprises an embedding layer, a batch normalization layer, and a fully connected layer; taking the AST graph set and the IR graph set as the training set, taking the source code category set as the label set, training the RGNN model, and taking the trained RGNN model as the optimized classification model;
the embedding layer is used for obtaining the feature matrix of each source code, the adjacency matrices of all AST graphs of each source code, and the adjacency matrices of all IR graphs of each source code, and for generating the embedding vector of each source code;
the batch normalization layer is used for obtaining the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
the fully connected layer is used for obtaining the normalized vector of each source code, performing classification, and outputting the category of the source code;
the construction method of the feature matrix comprises the following steps: (1) learning the features of the nodes in the AST tree to obtain a feature vector for each attribute, each attribute's feature vector being a one-dimensional vector of size 1 × 100; each AST tree node has a corresponding attribute, a character string indicating the role of the node; printing and storing the attributes of the visited nodes in each AST tree, finally obtaining the node-attribute sets of all graphs, and learning the relationship between node types and node contexts with the Skip-Gram algorithm of Word2Vec; setting the context length to 1 and the embedding length to 100; and (2) node vector recombination: extracting and recombining features according to node type, and, if the graph converted from a Kernel has 50 AST nodes, extracting the corresponding vectors by node type and concatenating them to generate a 50 × 100 feature matrix, thereby finishing the whole Cl2Graph flow;
specifically, in the embedding layer, the adjacency matrices of the AST graphs can be expressed as:
G1 = <G_ASTChild, G_NextToken, G_GuardedBy, G_Jump, G_ComputedFrom>, where any G = <V, E>, V representing the node set and E the edge set;
features are extracted per edge type to obtain the AST-graph feature μ1 = G_ASTChild*X + G_NextToken*X + G_GuardedBy*X + G_Jump*X + G_ComputedFrom*X; the feature matrix X is convolved with the graph built from each edge type to generate new node features unique to that edge, and finally the five feature matrices are aggregated to finish the edge-type feature extraction; similarly, the same operation on the IR graphs yields the IR adjacency matrices G2 and the IR-graph feature μ2, and the node feature vector μ is obtained by concatenating μ1 and μ2.
2. The method as claimed in claim 1, wherein the category of the source code is either a running-device category or an allocated-thread-number category; there are 2 running-device categories and 6 allocated-thread-number categories.
3. The method as claimed in claim 2, wherein the fully connected layer comprises two layers, the first layer containing 16 neurons and the second layer N neurons, N being a positive integer greater than or equal to 2; N=2 if the category of the source code is a running-device category, and N=6 if it is an allocated-thread-number category.
4. The method for establishing an OpenCL-based optimized classification model according to claim 1, wherein the preprocessing includes segmentation, comment removal, macro definition replacement, function name replacement, and variable name replacement in sequence; the function name replacement is: different functions are uniformly named {A, B, ..., Z} in order, and the variable name replacement is: different variable names are uniformly named {a, b, ..., z} in order.
5. The method for establishing an OpenCL-based optimized classification model according to claim 1, wherein the batch normalization layer is further configured to obtain the configuration parameters of each source code, concatenate them with the embedding vector, and normalize the result to obtain the normalized vector of each source code.
6. An OpenCL-based optimized classification method, characterized by comprising the following steps:
step one: acquiring a source code to be classified;
step two: generating all AST graphs and all IR graphs of the source code to be classified by using the method for establishing an OpenCL-based optimized classification model according to any one of claims 1 to 5;
step three: inputting all AST graphs and all IR graphs of the source code to be classified into the optimized classification model obtained by the method for establishing an OpenCL-based optimized classification model according to any one of claims 1 to 5, to obtain the category of the source code to be classified.
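The three steps read as the following pipeline; every name in this sketch is an illustrative stub rather than an API defined by the patent:

```python
# Illustrative stubs for the three claimed steps; each body is a placeholder
# for the corresponding component of claims 1-5.
def preprocess(src: str) -> str:
    return src    # stand-in for segmentation, comment removal, renaming

def build_ast_graphs(src: str) -> list:
    return []     # stand-in for the five per-edge-kind AST graphs

def build_ir_graphs(src: str) -> list:
    return []     # stand-in for the three per-edge-kind IR graphs

def classify(ast_graphs: list, ir_graphs: list) -> str:
    return "GPU"  # stand-in for the trained RGNN's prediction

source = "__kernel void A(__global float* a) { a[0] = 0.0f; }"  # step one
normalized = preprocess(source)
label = classify(build_ast_graphs(normalized),                  # steps two
                 build_ir_graphs(normalized))                   # and three
print("predicted run device:", label)
```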
7. An OpenCL-based optimized classification system, characterized in that the system comprises a processor and a memory storing a plurality of functional modules capable of running on the processor, the functional modules comprising a data acquisition module, a graph generation module, a model training module and a classification module;
the data acquisition module is used for acquiring a source code data set and a source code category set, wherein the source code data set comprises a plurality of source codes and the source code category set comprises the category of each source code in the source code data set;
the graph generation module is used for establishing an AST graph set and an IR graph set from the source code data set, wherein the AST graphs and the IR graphs for any source code in the source code data set are generated by the following sub-modules:
the first sub-module is used for preprocessing the source code to obtain a standardized source code;
the second sub-module is used for acquiring the main function in the standardized source code, performing AST analysis on it to obtain the AST tree of the standardized source code, and performing IR conversion on it to obtain an intermediate instruction set;
the third sub-module is used for generating a plurality of AST graphs by adding AST edges to the AST tree obtained by the second sub-module during a depth-first traversal, wherein each AST graph contains one kind of AST edge and different AST graphs contain different kinds of AST edges, the kinds of AST edges comprising: a Child edge, a Next_Token edge, a Guarded_By edge, a Jump edge, and a Computed_From edge;
the fourth sub-module is used for adding IR edges to the intermediate instruction set obtained by the second sub-module to generate a plurality of IR graphs, wherein each IR graph contains one kind of IR edge and different IR graphs contain different kinds of IR edges, the kinds of IR edges comprising: a sequential flow edge, a data flow edge and a control flow edge (a sketch of this per-edge-kind construction follows);
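Both sub-modules follow the same pattern: one graph per edge kind over a shared node set. A minimal sketch of that pattern, using a made-up edge list in place of a real AST traversal or IR analysis:

```python
# One adjacency matrix per edge kind over a shared node set.
# The (kind, src, dst) triples are hypothetical traversal output.
import numpy as np

AST_EDGE_KINDS = ["Child", "Next_Token", "Guarded_By", "Jump", "Computed_From"]
edges = [("Child", 0, 1), ("Child", 0, 2),
         ("Next_Token", 1, 2), ("Computed_From", 2, 1)]
n_nodes = 3

graphs = {k: np.zeros((n_nodes, n_nodes), dtype=np.int8)
          for k in AST_EDGE_KINDS}
for kind, s, t in edges:
    graphs[kind][s, t] = 1    # each kind populates only its own graph

print(graphs["Child"])        # holds only the Child edges
```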
the model training module is used for establishing an RGNN model, the RGNN model comprising an embedding layer, a batch normalization layer and a fully connected layer; the AST graph set and the IR graph set are used as the training set and the source code category set as the label set, the RGNN model is trained, and the trained RGNN model is taken as the optimized classification model;
the embedding layer is used for acquiring the feature matrix of each source code, the adjacency matrices of all AST graphs of each source code and the adjacency matrices of all IR graphs of each source code, and generating the embedding vector of each source code;
the batch normalization layer is used for acquiring the embedding vector of each source code and normalizing it to obtain the normalized vector of each source code;
the fully connected layer is used for acquiring the normalized vector of each source code, performing classification, and outputting the category of the source code;
the classification module is used for acquiring a source code to be classified, generating all AST graphs and all IR graphs of the source code to be classified, inputting them into the optimized classification model obtained by the model training module, and obtaining the category of the source code to be classified;
the construction method of the feature matrix comprises the following steps: (1) node-type embedding: the characteristics of the nodes in the AST tree are learned to obtain a feature vector for each attribute, the feature vector of each attribute being a one-dimensional vector of the form 1×100; each AST tree node has a corresponding attribute, and the attribute uses a character string to indicate the role of the node; the node attributes visited in each AST tree are printed and stored to finally obtain the node attribute sets of all graphs, and the relationship between a node type and its context is learned using the Skip-Gram algorithm of Word2Vec, with the context length set to 1 and the embedding length set to 100; and (2) node vector recombination: features are extracted and recombined according to node type; if the graph converted from a Kernel has 50 AST nodes, the corresponding vectors are extracted according to the node types and spliced to generate a 50×100 feature matrix, thereby completing the whole Cl2Graph flow;
specifically, in the embedding layer, the adjacency matrices of the AST graph can be expressed as:
G1 = <G_ASTChild, G_NextToken, G_GuardedBy, G_Jump, G_ComputedFrom>, wherein any G = <V, E>, V representing the node set and E the edge set;
features are extracted according to edge type to obtain the AST graph feature μ1:
μ1 = G_ASTChild·X + G_NextToken·X + G_GuardedBy·X + G_Jump·X + G_ComputedFrom·X,
that is, a convolution operation is performed between the feature matrix X and the graph constructed from each kind of edge, generating new node features unique to that edge, and the five resulting feature matrices are finally aggregated to complete the edge-type feature extraction; similarly, the same operations are performed on the IR graphs to obtain the IR adjacency matrices G2 and the IR graph feature μ2, and the node feature vector μ is obtained by splicing μ1 and μ2.
8. The OpenCL-based optimized classification system of claim 7, wherein the category of the source code is either a run-device category or an allocated-thread-count category; the run-device category comprises 2 classes, and the allocated-thread-count category comprises 6 classes.
9. The OpenCL-based optimized classification system of claim 8, wherein the fully connected layer comprises two layers, the first layer having 16 neurons and the second layer having N neurons, N being a positive integer greater than or equal to 2, with N=2 if the category of the source code is the run-device category and N=6 if it is the allocated-thread-count category.
10. The OpenCL-based optimized classification system of claim 7, wherein the preprocessing comprises, in sequence, segmentation, comment removal, macro-definition replacement, function-name replacement, and variable-name replacement; in the function-name replacement, different functions are uniformly renamed {A, B, ..., Z} in order of appearance, and in the variable-name replacement, different variable names are uniformly renamed {a, b, ..., z}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110277166.8A CN113157917B (en) | 2021-03-15 | 2021-03-15 | OpenCL-based optimized classification model establishing and optimized classification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113157917A CN113157917A (en) | 2021-07-23 |
CN113157917B true CN113157917B (en) | 2023-03-24 |
Family
ID=76887138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110277166.8A Active CN113157917B (en) | 2021-03-15 | 2021-03-15 | OpenCL-based optimized classification model establishing and optimized classification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157917B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114756875B (en) * | 2022-06-16 | 2022-10-25 | 荣耀终端有限公司 | Code scanning method and electronic equipment |
CN115129364B (en) * | 2022-07-05 | 2023-04-18 | 四川大学 | Fingerprint identity recognition method and system based on abstract syntax tree and graph neural network |
CN116166276B (en) * | 2023-04-25 | 2023-07-11 | 芯瞳半导体技术(山东)有限公司 | Control flow analysis method, device, equipment, medium and product |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107003861A (en) * | 2014-08-29 | 2017-08-01 | 华为技术有限公司 | method for compiling source code |
CN109901840A (en) * | 2019-02-14 | 2019-06-18 | 中国科学院计算技术研究所 | Heterogeneous compilation optimization method with cross-thread redundancy elimination |
CN110764744A (en) * | 2018-07-25 | 2020-02-07 | 赛灵思公司 | Intermediate representation generation method and device for neural network computation |
CN111259394A (en) * | 2020-01-15 | 2020-06-09 | 中山大学 | Fine-grained source code vulnerability detection method based on graph neural network |
CN111274134A (en) * | 2020-01-17 | 2020-06-12 | 扬州大学 | Vulnerability identification and prediction method and system based on graph neural network, computer equipment and storage medium |
CN111460450A (en) * | 2020-03-11 | 2020-07-28 | 西北大学 | Source code vulnerability detection method based on graph convolution network |
CN111783100A (en) * | 2020-06-22 | 2020-10-16 | 哈尔滨工业大学 | Source code vulnerability detection method for code graph representation learning based on graph convolution network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170212829A1 (en) * | 2016-01-21 | 2017-07-27 | American Software Safety Reliability Company | Deep Learning Source Code Analyzer and Repairer |
US11640295B2 (en) * | 2020-06-26 | 2023-05-02 | Intel Corporation | System to analyze and enhance software based on graph attention networks |
Non-Patent Citations (2)
Title |
---|
Jian Zhang; Xu Wang; Hongyu Zhang; Hailong Sun; Kaixuan Wang; Xu… A Novel Neural Source Code Representation Based on Abstract Syntax Tree. IEEE, 2019, pp. 783-794. *
He Zhongkai. Research on heuristic optimization of OpenCL programs based on graph networks. China Master's Theses Full-text Database, 2021-02-15, pp. I138-221. *
Also Published As
Publication number | Publication date |
---|---|
CN113157917A (en) | 2021-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113157917B (en) | OpenCL-based optimized classification model establishing and optimized classification method and system | |
CN111104120B (en) | Neural network compiling method and system and corresponding heterogeneous computing platform | |
CN110232114A (en) | Sentence intention recognition method, device and computer readable storage medium | |
CN107273358B (en) | End-to-end English chapter structure automatic analysis method based on pipeline mode | |
CN113342318B (en) | Fine-grained code automatic generation method and system based on multi-view code characteristics | |
CN111401058B (en) | Attribute value extraction method and device based on named entity recognition tool | |
CN107103363B (en) | Construction method of a software fault expert system based on LDA | |
CN113722218B (en) | Software defect prediction model construction method based on compiler intermediate representation | |
CN114936631A (en) | Model processing method and device | |
CN117349311A (en) | Database natural language query method based on improved RetNet | |
CN114925165B (en) | Consultation task decomposition method, system and platform | |
CN115879450B (en) | Gradual text generation method, system, computer equipment and storage medium | |
CN113867724A (en) | Method and system for automatically generating GUI (graphical user interface) code, server and medium | |
CN116225452A (en) | Multi-level intermediate code-based graph neural network compiling optimization method | |
CN116483314A (en) | Automatic intelligent activity diagram generation method | |
KR20230065017A (en) | Apparatus and method for generating summary of program source code based on ai analysis | |
Chang et al. | Support NNEF execution model for NNAPI | |
US20230214598A1 (en) | Semantic Frame Identification Using Capsule Networks | |
CN109800419A (en) | Game dialogue lines generation method and system | |
CN118246033B (en) | Cross-platform code exception vulnerability detection method, system, equipment, medium and product | |
US11853710B2 (en) | Method and system for extracting natural language elements embedded in application source code | |
US20240193368A1 (en) | Nested Named Entity Recognition | |
Yang et al. | Monadic Deep Learning | |
CN115408698A (en) | Rust vulnerability detection method based on rust-IR2Graph and RGNN | |
Ramírez-Rueda et al. | Program Synthesis and Natural Language Processing: A Systematic Literature Review |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||