CN113885871A - Special back-end code generation method and device for supporting machine learning training - Google Patents

Publication number
CN113885871A
Authority
CN
China
Prior art keywords: operator, code, memory, acquiring, calculation graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111071061.3A
Other languages
Chinese (zh)
Inventor
姚海龙
曾军
李啸宇
寇明阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202111071061.3A priority Critical patent/CN113885871A/en
Publication of CN113885871A publication Critical patent/CN113885871A/en
Pending legal-status Critical Current

Classifications

    • G06F8/447 — Target code generation (G: Physics; G06F: Electric digital data processing; G06F8/41: Compilation)
    • G06F8/37 — Compiler construction; parser generation (G06F8/30: Creation or generation of source code)
    • G06N3/04 — Neural networks; architecture, e.g. interconnection topology (G06N3/00: Computing arrangements based on biological models)
    • G06N3/08 — Neural networks; learning methods
    • G06N5/04 — Inference or reasoning models (G06N5/00: Computing arrangements using knowledge-based models)
    • G06T1/20 — Processor architectures; processor configuration, e.g. pipelining (G06T1/00: General purpose image data processing)

Abstract

The invention provides a special back-end code generation method and a special back-end code generation device for supporting machine learning training. The method includes: parsing and optimizing a first calculation graph to obtain a second calculation graph; distributing at least one operator to an operator mapping module based on the operator type of the at least one operator and outputting an operator calling code corresponding to each operator; inputting the memory configuration information of the at least one operator into a memory management module and outputting a memory management code; and generating a target back-end code based on the operator calling codes and the memory management code, then compiling the target back-end code based on an operator library to obtain a deployment file. The invention realizes an end-to-end compiler supporting neural network training, improves the operation efficiency of the GPU applied to neural network training and inference, and expands the application range of the GPU.

Description

Special back-end code generation method and device for supporting machine learning training
Technical Field
The invention relates to the technical field of computers, in particular to a special back-end code generation method and device for supporting machine learning training.
Background
A Graphics Processing Unit (GPU), the back-end device traditionally and widely used for neural network training in the machine learning field, integrates a large number of Single Instruction Multiple Data (SIMD) microprocessors and provides massive parallel computing capability, achieving good results in neural network training. However, due to its hardware architecture, the GPU still has considerable room for improvement in complex logic control and Input/Output (IO)-intensive computation.
At present, chips dedicated to neural network training and inference, such as the Kunlun chip, lack an end-to-end compiler supporting both training and inference, so the operation efficiency of the GPU applied to neural network training and inference urgently needs to be optimized.
Disclosure of Invention
The invention provides a special back-end code generation method and device for supporting machine learning training, which address the problem that the operation efficiency of a Graphics Processing Unit (GPU) applied to neural network training and inference urgently needs to be optimized.
In a first aspect, the present invention provides a method for generating a special back-end code for supporting machine learning training, including:
acquiring a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph, wherein the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
acquiring at least one operator of a second calculation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code;
and generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
Optionally, the operator mapping module comprises at least one operator code generator and an operator mapper;
the distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator specifically includes:
obtaining an operator type of at least one operator based on the operator mapper;
distributing the at least one operator to an operator code generator corresponding to each operator based on the operator type of the at least one operator;
and outputting an operator calling code corresponding to each operator based on the at least one operator and the code generation template class in the operator code generator.
Optionally, before the distributing the at least one operator to the operator mapping module based on the operator type of the at least one operator, the method further includes:
acquiring a first operator, wherein the first operator is an operator in the second calculation graph;
generating a code generation template class corresponding to the first operator based on the operator class of the first operator and a preset code generation template base class;
and generating and registering an operator code generator corresponding to the first operator based on the code generation template class corresponding to the first operator.
Optionally, the operator calling code includes at least one of an operator function, an input parameter, and a function calling code;
the input parameters include at least one of:
transferring first tensor data of the function in a pointer mode;
second tensor data acquired from the at least one operator based on an ahead-of-time (AOT) compilation method;
an integer vector, the integer vector being obtained based on a vector group mechanism.
Optionally, the integer vector is obtained based on a vector group mechanism, and specifically includes:
inputting a second operator into an operator code generator corresponding to the second operator, and outputting an integer vector, wherein the second operator is an operator of which the input parameter in the second calculation graph comprises the integer vector;
sending the integer vector to a vector manager, generating a variable name of the integer vector based on the vector manager, and sending the variable name of the integer vector to an operator code generator corresponding to the second operator;
and outputting the variable name to an operator calling code corresponding to the second operator based on the operator code generator corresponding to the second operator.
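As an illustrative sketch only (the patent gives no implementation), the vector group mechanism described above can be pictured as follows; the class name `VectorManager`, the variable-name scheme, and the generator function are all hypothetical:

```python
class VectorManager:
    """Assigns a stable variable name to each distinct integer vector,
    so the generated code declares each vector once and reuses it."""
    def __init__(self):
        self._names = {}  # tuple(vector) -> generated variable name

    def name_for(self, vector):
        key = tuple(vector)
        if key not in self._names:
            self._names[key] = f"ivec_{len(self._names)}"
        return self._names[key]

def emit_call_with_vector(op_name, int_vector, manager):
    """Sketch of an operator code generator for a 'second operator',
    i.e. one whose input parameters include an integer vector."""
    var = manager.name_for(int_vector)   # step: ask the vector manager for a name
    return f"{op_name}({var});"          # step: emit the name into the calling code

mgr = VectorManager()
code1 = emit_call_with_vector("reshape_op", [1, 224, 224, 3], mgr)
code2 = emit_call_with_vector("transpose_op", [1, 224, 224, 3], mgr)
```

Here two operators sharing the same fixed shape vector reuse a single declared variable, which is the point of routing vectors through a central manager.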
Optionally, the memory management code is configured to:
acquiring a memory pool based on the memory configuration information of the at least one operator;
and acquiring the input and output memory requests of the at least one operator, and distributing the memory pool based on the input and output memory requests of the at least one operator.
Optionally, the memory management code is further configured to:
acquiring memory information corresponding to the input and output memory request of the at least one operator;
and acquiring a weight parameter corresponding to the at least one operator based on the memory information, and updating the weight parameter based on a weight parameter updating mechanism.
Optionally, the compiling the target back-end code based on the operator library specifically includes:
acquiring a preset subtask division method corresponding to an operator type of a third operator in an operator library, wherein the third operator is a supplementary operator;
performing subtask division processing on an operator calling code corresponding to the third operator based on the preset subtask division method to obtain at least one subtask;
calculating the thread number of each subtask based on the at least one subtask, and executing the at least one subtask in parallel based on the thread number of each subtask;
and the target back-end code comprises an operator calling code corresponding to the third operator.
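A minimal sketch of the subtask division and parallel execution described above, assuming an element-wise workload split into even index ranges; the division policy and thread model here are illustrative, not the patent's preset method:

```python
from concurrent.futures import ThreadPoolExecutor

def divide_subtasks(total_elements, num_subtasks):
    """Evenly split an element-wise workload into index ranges
    (a stand-in for a preset subtask-division method)."""
    base, rem = divmod(total_elements, num_subtasks)
    ranges, start = [], 0
    for i in range(num_subtasks):
        size = base + (1 if i < rem else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

def run_parallel(src, dst, op, num_subtasks=4):
    """Execute each subtask on its own worker thread; here the
    thread count per subtask is fixed at one for simplicity."""
    def work(rng):
        lo, hi = rng
        for i in range(lo, hi):
            dst[i] = op(src[i])
    with ThreadPoolExecutor(max_workers=num_subtasks) as pool:
        list(pool.map(work, divide_subtasks(len(src), num_subtasks)))
    return dst
```

For example, `run_parallel(list(range(8)), [0] * 8, lambda x: x + 1, num_subtasks=3)` applies the operator over three independent index ranges.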
In a second aspect, the present invention provides a special back-end code generation apparatus for supporting machine learning training, comprising:
the processing unit is used for acquiring a first calculation graph, analyzing and optimizing the first calculation graph to obtain a second calculation graph, and the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
the operator calling code generating unit is used for acquiring at least one operator of the second calculation graph, distributing the at least one operator to the operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
the memory management code generating unit is used for acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into the memory management module and outputting a memory management code;
and the compiling unit is used for generating a target back-end code based on the operator calling code corresponding to each operator and the memory management code, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
In a third aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the special back-end code generation method for supporting machine learning training according to the first aspect when executing the program.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for generating dedicated back-end code for supporting machine learning training as described in the first aspect.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for generating dedicated back-end code for supporting machine learning training according to the first aspect.
According to the special back-end code generation method and device for supporting machine learning training provided by the invention, the first calculation graph is parsed and optimized to obtain the second calculation graph, improving the computational efficiency of the calculation graph. The target back-end code is generated from the operator calling code produced by the operator mapping module for each operator and the memory management code produced by the memory management module, and is then compiled based on an operator library to obtain a deployment file used for executing the calculation task of the first calculation graph. An end-to-end compiler supporting neural network training is thereby realized, the operation efficiency of the GPU applied to neural network training and inference is improved, and the application range of the GPU is expanded.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for generating special back-end code for supporting machine learning training according to the present invention;
FIG. 2 is a schematic diagram of the structure of target back-end code provided by the present invention;
FIG. 3 is a second flowchart of the method for generating special back-end code for supporting machine learning training according to the present invention;
FIG. 4 is a schematic flow chart illustrating a process of compiling an operator calling code corresponding to a complementary operator according to the present invention;
FIG. 5 is a schematic structural diagram of a special back-end code generating device for supporting machine learning training according to the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the problem that the operation efficiency of the GPU applied to neural network training and inference needs to be optimized urgently, an embodiment of the present invention provides a method for generating a special back-end code supporting machine learning training, and fig. 1 is one of the flow diagrams of the method for generating a special back-end code supporting machine learning training provided by the embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step 100, obtaining a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph.
It should be noted that, in order to solve the problem that the operation efficiency of the GPU applied to the neural network training and inference needs to be optimized urgently, the embodiment of the present invention provides a special backend code generation method supporting machine learning training, which implements an end-to-end compiler supporting neural network training, and is applied to the GPU, thereby improving the operation efficiency of the GPU applied to the neural network training and inference, and extending the application range of the GPU.
An operator represents an operation, such as a convolution calculation or a matrix product.
Optionally, the first computation graph is a neural network model composed of at least one operator in a first order.
It should be noted that the first computation graph is a graph-level structure describing the entire computation task, and the computation task includes executing the operators in the first order of the first computation graph. The first sequence is a topological sequence which is set randomly, and different calculation graphs can be obtained by adopting different topological sequences by at least one operator.
It can be understood that the operator provided by the embodiment of the invention can support various general mathematical operations and prediction and training of various neural network models.
Optionally, the parsing and optimization process includes constant folding or operator fusion.
In one embodiment, the first computation graph is subjected to constant folding or operator fusion to obtain a second computation graph.
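As an illustrative sketch of the constant-folding optimization (on a toy tuple-based expression representation, not the patent's actual intermediate representation):

```python
def constant_fold(node):
    """Recursively evaluate sub-graphs whose inputs are all constants.
    A node is an (op, arg1, arg2) tuple; a plain number is a constant
    leaf; a string is a variable that blocks folding."""
    if not isinstance(node, tuple):
        return node
    op, args = node[0], [constant_fold(a) for a in node[1:]]
    if all(isinstance(a, (int, float)) for a in args):
        if op == "add":
            return args[0] + args[1]
        if op == "mul":
            return args[0] * args[1]
    return (op,) + tuple(args)

# (x * (2 + 3)) folds the constant sub-graph, yielding (x * 5)
graph = ("mul", "x", ("add", 2, 3))
```

Operator fusion works analogously at the graph level, merging adjacent operators into one kernel, but is omitted here for brevity.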
Step 101, obtaining at least one operator of a second computation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator.
It should be noted that the method for generating a special back-end code supporting machine learning training provided by the embodiment of the present invention is applied to a compiling system based on the TVM framework: the front end of the system is the existing open-source neural network compiling framework TVM, the back end includes a code generation module for generating the target back-end code, and the entire system runs on an electronic device equipped with a GPU.
Optionally, the code generation module includes a computation graph traversal module, an operator mapping module, a memory management module, and a code output module.
The computation graph traversal module is used for traversing the second computation graph to obtain at least one operator of the second computation graph and obtain the memory configuration information of the at least one operator.
The operator mapping module is used for generating an operator calling code corresponding to each operator based on at least one operator of the second computation graph, wherein the operator mapping module comprises an operator mapper and a code generator.
The memory management module is used for generating memory management codes based on the memory configuration information of at least one operator of the second calculation graph.
And the code output module is used for generating and outputting a target back-end code based on the operator calling code and the memory management code corresponding to each operator.
It should be noted that the operator mapping module provided in the embodiment of the present invention classifies the at least one operator based on its operator type, which simplifies the design and maintenance of the code generators. It also adopts an Ahead-of-Time (AOT) compiling method: before generating the operator calling codes, it traverses the computation graph and collects the input/output shapes and memory information of the at least one operator, thereby eliminating the computation overhead of reshape operators; meanwhile, certain fixed array parameters are directly formed into vectors serving as operator input data.
Optionally, the operator types include an Element-wise operator, a Reducing operator, and a Broadcasting operator.
The Element-wise operator is used to perform the same operation on each position in the input tensor data, such as a unary operator (logical NOT or negation), a binary operator (addition, subtraction, multiplication, or division), or a ternary operator (if a, then b, else c).
The Reducing operator applies a preset function to reduce the elements at specific positions of the input tensor data and outputs the result.
The Broadcasting operator is used to rearrange the input tensor data and store it at the appropriate locations of the output tensor data, such as broadcast or slice.
The operator calling code is used for executing the calculation task of the operator.
In one embodiment, the second computation graph is traversed based on a computation graph traversal module to obtain at least one operator of the second computation graph, the at least one operator is distributed to an operator mapping module based on an operator type of the at least one operator, and an operator calling code corresponding to each operator is output, where the operator calling code is used to execute a computation task of the operator.
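The dispatch step above can be sketched as follows; the class and function names are illustrative assumptions, since the patent describes the mapper only at the architectural level:

```python
class OperatorMapper:
    """Dispatches each operator of the computation graph to the code
    generator registered for its operator type."""
    def __init__(self):
        self._generators = {}  # operator type -> code generator

    def register(self, op_type, generator):
        self._generators[op_type] = generator

    def emit(self, op_type, op_name, args):
        gen = self._generators.get(op_type)
        if gen is None:
            raise KeyError(f"no code generator registered for {op_type}")
        return gen(op_name, args)

def elementwise_generator(op_name, args):
    """Toy code generator: emit a C-style call for an element-wise op."""
    return f"{op_name}({', '.join(args)});"

mapper = OperatorMapper()
mapper.register("element-wise", elementwise_generator)
call_code = mapper.emit("element-wise", "add_f32", ["in0", "in1", "out0"])
```

One generator per operator *type* (rather than per operator) is what keeps the design and maintenance workload small as the operator set grows.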
And 102, acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code.
It can be understood that, since the input data and output results of each operator are stored in the chip memory, and a large amount of data-transmission overhead exists between the host and the chip during execution of the calculation graph's computation task, the embodiment of the present invention further provides a memory management module that generates memory management code based on the memory configuration information of at least one operator. This implements a complete and efficient memory management mechanism and, based on the memory pool and newly extended operators, greatly reduces the overhead of memory allocation, release, and data transmission.
Optionally, the memory configuration information is used to record memory information of input data and memory information of output data of at least one operator.
The input data of the operator is a variable node, and the output result of the operator is an intermediate result, for example, the weight parameter may be the input data of the operator, or the output result of the operator.
The memory management code is used for realizing memory allocation, memory copy or memory release of the input data memory information and the output result memory information of at least one operator.
In one embodiment, the second computation graph is input to a computation graph traversal module, the second computation graph is traversed based on the computation graph traversal module to obtain input data memory information and output data memory information of at least one operator, the input data memory information and the output data memory information are recorded in memory configuration information, the memory configuration information is input to a memory management module, and a memory management code is output.
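A minimal memory-pool sketch, assuming a simple bump-pointer allocator sized from the operators' memory configuration information; integer offsets stand in for device addresses, and the class name is hypothetical:

```python
class MemoryPool:
    """Pre-allocate one pool sized from the operators' memory
    configuration, then hand out (offset, size) blocks, so that
    per-operator allocation needs no further device malloc calls."""
    def __init__(self, op_memory_sizes):
        self.capacity = sum(op_memory_sizes)  # total bytes required by the graph
        self._offset = 0
        self.blocks = {}  # buffer name -> (offset, size)

    def allocate(self, name, size):
        if self._offset + size > self.capacity:
            raise MemoryError("memory pool exhausted")
        self.blocks[name] = (self._offset, size)
        self._offset += size
        return self.blocks[name]
```

Usage: `MemoryPool([256, 512])` builds a 768-byte pool, and successive `allocate` calls place each operator's input/output buffers at consecutive offsets; real designs would additionally reuse the memory of dead intermediate results.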
103, generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
Optionally, the target back-end code includes a computation graph main function, at least one operator function, at least one global variable, a header file declaration, and a wrapping function; for example, the target back-end code may be implemented in the C++ language.
The main function of the computation graph is used for indicating the second computation graph to execute computation tasks according to a first sequence, and the main function of the computation graph comprises an input/output memory request of at least one operator and an execution flow of the second computation graph, and specifically comprises memory allocation, memory copy, input parameters, function call codes and memory release.
It can be understood that the memory management code including memory allocation, memory copy and memory release is generated by the memory management module; the operator calling code comprises an input parameter, a function calling code and an operator function, and is generated by the operator mapping module.
The memory copy includes executing the memory copy from the host to the chip and executing the memory copy from the chip to the host.
The global variable is mainly used in a main function of a computational graph, is used for controlling the operation logic of a weight parameter updating mechanism, and assists a memory management module to perform memory allocation, memory copy and memory release.
And the operator functions are used for being called by the main function of the computational graph in sequence based on the function interfaces of each type of operators and executing the computational tasks of the second computational graph through corresponding operation logic.
The header file and the wrapper function provide the underlying support, wherein the wrapper function can call the computation graph main function to provide the underlying support.
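The code output module's assembly of these parts can be sketched as follows; the emitted C++ skeleton, the function names (`graph_main`, `pool_alloc`, `pool_free`), and the header name are illustrative assumptions, not the patent's actual output:

```python
def emit_backend_code(op_calls, alloc_code, free_code):
    """Combine a header declaration, memory-management code, and the
    operator calling codes into the computation graph main function
    (sections follow the structure described for Fig. 2)."""
    body = "\n  ".join(alloc_code + op_calls + free_code)
    return (
        '#include "operator_library.h"\n'  # header file declaration
        "void graph_main() {\n"
        f"  {body}\n"
        "}\n"
    )

code = emit_backend_code(
    op_calls=["conv2d(buf0, buf1);"],
    alloc_code=["float* buf0 = pool_alloc(256);",
                "float* buf1 = pool_alloc(512);"],
    free_code=["pool_free(buf0);", "pool_free(buf1);"],
)
```

The point of the sketch is the ordering: allocation first, operator calls in the graph's topological order, release last, matching the execution flow of the main function.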
Optionally, the operator library comprises a compilation method of at least one operator of the second computation graph.
It can be understood that, in the embodiment of the present invention, the target back-end code is obtained by combining the operator calling code corresponding to each operator with the memory management code, and the deployment file is obtained after the operator library performs compiling and linking processing on the target back-end code, and the deployment file can be directly run on a chip and is used for executing the computation task of the first computation graph.
According to the special back-end code generation method for supporting machine learning training provided by the embodiment of the invention, the first calculation graph is parsed and optimized into the second calculation graph, improving the computational efficiency of the calculation graph. The target back-end code is generated from the operator calling codes produced by the operator mapping module and the memory management code produced by the memory management module, and is then compiled based on the operator library to obtain a deployment file used for executing the calculation task of the first calculation graph. An end-to-end compiler supporting neural network training is thereby realized, the operation efficiency of the GPU in neural network training and inference is improved, and the application range of the GPU is expanded.
Fig. 2 is a schematic structural diagram of the target back-end code provided by an embodiment of the present invention. As shown in fig. 2, the target back-end code includes a header file declaration, global variables, operator functions, a computation graph main function, and a wrapping function, where the computation graph main function includes memory allocation, memory copy, input parameters, function call codes, and memory release. The memory copy mainly covers two cases: the host copies memory to the GPU-equipped electronic device, and the GPU-equipped electronic device copies memory back to the host.
Based on the content of the above embodiment, the operator mapping module includes at least one operator code generator and an operator mapper;
the distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator specifically includes:
obtaining an operator type of at least one operator based on the operator mapper;
distributing the at least one operator to an operator code generator corresponding to each operator based on the operator type of the at least one operator;
and outputting an operator calling code corresponding to each operator based on the at least one operator and the code generation template class in the operator code generator.
It should be noted that the operator mapping module provided by the embodiment of the present invention includes at least one operator code generator and one operator mapper, where each operator code generator and each operator form a corresponding relationship.
It can be understood that, in order to solve the problem that the number of operator types of the computation graph is huge, which causes the reduction of the design efficiency of the code generator corresponding to each operator and the higher maintenance cost, the embodiment of the present invention provides an operator mapper, and the operator mapper classifies the operators according to the computation characteristics of the operators in the computation graph, such as an Element-wise operator, a Reducing operator, or a Broadcasting operator.
Optionally, the operator mapper is configured to obtain an operator type of at least one operator of the second computation graph, and distribute the at least one operator to the operator code generator corresponding to each operator based on the operator type.
It can be understood that the operator code generator provided by the embodiment of the present invention includes a code generation template class corresponding to an operator, where the code generation template class generates an operator invocation code corresponding to the operator according to a specific parameter of the operator.
According to the special back-end code generation method for supporting machine learning training, provided by the embodiment of the invention, at least one operator is distributed to the operator code generator corresponding to each operator based on the operator mapper, and the operator calling code corresponding to each operator is generated according to the code generation template class in the operator code generator, so that the design efficiency of the code generator can be improved, the maintenance workload of the code generator can be reduced, and the expansibility of the operators can be improved.
Based on the content of the above embodiment, before the distributing the at least one operator to the operator mapping module based on the operator type of the at least one operator, the method further includes:
acquiring a first operator, wherein the first operator is an operator in the second calculation graph;
generating a code generation template class corresponding to the first operator based on the operator class of the first operator and a preset code generation template base class;
and generating and registering an operator code generator corresponding to the first operator based on the code generation template class corresponding to the first operator.
It should be noted that, when an operator is newly expanded, if the operator mapping module does not include a code generator corresponding to the newly expanded operator, the existing operator mapping module cannot generate an operator calling code for it.
Wherein the first operator belongs to the newly expanded operator in the second computation graph.
The operator mapping module comprises a preset code generation template base class.
In one implementation, a first operator is acquired, and the preset code generation template base class is extended based on the operator type of the first operator to generate a code generation template class corresponding to the first operator. An operator code generator is then defined from this code generation template class through a general tool function or a macro definition, and the operator code generator, which corresponds to the first operator, is registered in the operator mapping module.
According to the special back-end code generation method for supporting machine learning training, the code generation template class corresponding to the first operator is generated from the operator type of the first operator and the preset code generation template base class, and the operator code generator is then registered based on this template class. The code generator is thereby extended, the extensibility of the operators is improved, and the application range of the GPU is further expanded.
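The subclass-plus-macro registration path can be sketched as follows. This is a hypothetical reconstruction under the assumption of a C++ implementation; the base-class, registry, macro, and `SoftmaxCodeGen` names are invented for illustration:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the "preset code generation template base class":
// each subclass emits the call code for one operator type.
struct CodeGenTemplateBase {
    virtual ~CodeGenTemplateBase() = default;
    virtual std::string emit(const std::string& args) const = 0;
};

// Registry standing in for the operator mapping module.
std::map<std::string, std::unique_ptr<CodeGenTemplateBase>>& registry() {
    static std::map<std::string, std::unique_ptr<CodeGenTemplateBase>> r;
    return r;
}

// Macro mirroring the "general tool function or macro definition" registration
// step: a static initializer installs the generator before main() runs.
#define REGISTER_OP_CODEGEN(OpName, ClassName)              \
    static const bool registered_##ClassName =              \
        (registry()[OpName] = std::make_unique<ClassName>(), true);

// Extending the base class for a newly expanded (first) operator,
// e.g. a hypothetical "softmax".
struct SoftmaxCodeGen : CodeGenTemplateBase {
    std::string emit(const std::string& args) const override {
        return "softmax(" + args + ");";
    }
};
REGISTER_OP_CODEGEN("softmax", SoftmaxCodeGen)
```

After registration, the mapper can look up `"softmax"` in the registry and generate its calling code without any change to existing generators.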
Fig. 3 is a second flowchart of a method for generating a special back-end code for supporting machine learning training according to an embodiment of the present invention. As shown in fig. 3, the specific process of the method includes:
the method comprises the steps of inputting a calculation graph into a calculation graph traversal module, outputting at least one operator node of the calculation graph, mapping at least one operator to an operator code generator corresponding to each operator by an operator mapper, and generating an operator calling code corresponding to each operator based on the operator code generators, wherein the operator calling code comprises an input parameter, an operator function and a function calling code.
And inputting the calculation graph into a calculation graph traversal module, outputting the memory information of variables and intermediate results, and generating a memory management code through a memory management module.
And inputting the operator calling code and the memory management code corresponding to each operator into a code output module, and outputting a target back-end code.
Based on the content of the above embodiment, the operator calling code includes at least one of an operator function, an input parameter, and a function calling code;
the input parameters include at least one of:
transferring first tensor data of the function in a pointer mode;
second tensor data acquired from the at least one operator based on an ahead-of-time (AOT) compilation method;
an integer vector, the integer vector being derived based on a vector set mechanism.
It can be understood that the operator mapping module generates an operator function corresponding to each operator type based on the operator type of the at least one operator; for the at least one operator of the second calculation graph, the operator mapping module generates a function calling code corresponding to each operator in the main function; and for fixed array parameters that need an additional definition, the operator mapping module generates the input parameters in the main function.
Optionally, the second tensor data is a tensor size, typically represented as a single number.
It should be noted that, in the embodiment of the present invention, an ahead-of-time (AOT) compilation method is adopted: in the compiling stage, the second tensor data is acquired from the at least one operator and written into the function calling code.
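The effect of the AOT step can be sketched with a small emitter: because the tensor shape is known at compile time, the generated call passes tensors by pointer (the first tensor data) and embeds the size as a literal (the second tensor data) instead of querying it at run time. The function name and signature below are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical AOT call emitter: tensors are passed by pointer, and the
// tensor size -- known at compile time -- is folded into a single number
// and emitted as a literal argument.
std::string emitCall(const std::string& fn,
                     const std::vector<std::string>& tensorPtrs,
                     const std::vector<int>& shape) {
    long long n = 1;
    for (int d : shape) n *= d;  // fold the shape into one element count
    std::string call = fn + "(";
    for (const auto& p : tensorPtrs) call += p + ", ";
    call += std::to_string(n) + ");";
    return call;
}
```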
Based on the content of the above embodiment, the integer vector is obtained based on a vector group mechanism, which specifically includes:
inputting a second operator into an operator code generator corresponding to the second operator, and outputting an integer vector, wherein the second operator is an operator of which the input parameter in the second calculation graph comprises the integer vector;
sending the integer vector to a vector manager, generating a variable name of the integer vector based on the vector manager, and sending the variable name of the integer vector to an operator code generator corresponding to the second operator;
and outputting the variable name to an operator calling code corresponding to the second operator based on the operator code generator corresponding to the second operator.
It should be noted that the input parameters include integer vectors, and the integer vectors are obtained in the compiling stage and output to the function call code in the running stage.
In one implementation, the vector group mechanism provided in the embodiment of the present invention maintains a vector manager during the code generation stage. When the operator code generator corresponding to a second operator processes that operator, and the input parameter of the second operator is an integer vector, the operator code generator generates the integer vector of the second operator and sends it to the vector manager.
Further, the vector manager stores the information of the integer vector of the second operator, assigns a variable name of the integer vector to the integer vector, and returns the variable name to the operator code generator, and the operator code generator outputs an operator calling code corresponding to the second operator based on the variable name of the integer vector.
And reading the information of the integer vector recorded in the vector manager and outputting the integer vector in the process of outputting the operator calling code corresponding to the second operator by the code output module.
According to the special back-end code generation method for supporting machine learning training, the information of the integer vector is stored based on the vector manager, the variable name of the integer vector is generated, the operator code generator outputs the operator calling code based on the variable name of the integer vector, and the functions of obtaining the integer vector in the compiling stage and using the integer vector in the operating stage are achieved.
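A minimal sketch of such a vector manager follows; the class name, the `ivec_` naming scheme, and the emitted definition format are assumptions for illustration, not details from the patent:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical vector manager from the "vector group mechanism": it stores
// each integer vector collected at compile time, assigns it a variable name
// for the operator calling code, and can later emit the definitions that the
// code output module writes for the running stage.
class VectorManager {
public:
    // Store the vector and return the variable name used in the call code.
    std::string intern(const std::vector<int>& v) {
        std::string name = "ivec_" + std::to_string(vectors_.size());
        vectors_[name] = v;
        return name;
    }

    // Emit the definition line for a stored vector (running-stage side).
    std::string emitDefinition(const std::string& name) const {
        std::string def = "int " + name + "[] = {";
        const auto& v = vectors_.at(name);
        for (std::size_t i = 0; i < v.size(); ++i)
            def += (i ? ", " : "") + std::to_string(v[i]);
        return def + "};";
    }

private:
    std::map<std::string, std::vector<int>> vectors_;
};
```

The operator code generator only ever sees the returned variable name; the concrete values are materialized once, when the code output module emits the definitions.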
Based on the content of the foregoing embodiment, the memory management code is configured to:
acquiring a memory pool based on the memory configuration information of the at least one operator;
and acquiring the input and output memory requests of the at least one operator, and distributing the memory pool based on the input and output memory requests of the at least one operator.
It should be noted that executing the computation task of the computation graph in the operation stage incurs an intra-graph cost, that is, the memory required for executing the computation graph. In the prior art, the computation task is executed by calling the TVM framework and the chip operator library, which generates redundant data operations and reduces computation efficiency.
Optionally, the i/o memory request is a request for a memory size required for inputting data and outputting a result.
In the operation stage, the memory management code acquires, based on the memory configuration information of the at least one operator, the total memory required by the input data and output results of the at least one operator, applies to the chip for a memory pool of that total size, and, upon receiving the input and output memory requests of each operator, allocates the corresponding memory from the memory pool.
According to the special back-end code generation method for supporting machine learning training provided by the embodiment of the invention, the memory management code generated by the memory management module is used for realizing memory allocation, so that redundant operation can be reduced, the operation efficiency of executing the calculation graph is improved, and the cost in the calculation graph is effectively reduced.
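The pool scheme above amounts to one up-front allocation plus offset bookkeeping. A minimal host-side sketch, with a `std::vector` standing in for the single chip allocation (the class and method names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory pool: the total of all operators' input/output memory
// requests is computed from the memory configuration information, one block
// is requested up front, and each request is served as an offset into it,
// avoiding per-operator allocation calls at run time.
class MemoryPool {
public:
    explicit MemoryPool(const std::vector<std::size_t>& requests) {
        std::size_t total = 0;
        for (std::size_t r : requests) total += r;
        pool_.resize(total);  // stands in for a single chip allocation
    }

    // Serve one input/output memory request with a bump-pointer offset.
    char* allocate(std::size_t bytes) {
        char* p = pool_.data() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t capacity() const { return pool_.size(); }
    std::size_t used() const { return used_; }

private:
    std::vector<char> pool_;
    std::size_t used_ = 0;
};
```

A real backend would additionally honor the device's alignment requirements when computing offsets; that detail is omitted here.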
Based on the content of the foregoing embodiment, the memory management code is further configured to:
acquiring memory information corresponding to the input and output memory request of the at least one operator;
and acquiring a weight parameter corresponding to the at least one operator based on the memory information, and updating the weight parameter based on a weight parameter updating mechanism.
It should be noted that, in the running stage, executing the computation task of the computation graph also incurs inter-graph overhead, that is, the data overhead between iterations of executing the computation graph. Therefore, in the embodiment of the present invention, the memory management code generated by the memory management module updates the weight parameters in the chip memory based on a weight update mechanism, which avoids the data communication overhead of copying from the chip to the host and of copying the updated weight parameters from the host back to the chip, thereby improving the running efficiency of the computation graph.
It can be understood that, in the process of executing the neural network training (the second computation graph), the number of training iterations and the iteration interval of the output result must be determined in order to improve the operation efficiency of the training. Therefore, a command operator is provided in the memory management code implemented in the embodiment of the present invention; it controls the execution of the operation logic of the neural network training and executes the weight parameter updating mechanism, thereby updating the weight parameters and effectively reducing the overhead of the whole training process.
The weight parameter updating mechanism runs on the chip. Specifically, an update operator acquires the memory information corresponding to the weight matrix to be updated, generates a tensor program corresponding to the weight matrix, updates the weight parameters based on the tensor program, and stores the updated weight parameters back in the memory corresponding to the weight matrix.
According to the special back-end code generation method for supporting machine learning training provided by the embodiment of the invention, the memory management code generated by the memory management module is used for updating the weight parameter, so that memory copy is realized, the expenditure among the calculation graphs is reduced, and the operation efficiency of the calculation graphs is improved.
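The key property of the on-chip update is that the weights are modified in place in the same memory, so no host round-trip is needed. A sketch with plain C++ standing in for the generated tensor program, using a hypothetical SGD-style step (the patent does not specify the update rule):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical in-place weight update: the update operator reads the memory
// of the weight matrix, applies the gradient step, and writes the result back
// into the same chip memory, avoiding chip-to-host and host-to-chip copies.
void updateWeightsInPlace(float* weights, const float* grads,
                          std::size_t n, float lr) {
    for (std::size_t i = 0; i < n; ++i)
        weights[i] -= lr * grads[i];  // result stored back in the weight memory
}
```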
Based on the content of the foregoing embodiment, the compiling the target backend code based on the operator library specifically includes:
acquiring a preset subtask division method corresponding to an operator type of a third operator in an operator library, wherein the third operator is a supplementary operator;
performing subtask division processing on an operator calling code corresponding to the third operator based on the preset subtask division method to obtain at least one subtask;
calculating the thread number of each subtask based on the at least one subtask, and executing the at least one subtask in parallel based on the thread number of each subtask;
and the target back-end code comprises an operator calling code corresponding to the third operator.
It should be noted that the operator library on the chip includes compiling methods only for existing operators and cannot compile a newly expanded operator (i.e., a supplementary operator). Therefore, the embodiment of the present invention provides a compiling method for the supplementary operator, which extends the operator library and improves the efficiency of the operator compiling process.
In one embodiment, each operator type is preset with a subtask division method. For example, if the operator type is the Element-wise type, the preset subtask division method lets each thread process a contiguous input data area; if the operator is a Rotate function, each thread processes at least one contiguous matrix; and if the operator is a group convolution function, each thread processes at least one contiguous group.
It can be understood that, in the compiling method for the third operator provided in the embodiment of the present invention, an entry kernel function and an external interface function are respectively implemented. The external interface function performs subtask division on the operator calling code of the third operator to obtain at least one subtask corresponding to the third operator and sends the at least one subtask to the entry kernel function. The entry kernel function calculates the thread number of each subtask, executes the at least one subtask corresponding to the third operator in parallel according to the thread number of each subtask, obtains and integrates at least one operation result, and sends the integrated result back to the external interface function.
According to the special back-end code generation method for supporting machine learning training provided by the embodiment of the invention, the sub-task division processing is carried out on the operator calling code corresponding to the third operator based on the preset sub-task division method corresponding to the operator type to obtain at least one sub-task, and the at least one sub-task is executed in parallel based on the thread number of each sub-task, so that the compiling processing on the third operator is realized, the operator library is expanded, and the efficiency of the compiling process of the operator is improved.
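The Element-wise case of the division step can be sketched on the host side as splitting the input range into contiguous chunks, one subtask per chunk, with a thread count derived per subtask. The struct and function names are hypothetical, and a real backend would round the thread count up to the hardware's thread-block granularity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subtask for a supplementary (third) operator: a half-open
// contiguous range of the input data, per the Element-wise division rule.
struct Subtask { std::size_t begin, end; };

// Division performed in the external interface function: split the total
// element count into contiguous chunks of at most `chunk` elements.
std::vector<Subtask> divideElementWise(std::size_t total, std::size_t chunk) {
    std::vector<Subtask> tasks;
    for (std::size_t b = 0; b < total; b += chunk)
        tasks.push_back({b, b + chunk < total ? b + chunk : total});
    return tasks;
}

// Thread count computed for the entry kernel: here simply one thread per
// element of the subtask.
std::size_t threadCount(const Subtask& t) { return t.end - t.begin; }
```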
Fig. 4 is a schematic flowchart of compiling an operator invocation code corresponding to a complementary operator according to an embodiment of the present invention. As shown in fig. 4, the method comprises the steps of:
step 400, the host computer obtains at least one supplementary operator and judges the operator type of the at least one supplementary operator;
step 401, based on the preset subtask division method corresponding to the operator type of each supplementary operator, the host performs subtask division processing on the operator calling code corresponding to that supplementary operator to obtain at least one subtask corresponding to each supplementary operator;
step 402, the electronic device obtains at least one subtask corresponding to each complementary operator, and calculates a thread number of each subtask;
step 403, the electronic device executes at least one subtask corresponding to each complementary operator in parallel based on the thread number of each subtask, and obtains and integrates a calculation result;
step 404, the host computer obtains the integrated calculation result sent by the electronic device.
The following describes the special back-end code generation device for supporting machine learning training according to the present invention, and the special back-end code generation device for supporting machine learning training described below and the special back-end code generation method for supporting machine learning training described above may be referred to in correspondence with each other.
Fig. 5 is a schematic structural diagram of a dedicated back-end code generation apparatus for supporting machine learning training according to an embodiment of the present invention. As shown in fig. 5, the dedicated back-end code generation apparatus for supporting machine learning training includes: processing unit 500, operator calling code generating unit 510, memory management code generating unit 520, and compiling unit 530, wherein,
the processing unit 500 is configured to obtain a first computation graph, analyze and optimize the first computation graph to obtain a second computation graph, where the first computation graph is a neural network model formed by at least one operator according to a first order;
an operator calling code generating unit 510, configured to obtain at least one operator of the second computation graph, distribute the at least one operator to an operator mapping module based on an operator type of the at least one operator, and output an operator calling code corresponding to each operator;
a memory management code generating unit 520, configured to obtain memory configuration information of the at least one operator, input the memory configuration information of the at least one operator into a memory management module, and output a memory management code;
a compiling unit 530, configured to generate a target backend code based on the operator invocation code and the memory management code corresponding to each operator, and perform compiling processing on the target backend code based on an operator library to obtain a deployment file, where the deployment file is used to execute a computation task of the first computation graph.
The special back-end code generation device for supporting machine learning training provided by the embodiment of the invention obtains the second calculation graph by analyzing and optimizing the first calculation graph, which improves the calculation efficiency of the calculation graph. It generates the target back-end code from the operator calling code produced by the operator mapping module for each operator and the memory management code produced by the memory management module, and compiles the target back-end code based on the operator library to obtain a deployment file used to execute the calculation task of the first calculation graph. An end-to-end compiler supporting neural network training is thereby realized, the calculation efficiency of the GPU in neural network training and inference is improved, and the application range of the GPU is expanded.
Optionally, the operator mapping module comprises at least one operator code generator and an operator mapper;
the distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator specifically includes:
obtaining an operator type of at least one operator based on the operator mapper;
distributing the at least one operator to an operator code generator corresponding to each operator based on the operator type of the at least one operator;
and outputting an operator calling code corresponding to each operator based on the at least one operator and the code generation template class in the operator code generator.
Optionally, before the distributing the at least one operator to the operator mapping module based on the operator type of the at least one operator, the method further includes:
acquiring a first operator, wherein the first operator is an operator in the second calculation graph;
generating a code generation template class corresponding to the first operator based on the operator class of the first operator and a preset code generation template base class;
and generating and registering an operator code generator corresponding to the first operator based on the code generation template class corresponding to the first operator.
The operator calling code comprises at least one item of an operator function, an input parameter and a function calling code;
the input parameters include at least one of:
transferring first tensor data of the function in a pointer mode;
second tensor data acquired from the at least one operator based on an ahead-of-time (AOT) compilation method;
an integer vector, the integer vector being derived based on a vector set mechanism.
The integer vector is obtained based on a vector group mechanism, and specifically comprises:
inputting a second operator into an operator code generator corresponding to the second operator, and outputting an integer vector, wherein the second operator is an operator of which the input parameter in the second calculation graph comprises the integer vector;
sending the integer vector to a vector manager, generating a variable name of the integer vector based on the vector manager, and sending the variable name of the integer vector to an operator code generator corresponding to the second operator;
and outputting the variable name to an operator calling code corresponding to the second operator based on the operator code generator corresponding to the second operator.
The memory management code is to:
acquiring a memory pool based on the memory configuration information of the at least one operator;
and acquiring the input and output memory requests of the at least one operator, and distributing the memory pool based on the input and output memory requests of the at least one operator.
The memory management code is further to:
acquiring memory information corresponding to the input and output memory request of the at least one operator;
and acquiring a weight parameter corresponding to the at least one operator based on the memory information, and updating the weight parameter based on a weight parameter updating mechanism.
The compiling process of the target back-end code based on the operator library specifically comprises the following steps:
acquiring a preset subtask division method corresponding to an operator type of a third operator in an operator library, wherein the third operator is a supplementary operator;
performing subtask division processing on an operator calling code corresponding to the third operator based on the preset subtask division method to obtain at least one subtask;
calculating the thread number of each subtask based on the at least one subtask, and executing the at least one subtask in parallel based on the thread number of each subtask;
and the target back-end code comprises an operator calling code corresponding to the third operator.
The special back-end code generating device for supporting machine learning training provided by the invention can realize each process realized by the method embodiments of fig. 1 to fig. 4, and achieve the same technical effect, and is not repeated herein for avoiding repetition.
Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention. As shown in fig. 6, the electronic device may include: a processor (processor) 610, a communication interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a special back-end code generation method for supporting machine learning training, the method comprising:
acquiring a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph, wherein the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
acquiring at least one operator of a second calculation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code;
and generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for generating dedicated back-end code for supporting machine learning training provided by the above methods, the method comprising:
acquiring a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph, wherein the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
acquiring at least one operator of a second calculation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code;
and generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the special back-end code generation method for supporting machine learning training provided by the above embodiments, the method including:
acquiring a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph, wherein the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
acquiring at least one operator of a second calculation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code;
and generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating specialized back-end code for supporting machine learning training, comprising:
acquiring a first calculation graph, and analyzing and optimizing the first calculation graph to obtain a second calculation graph, wherein the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
acquiring at least one operator of a second calculation graph, distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into a memory management module, and outputting a memory management code;
and generating a target back-end code based on the operator calling code and the memory management code corresponding to each operator, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
2. The method of claim 1, wherein the operator mapping module comprises at least one operator code generator and an operator mapper;
the distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator specifically includes:
obtaining an operator type of at least one operator based on the operator mapper;
distributing the at least one operator to an operator code generator corresponding to each operator based on the operator type of the at least one operator;
and outputting an operator calling code corresponding to each operator based on the at least one operator and the code generation template class in the operator code generator.
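The dispatch described in claim 2 — an operator mapper routing each operator to the code generator registered for its type — might look like this sketch. The generator classes and the dictionary-based lookup are illustrative assumptions, not disclosed details:

```python
# Sketch of claim 2: an operator mapper dispatches each operator to the
# operator code generator registered for its type. Names are hypothetical.

class AddCodeGen:
    def emit(self, op):
        return f"add({op['inputs'][0]}, {op['inputs'][1]}, {op['output']});"

class ReluCodeGen:
    def emit(self, op):
        return f"relu({op['inputs'][0]}, {op['output']});"

GENERATORS = {"add": AddCodeGen(), "relu": ReluCodeGen()}

def map_operator(op):
    gen = GENERATORS[op["type"]]   # the mapper keys on the operator type
    return gen.emit(op)            # the generator outputs the call code

print(map_operator({"type": "add", "inputs": ["a", "b"], "output": "c"}))
```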
3. The method of claim 2, wherein prior to distributing the at least one operator to an operator mapping module based on the operator type of the at least one operator, further comprising:
acquiring a first operator, wherein the first operator is an operator in the second calculation graph;
generating a code generation template class corresponding to the first operator based on the operator class of the first operator and a preset code generation template base class;
and generating and registering an operator code generator corresponding to the first operator based on the code generation template class corresponding to the first operator.
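Claim 3's registration step — deriving a code generation template class from a preset base class and registering the resulting generator — can be sketched with a base class and a registration decorator. The decorator pattern is one plausible mechanism, assumed here for illustration:

```python
# Sketch of claim 3: derive a code generation template class for a new
# operator from a preset base class, then register the generator instance.

REGISTRY = {}

class CodeGenBase:
    """Preset code generation template base class (hypothetical)."""
    op_type = None
    def emit(self, op):
        raise NotImplementedError

def register(cls):
    REGISTRY[cls.op_type] = cls()   # instantiate and register the generator
    return cls

@register
class SigmoidCodeGen(CodeGenBase):
    op_type = "sigmoid"
    def emit(self, op):
        return f"sigmoid({op['input']}, {op['output']});"

assert "sigmoid" in REGISTRY
```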
4. The special back-end code generation method for supporting machine learning training according to claim 2 or 3, wherein the operator call code comprises at least one of an operator function, an input parameter, and a function call code;
the input parameters include at least one of:
transferring first tensor data of the function in a pointer mode;
second tensor data acquired from the at least one operator based on an ahead-of-time (AOT) compilation method;
an integer vector, the integer vector being obtained based on a vector group mechanism.
5. The method of claim 4, wherein the integer vector is obtained based on a vector group mechanism, and specifically comprises:
inputting a second operator into an operator code generator corresponding to the second operator, and outputting an integer vector, wherein the second operator is an operator of which the input parameter in the second calculation graph comprises the integer vector;
sending the integer vector to a vector manager, generating a variable name of the integer vector based on the vector manager, and sending the variable name of the integer vector to an operator code generator corresponding to the second operator;
and outputting the variable name to an operator calling code corresponding to the second operator based on the operator code generator corresponding to the second operator.
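The vector group mechanism of claim 5 — a vector manager that names each integer vector so the operator call code can reference it — might be sketched as below. The naming scheme and the emitted C-style declaration are assumptions for illustration only:

```python
# Sketch of claim 5: a vector manager assigns a variable name to each
# integer vector; the operator code generator then emits that name in
# the operator call code. All identifiers are hypothetical.

class VectorManager:
    def __init__(self):
        self.decls = []
        self.count = 0

    def add(self, vec):
        name = f"vec_{self.count}"          # generated variable name
        self.count += 1
        body = ", ".join(str(v) for v in vec)
        self.decls.append(f"static const int {name}[] = {{{body}}};")
        return name

vm = VectorManager()
strides = vm.add([2, 2])                    # e.g. a stride parameter
call = f"max_pool(x, y, {strides});"        # call code uses the variable name
print(vm.decls[0])
print(call)
```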
6. The method of claim 1, wherein the memory management code is configured to:
acquiring a memory pool based on the memory configuration information of the at least one operator;
and acquiring the input and output memory requests of the at least one operator, and distributing the memory pool based on the input and output memory requests of the at least one operator.
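Claim 6's memory management behavior — sizing a pool from the operators' memory configuration and serving input/output requests out of it — can be sketched as follows. A simple linear, non-reusing layout is assumed; the patent does not specify the allocation policy:

```python
# Sketch of claim 6: a memory pool is sized from the operators' memory
# configuration, and input/output memory requests are served as offsets
# into the pool. The linear layout here is an assumption.

class MemoryPool:
    def __init__(self, configs):
        # configs: {tensor_name: size_in_bytes} gathered from the operators
        self.size = sum(configs.values())
        self.offsets, cursor = {}, 0
        for name, nbytes in configs.items():
            self.offsets[name] = cursor    # assign each tensor a region
            cursor += nbytes

    def request(self, name):
        return self.offsets[name]          # serve an input/output request

pool = MemoryPool({"op0_out": 4096, "op1_out": 2048})
assert pool.size == 6144
assert pool.request("op1_out") == 4096
```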
7. The method of claim 6, wherein the memory management code is further configured to:
acquiring memory information corresponding to the input and output memory request of the at least one operator;
and acquiring a weight parameter corresponding to the at least one operator based on the memory information, and updating the weight parameter based on a weight parameter updating mechanism.
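The weight update of claim 7 locates the weight parameters via the operators' memory information and updates them in place. The patent does not name the updating mechanism, so plain SGD is used below purely as a stand-in:

```python
# Sketch of claim 7: weight parameters located via the memory information
# are updated in place. SGD here is an assumed stand-in for the
# unspecified weight parameter updating mechanism.

def update_weights(pool, offset, grads, lr=0.1):
    # pool: flat list standing in for the raw memory pool
    for i, g in enumerate(grads):
        pool[offset + i] -= lr * g         # w := w - lr * grad

pool = [0.0] * 8
pool[4:8] = [1.0, 1.0, 1.0, 1.0]           # weights live at offset 4
update_weights(pool, 4, [0.5, 0.5, 0.5, 0.5])
print(pool[4:8])
```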
8. The specialized back-end code generation method for supporting machine learning training according to claim 1, wherein the compiling the target back-end code based on an operator library specifically comprises:
acquiring a preset subtask division method corresponding to an operator type of a third operator in an operator library, wherein the third operator is a supplementary operator;
performing subtask division processing on an operator calling code corresponding to the third operator based on the preset subtask division method to obtain at least one subtask;
calculating the thread number of each subtask based on the at least one subtask, and executing the at least one subtask in parallel based on the thread number of each subtask;
and the target back-end code comprises an operator calling code corresponding to the third operator.
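The subtask parallelism of claim 8 — dividing an operator's work by a preset subtask division method, sizing each subtask, and executing the subtasks in parallel — can be sketched with a thread pool. The contiguous-slice split and the squaring workload are illustrative assumptions:

```python
# Sketch of claim 8: divide an operator's work into subtasks and execute
# them in parallel. The split policy and workload are hypothetical.

from concurrent.futures import ThreadPoolExecutor

def split(n, parts):
    # preset subtask division: n elements -> `parts` contiguous ranges
    step = (n + parts - 1) // parts
    return [(i, min(i + step, n)) for i in range(0, n, step)]

def run_operator(data, parts=4):
    tasks = split(len(data), parts)
    with ThreadPoolExecutor(max_workers=parts) as ex:
        # each subtask processes its slice; map() preserves subtask order
        chunks = ex.map(lambda t: [x * x for x in data[t[0]:t[1]]], tasks)
    return [y for chunk in chunks for y in chunk]

print(run_operator(list(range(8))))
```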
9. A specialized back-end code generation apparatus that supports machine learning training, comprising:
the processing unit is used for acquiring a first calculation graph, analyzing and optimizing the first calculation graph to obtain a second calculation graph, and the first calculation graph is a neural network model formed by at least one operator according to a first sequence;
the operator calling code generating unit is used for acquiring at least one operator of the second calculation graph, distributing the at least one operator to the operator mapping module based on the operator type of the at least one operator, and outputting an operator calling code corresponding to each operator;
the memory management code generating unit is used for acquiring the memory configuration information of the at least one operator, inputting the memory configuration information of the at least one operator into the memory management module and outputting a memory management code;
and the compiling unit is used for generating a target back-end code based on the operator calling code corresponding to each operator and the memory management code, and compiling the target back-end code based on an operator library to obtain a deployment file, wherein the deployment file is used for executing the calculation task of the first calculation graph.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the specialized back-end code generation method for supporting machine learning training according to any one of claims 1 to 8 when executing the computer program.
CN202111071061.3A 2021-09-13 2021-09-13 Special back-end code generation method and device for supporting machine learning training Pending CN113885871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111071061.3A CN113885871A (en) 2021-09-13 2021-09-13 Special back-end code generation method and device for supporting machine learning training


Publications (1)

Publication Number Publication Date
CN113885871A true CN113885871A (en) 2022-01-04

Family

ID=79008885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071061.3A Pending CN113885871A (en) 2021-09-13 2021-09-13 Special back-end code generation method and device for supporting machine learning training

Country Status (1)

Country Link
CN (1) CN113885871A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115796284A (en) * 2023-02-08 2023-03-14 苏州浪潮智能科技有限公司 Inference method, inference device, storage medium and equipment based on TVM compiler
WO2023134453A1 (en) * 2022-01-17 2023-07-20 华为技术有限公司 Operator processing method and computer device
CN117667424A (en) * 2023-12-21 2024-03-08 摩尔线程智能科技(北京)有限责任公司 Memory management method, device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination