CN117992061A - Program conversion method, program conversion device, computer device, and computer-readable storage medium - Google Patents


Info

Publication number: CN117992061A
Application number: CN202410126367.1A
Authority: CN (China)
Prior art keywords: operator, program, compiled, target, memory unit
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Original language: Chinese (zh)
Inventors: 白杨, 沈小勇, 吕江波
Assignee (current and original): Shenzhen Smartmore Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Classification (Landscapes): Stored Programmes (AREA)
Abstract

The application relates to a program conversion method, a program conversion apparatus, a computer device, and a computer-readable storage medium. The method comprises: acquiring an initial program; for the operator currently being processed among the operators, determining the parallel memory units corresponding to that operator based on the current memory resources and the size of the memory unit occupied by the operator's operation, where each parallel memory unit is a memory unit that supports parallel operation of an operation module within the operator; arranging, based on the parallel memory units corresponding to each operator, the operation storage units corresponding to each operator's operation modules to obtain a program to be compiled; inputting the program to be compiled into a trained operator optimization model for processing, outputting the target operation modules corresponding to the target operators that can be compiled in parallel, and converting the program to be compiled based on these target operation modules to obtain the target program. This method can effectively improve the efficiency of program compilation.

Description

Program conversion method, program conversion device, computer device, and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a program conversion method, apparatus, computer device, and computer readable storage medium.
Background
With the development of artificial intelligence technology, deep learning has been widely applied across industries and fields. Improving the compilation efficiency of the corresponding high-performance tensor programs is therefore important for further increasing the application value of deep learning technology.
In the prior art, compilation optimization of tensor programs relies on static libraries provided by the vendors of existing deep learning frameworks or on heuristic algorithms tied to those frameworks; as a result, the optimization effect is poor and program compilation efficiency is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a program conversion method, apparatus, computer device, and computer-readable storage medium, which can effectively improve the efficiency of program compilation.
In a first aspect, the present application provides a program conversion method, including:
acquiring an initial program, wherein the initial program comprises operators and the data dependency relationships among them;
for the operator currently being processed among the operators, determining the parallel memory units corresponding to that operator based on the current memory resources and the size of the memory unit occupied by the operator's operation; each parallel memory unit is a memory unit that supports parallel operation of an operation module within the operator;
arranging, based on the parallel memory units corresponding to each operator, the operation storage units corresponding to each operator's operation modules to obtain a program to be compiled;
inputting the program to be compiled into a trained operator optimization model for processing, outputting the target operation modules corresponding to the target operators that can be compiled in parallel, and converting the program to be compiled based on these target operation modules to obtain a target program; the trained operator optimization model determines, according to the hardware resources of the program compiler, the target operation modules with the best operation performance for each target operator.
In a second aspect, the present application provides a program conversion apparatus comprising:
an acquisition module, configured to acquire an initial program, wherein the initial program comprises operators and the data dependency relationships among them;
a processing module, configured to determine, for the operator currently being processed among the operators, the parallel memory units corresponding to that operator based on the current memory resources and the size of the memory unit occupied by the operator's operation; each parallel memory unit is a memory unit that supports parallel operation of an operation module within the operator;
an arrangement module, configured to arrange, based on the parallel memory units corresponding to each operator, the operation storage units corresponding to each operator's operation modules to obtain a program to be compiled;
a conversion module, configured to input the program to be compiled into a trained operator optimization model for processing, output the target operation modules corresponding to the target operators that can be compiled in parallel, and convert the program to be compiled based on these target operation modules to obtain a target program; the trained operator optimization model determines, according to the hardware resources of the program compiler, the target operation modules with the best operation performance for each target operator.
In a third aspect, the present application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the program conversion method described above when the processor executes the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the program conversion method described above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the program conversion method described above.
According to the program conversion method, apparatus, computer device, and computer-readable storage medium above, the operators in the initial program and the data dependency relationships among them are analyzed, and corresponding parallel memory units are allocated to each operator in turn. The operation storage units corresponding to each operator's operation modules are arranged on the basis of these parallel memory units to obtain the program to be compiled, so that the operation modules within an operator can be compiled in a parallel, pipelined fashion, which effectively improves compilation efficiency. The program to be compiled is then fed into the operator optimization model, which splits and converts each target operator appropriately according to the hardware resources of the program compiler, yielding the target operation modules for each target operator. Once every target operator has been converted, the conversion of the initial program is complete; when the converted program is compiled, the operation modules within each operator are compiled in a parallel pipeline, effectively improving the compilation efficiency of the program.
Drawings
Fig. 1 is a flow chart of a program conversion method according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for determining the size of a memory cell occupied by operator operation according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for determining parallel memory cells according to an embodiment of the present application;
fig. 4 is a flowchart of a method for generating a to-be-compiled program according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating another method for generating a to-be-compiled program according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a method for training to obtain an operator optimization model according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for determining the operation performance of each reference operation module according to an embodiment of the present application;
FIG. 8 is a flowchart of a tensor optimization method according to an embodiment of the present application;
fig. 9 is a block diagram of a program conversion device according to an embodiment of the present application;
FIG. 10 is a diagram illustrating an internal architecture of a computer device according to an embodiment of the present application;
FIG. 11 is an internal block diagram of another computer device according to an embodiment of the present application;
Fig. 12 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet-of-Things device, or portable wearable device. The Internet-of-Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like; the portable wearable device may be a smart watch, smart bracelet, headset, or the like.
The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In some embodiments, as shown in fig. 1, a program conversion method is provided. This embodiment illustrates the method as applied to a terminal; it will be understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the two. In this embodiment, the method includes the following steps:
step S102, an initial program is acquired.
The initial program comprises the data dependency relationships among its operators. A data dependency relationship may be a data flow relationship in the data flow graph corresponding to the initial program; for example, if the data flow of operator a points to operator b, and the data flow of operator b points to operator c, this indicates that the operation of operator c depends on the output data of operator b, and the operation of operator b depends on the output data of operator a.
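As an illustration, the a→b→c dependency above can be modeled as a small directed graph from which a valid execution order is recovered; the operator names and the use of Python's `graphlib` here are illustrative, not part of the patent's implementation:

```python
# Minimal sketch (hypothetical): represent operator data dependencies
# as a directed graph and recover a valid execution order.
from graphlib import TopologicalSorter

# Each key depends on the operators in its set:
# b consumes a's output, c consumes b's output.
deps = {
    "b": {"a"},
    "c": {"b"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # a valid order places a before b and b before c
```

Any schedule produced from such a graph must respect this order, which is exactly the constraint the data dependency relationship encodes.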
Step S104, aiming at the current processing operator in each operator, determining each parallel memory unit corresponding to the current processing operator based on the current memory resource and the size of the memory unit occupied by the operation corresponding to the current processing operator.
Each parallel memory unit is a memory unit supporting the parallel operation of the operation module in the current processing operator.
It will be appreciated that a parallel memory unit is a memory unit in the computer that is configured, based on the current memory resources, to run in parallel in a dynamically pipelined manner. When memory is idle, the number of parallel memory units running concurrently is determined by the maximum capacity of the current memory resources, so the number of parallel memory units can be adjusted dynamically, in real time, according to the current memory size. During parallel compilation, once a parallel memory unit finishes running, the memory it occupied is released, and queued parallel memory units are activated and run according to the current memory situation, achieving a dynamically adjusted parallel pipeline.
Specifically, the computer device converts the initial program obtained above into a corresponding data flow graph, which comprises the operators of the initial program, the data flow directions among them, and the input and output cache/storage units corresponding to each operator. The current memory resource size and the size of the memory unit occupied by the operation of the operator currently being processed then serve as the basis for determining that operator's parallel memory units.
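The dynamic-pipeline behaviour described above — activating queued units only while memory allows, and waking queued units when capacity is released — can be sketched as follows; the pool class, names, and sizes are hypothetical stand-ins for the patent's mechanism:

```python
# Minimal sketch (hypothetical names): a pool that activates queued
# parallel memory units only while total memory stays within budget,
# releasing capacity when a unit finishes.
from collections import deque

class ParallelUnitPool:
    def __init__(self, total_memory):
        self.free = total_memory      # currently available memory
        self.queue = deque()          # units waiting for capacity
        self.active = []              # units currently running

    def submit(self, unit_size):
        self.queue.append(unit_size)
        self._activate()

    def _activate(self):
        # Activate queued units while capacity allows.
        while self.queue and self.queue[0] <= self.free:
            size = self.queue.popleft()
            self.free -= size
            self.active.append(size)

    def finish(self, unit_size):
        # A unit completed: release its memory and wake queued units.
        self.active.remove(unit_size)
        self.free += unit_size
        self._activate()

pool = ParallelUnitPool(total_memory=100)
for s in (40, 40, 40):   # the third unit must wait for capacity
    pool.submit(s)
print(len(pool.active), pool.free)   # 2 40
pool.finish(40)                      # frees capacity, queued unit starts
```

After `finish`, the queued unit is activated immediately, mirroring the real-time adjustment the description attributes to the parallel pipeline.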
Step S106, based on each parallel memory unit corresponding to each operator, respectively arranging the operation storage units corresponding to the operation modules of each operator to obtain the program to be compiled.
Specifically, having determined the parallel memory units corresponding to each operator in the preceding steps, the computer device associates each parallel memory unit with the operation storage unit of the corresponding operation module in the operator, so that each operation module can be compiled in a parallel pipeline based on its parallel memory unit. This completes the arrangement of the operation storage units corresponding to each operator's operation modules and yields the program to be compiled.
Step S108, inputting the program to be compiled into the trained operator optimization model for processing, outputting a target operation module corresponding to the target operator which can be compiled in parallel in the program to be compiled, and converting the program to be compiled based on the target operation module to obtain the target program.
The trained operator optimization model is used for determining a target operation module with maximum operation performance corresponding to the target operator according to hardware resources of the program compiler.
Specifically, according to the operator type of each operator in the program to be compiled, the computer device uses the trained operator optimization model to split/convert each operator into operation modules. From the resulting operation modules it determines the group of target operation modules with the best operation performance for each operator, and finally replaces each operator in the program to be compiled with its corresponding group of target operation modules, obtaining the target program.
In this embodiment, the operators in the initial program and the data dependency relationships among them are analyzed, corresponding parallel memory units are allocated to each operator in turn, and the operation storage units corresponding to each operator's operation modules are arranged based on these parallel memory units to obtain the program to be compiled. This enables parallel pipelined compilation of the operation modules within each operator and effectively improves compilation efficiency. Feeding the program to be compiled into the operator optimization model then splits and converts each target operator appropriately according to the hardware resources of the program compiler, yielding the target operation modules for each target operator. Once every target operator has been converted, the conversion of the initial program is complete; when the converted program is compiled, the operation modules within each operator are compiled in a parallel pipeline, effectively improving program compilation efficiency.
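The replace-each-operator-with-its-best-module-group step can be sketched as a simple selection over candidate splittings; the candidate table, scores, and module names below are hypothetical, not the patent's:

```python
# Minimal sketch (hypothetical candidates): pick, for each operator,
# the candidate group of operation modules with the best performance
# score, then substitute it into the program.
def best_module_group(candidates):
    """candidates: list of (module_group, performance_score)."""
    group, _ = max(candidates, key=lambda c: c[1])
    return group

def convert_program(program, candidate_table):
    # Replace every operator with its best-performing module group.
    return [best_module_group(candidate_table[op]) for op in program]

candidates = {
    "matmul": [(["tile", "dot"], 0.8), (["dot"], 0.5)],
    "relu":   [(["max0"], 0.9)],
}
print(convert_program(["matmul", "relu"], candidates))
# [['tile', 'dot'], ['max0']]
```

In the patent the scores come from the trained operator optimization model under the compiler's hardware resources; here they are fixed numbers purely to show the selection.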
In some embodiments, as shown in fig. 2, the program conversion method further includes:
step S202, a resource occupation list is obtained.
The resource occupation list is used for representing the relation between operators of all operator types and memory occupation resources.
It can be understood that different types of operators perform different functions and involve different amounts of computation, so the sizes of the memory resources they occupy differ. The resource occupation list may be determined from the operation efficiency and performance of each operator type when run on the same computer hardware, or from the algorithmic complexity of each operator; the specific method is not limited.
Step S204, determining the size of the operation occupied memory unit corresponding to the current processing operator based on the operator type and the resource occupancy list corresponding to the current processing operator.
Specifically, the computer device queries the resource occupation list using the operator type of the operator currently being processed, and thereby determines the size of the memory unit occupied by that operator's operation.
In this embodiment, by acquiring the resource occupation list and matching the operator type of the current operator against it, the size of the memory unit occupied by the operator's operation can be determined quickly from the operator type, effectively improving program conversion efficiency.
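A resource occupation list is essentially a lookup table from operator type to occupied memory size; the types and byte counts below are illustrative assumptions, not values from the patent:

```python
# Minimal sketch (hypothetical values): a resource occupation list
# mapping operator types to the memory-unit size their operation needs.
RESOURCE_OCCUPATION_LIST = {
    "conv2d": 4096,   # illustrative sizes, e.g. in bytes
    "matmul": 2048,
    "relu":   256,
}

def occupied_memory_unit_size(operator_type: str) -> int:
    """Look up the memory-unit size for an operator type."""
    try:
        return RESOURCE_OCCUPATION_LIST[operator_type]
    except KeyError:
        raise ValueError(f"unknown operator type: {operator_type}")

print(occupied_memory_unit_size("matmul"))  # 2048
```

The table itself would be populated by either of the two routes the description mentions: measured efficiency on fixed hardware, or per-operator algorithmic complexity.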
In some embodiments, as shown in fig. 3, determining each parallel memory unit corresponding to the current processing operator based on the current memory resource and the size of the memory unit occupied by the operation corresponding to the current processing operator includes:
step S302, determining the number of corresponding parallel memory units and the size of a single parallel memory unit based on the size of the memory unit occupied by the operation corresponding to the current processing operator.
Specifically, the computer device takes the size of the memory unit occupied by the current operator's operation, determined in the preceding steps, and divides/allocates the number of parallel memory units and the size of a single parallel memory unit for that operator. Concretely, the occupied size may be compared against preset ranges, each of which has an associated number and size of parallel memory units, so that the number of parallel memory units and the size of a single unit can be determined from the range in which the operator's occupied size falls.
Step S304, dividing the current memory resource according to the number of the parallel memory units and the size of the single parallel memory unit to obtain each parallel memory unit corresponding to the current processing operator.
In this embodiment, by analyzing the memory occupation characteristics of the operator currently being processed, corresponding parallel memory units are configured for it, so that the current memory resources are divided and configured reasonably. The parallel memory units for each operator in the initial program are thus divided according to the size of the memory unit its operation occupies, which allows the arrangement of the operation storage units for each operator's operation modules to be completed quickly and improves the efficiency of generating the program to be compiled.
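Steps S302 and S304 can be sketched together: map the occupied size to a (count, unit size) configuration via preset ranges, then carve the current memory into that many units. The ranges and sizes are hypothetical:

```python
# Minimal sketch (hypothetical ranges): map an operator's occupied
# memory size to a (unit_count, unit_size) configuration, then carve
# the current memory resource into that many parallel units.
PRESET_RANGES = [
    # (upper bound of occupied size, unit count, single unit size)
    (1024,  8, 128),
    (4096,  4, 512),
    (16384, 2, 2048),
]

def plan_parallel_units(occupied_size: int):
    for upper, count, size in PRESET_RANGES:
        if occupied_size <= upper:
            return count, size
    return 1, occupied_size  # fall back to a single unit

def divide_memory(current_memory: int, occupied_size: int):
    count, size = plan_parallel_units(occupied_size)
    # Only as many units as the current memory can actually hold.
    usable = min(count, current_memory // size)
    return [size] * usable

units = divide_memory(current_memory=4096, occupied_size=2000)
print(units)  # four 512-sized units fit into 4096
```

The `min(...)` clamp reflects the earlier point that the number of concurrently running units is bounded by the current memory capacity.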
In some embodiments, as shown in fig. 4, based on each parallel memory unit corresponding to each operator, operation memory units corresponding to operation modules of each operator are respectively arranged to obtain a program to be compiled, including:
step S402, corresponding association operation modules are determined based on association relations among the operation modules of each operator.
The associated operation modules are operation modules corresponding to the data dependency relations among the operation modules.
It can be understood that the operation modules within an operator may run independently in parallel, or may have logical data dependencies; operation modules with data dependencies are identified as associated operation modules. For example, if the input data of operation module c is the output of operation module b, and the input of b is the output of a, then modules a, b, and c are associated through data dependency, i.e. their execution follows a strict sequential order.
Step S404, binding operation storage units corresponding to each operator association operation module respectively to obtain association storage units corresponding to each operator.
Wherein, the data flow direction corresponding to the associated storage unit is fixed.
Specifically, the computer device binds the operation storage units corresponding to the associated operation modules within each operator, so that when the converted program is compiled, the modules in an associated group are run in turn according to the binding relationship. This preserves the data dependencies among the modules and connects the associated storage units corresponding to each operator's associated operation modules.
Step S406, based on each parallel memory unit and associated memory unit corresponding to each operator, the operation memory units corresponding to the operation modules of each operator are arranged to obtain the program to be compiled.
Specifically, the computer device arranges the operation storage units of each operation module based on the parallel memory units and the associated storage units of each operator. When the converted program compiles the operation modules of each operator in a parallel pipeline, the sequential compilation required by the associated operation modules can be determined from each operator's associated storage units. In this way, associated operation modules are still compiled serially within the operator's otherwise parallel pipelined compilation, which effectively guarantees the reliability of subsequent compilation.
In this embodiment, the storage units of the associated operation modules within each operator are bound according to the association relationships among the operator's operation modules, and memory resources for the associated modules are configured according to the bound associated storage units to obtain the program to be compiled. The operation storage units of both associated and non-associated operation modules are thereby allocated reasonably, effectively ensuring the compilation efficiency of the converted program and improving its compilation reliability.
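The resulting compilation order — serial within a bound chain, parallel across independent chains — can be sketched as a round-robin schedule; the chain structure and names are hypothetical:

```python
# Minimal sketch (hypothetical structure): bound (associated) module
# chains compile serially, while unrelated chains can be dispatched
# side by side in the same parallel "tick".
def build_compile_schedule(chains):
    """chains: list of associated-module chains; a chain's modules must
    stay in order, but separate chains are independent."""
    schedule = []
    cursors = [0] * len(chains)
    while any(c < len(chain) for c, chain in zip(cursors, chains)):
        tick = []  # modules compiled in parallel during this tick
        for i, chain in enumerate(chains):
            if cursors[i] < len(chain):
                tick.append(chain[cursors[i]])
                cursors[i] += 1
        schedule.append(tick)
    return schedule

# a→b→c must stay serial; x→y is an independent chain.
print(build_compile_schedule([["a", "b", "c"], ["x", "y"]]))
# [['a', 'x'], ['b', 'y'], ['c']]
```

Within each tick the modules come from different chains, so no data dependency is violated, while each chain's own order is preserved.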
In some embodiments, as shown in fig. 5, based on each parallel memory unit corresponding to each operator, operation memory units corresponding to operation modules of each operator are respectively arranged to obtain a program to be compiled, including:
step S502, for each operator, matching the data variable corresponding to each operation module of the operator with the corresponding parallel memory unit to obtain the memory index corresponding to each data variable;
the memory index is used for linking each data variable to a corresponding parallel memory unit when the program is compiled.
Specifically, for each operator, the computer device associates the data variables of each of the operator's operation modules with the corresponding parallel memory units and establishes a memory index between each data variable and its parallel memory unit. When the converted program is compiled, it can be linked to each data variable through the memory index, enabling parallel pipelined compilation of the operation modules.
Step S504, based on the memory index corresponding to each operator, the operation storage units corresponding to the operation modules of each operator are arranged to obtain the program to be compiled.
In this embodiment, an index relationship is established between the data variables of an operator's operation modules and the parallel memory units, and the corresponding operation storage units are arranged according to this index relationship to obtain the program to be compiled. The memory resources for the variables in each operation module are thereby arranged reasonably, effectively improving the compilation efficiency of the generated target program.
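The memory index of steps S502–S504 amounts to a mapping from data variable to parallel memory unit; the module, variable, and unit names below are hypothetical:

```python
# Minimal sketch (hypothetical names): build a memory index that links
# each operation module's data variables to its parallel memory unit.
def build_memory_index(module_vars, parallel_units):
    """module_vars: {module: [variable, ...]};
    parallel_units: one unit id per module, in the same order."""
    index = {}
    for (module, variables), unit in zip(module_vars.items(), parallel_units):
        for var in variables:
            index[var] = unit   # variable -> parallel memory unit
    return index

index = build_memory_index(
    {"mod_a": ["x", "y"], "mod_b": ["z"]},
    ["unit0", "unit1"],
)
print(index)  # {'x': 'unit0', 'y': 'unit0', 'z': 'unit1'}
```

At compile time, resolving a variable through such an index is what lets each module's accesses land in its own unit, so modules can be compiled side by side.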
In some embodiments, as shown in fig. 6, before inputting the program to be compiled into the trained operator optimization model for processing, outputting the target operation modules corresponding to the target operators that can be compiled in parallel, and converting the program to be compiled based on those modules, the method further includes:
Step S602, a model training sample is obtained.
The model training samples comprise input operators of various types and their corresponding conversion operators; a conversion operator comprises the operation modules corresponding to its input operator. Each operation module has a performance parameter representing its operation performance, which may include operation speed, algorithmic complexity, operation accuracy, and the like.
Step S604, inputting the model training samples into the current operator optimization model for processing, outputting the current conversion operator, and calculating the difference between the operation performance of the current conversion operator and the target performance.
The operation performance of the current conversion operator is obtained by fusing the operation performance of each of its operation modules; the fusion method includes weighted superposition.
Specifically, the computer device inputs the model training samples obtained above into the current operator optimization model and outputs the current conversion operator at the current iteration. It superimposes the operation performance of each operation module of the current conversion operator to obtain the operator's overall operation performance, then computes the difference between this operation performance and the corresponding target performance.
Step S606, if the difference value is smaller than a preset threshold, stopping model training to obtain the trained operator optimization model.
The preset threshold may be set flexibly by a technician according to debugging results, or may be an optimized value obtained iteratively by a heuristic algorithm (such as particle swarm optimization or a genetic algorithm).
Step S608, if the difference value is not smaller than the preset threshold, adjusting the corresponding parameters of the current operator optimization model based on the difference value, and returning to the step of inputting the model training samples into the current operator optimization model for processing and outputting the current conversion operator.
In this embodiment, an operator optimization model is trained from the acquired model training samples, so that it can convert and split each target operator in an input program to be compiled into operation modules, obtaining higher-performance target operation modules for each target operator. Program conversion of each target operator is thereby achieved, the program to be compiled is converted quickly, and conversion efficiency is improved.
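The loop in steps S604–S608 can be sketched as follows; the stand-in "model", fusion weights, and performance function are hypothetical, chosen only so the weighted-superposition fusion and threshold stop are visible:

```python
# Minimal sketch (hypothetical model and data): iterate until the gap
# between fused operation performance and the target drops below a
# preset threshold (step S606), otherwise adjust parameters (S608).
def fused_performance(module_perfs, weights):
    # Weighted superposition of per-module performance scores.
    return sum(p * w for p, w in zip(module_perfs, weights))

def train(initial_param, target_perf, threshold, lr=0.1, max_iters=1000):
    param = initial_param
    for _ in range(max_iters):
        # Stand-in "model": per-module performance depends on param.
        module_perfs = [param, param * 0.5]
        perf = fused_performance(module_perfs, weights=[0.6, 0.4])
        diff = target_perf - perf
        if abs(diff) < threshold:        # step S606: stop training
            return param, perf
        param += lr * diff               # step S608: adjust parameters
    return param, perf

param, perf = train(initial_param=0.0, target_perf=8.0, threshold=1e-3)
print(round(perf, 3))  # converges near the target performance of 8.0
```

The real model would be a learned operator optimizer over splittings; the skeleton only mirrors the control flow of figure 6.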
In some embodiments, as shown in fig. 7, the program conversion method further includes:
Step S702, determining each group of reference operation modules from the operation modules corresponding to each type of operator.
Each group of reference operation modules is a set of operation modules, chosen from all of the operator's operation modules, that can complete all functions of the corresponding operator.
Step S704, each reference operation module is operated separately, and the reference operation performance corresponding to each reference operation module is determined based on the operation result.
Step S706, based on the feature vectors and the reference operation performance corresponding to each reference operation module, the performance sample data is constructed.
The feature vector is used for representing the attribute features of the corresponding reference operation module, and may be generated by a common semantic mining model (such as BERT) or another neural network model.
Step S708, training the initial evaluation model based on the performance sample data to obtain a target evaluation model.
The target evaluation model can be used for evaluating the operation performance of each operation module of the corresponding operator.
Step S710, obtaining target feature vectors corresponding to the operation modules to be evaluated, inputting the target feature vectors into the target evaluation model for processing, and outputting the operation performance of the operation modules to be evaluated.
The operation modules to be evaluated are the operation modules other than the reference operation modules.
In this embodiment, a subset of reference operation modules is randomly selected from all operation modules of the corresponding operators, and each reference operation module is run to measure its operation performance. A target evaluation model is then trained on sample data built from the reference operation modules and their measured performance, and the operation performance of the massive number of operation modules corresponding to each operator is evaluated by the target evaluation model. In this way, the operation performance of each operation module can be determined rapidly without running every module, which improves the efficiency of program conversion.
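A minimal, self-contained sketch of this sampling-and-evaluation pipeline (steps S702 to S710) follows. The one-number "feature vector", the `measure` stand-in, and the proportional evaluator are assumptions made purely for illustration:

```python
import random

# Sketch of steps S702-S710: benchmark a random subset of "reference"
# modules, fit a trivial proportional evaluator on their (feature,
# performance) pairs, then predict performance for the remaining modules
# without running them.
def measure(feature):
    # Stand-in for actually running a reference module (step S704).
    return 3.0 * feature

def build_evaluator(all_features, n_ref=4, seed=0):
    rng = random.Random(seed)
    refs = rng.sample(all_features, n_ref)          # step S702
    perf = {f: measure(f) for f in refs}            # step S704
    k = sum(perf[f] / f for f in refs) / n_ref      # steps S706-S708: "train"
    return lambda f: k * f                          # step S710: predict

features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
evaluate = build_evaluator(features)
```

Only the sampled reference modules are ever executed; all other modules are scored by the fitted evaluator.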
The present application also provides an application scenario in which the above program conversion method is applied to compiling high-performance tensor programs for deep learning. Specifically, the program conversion method is applied in this scenario as follows:
A high-performance tensor program for deep learning is input, and the computer device generates a corresponding dataflow graph based on the tensor program. The dataflow graph comprises the operators and the data flow/data dependency relations between them, and each operator in the dataflow graph is correspondingly annotated with a group of input cache units and a group of output cache units. The memory size occupied by each operator's operation is determined according to the operator type of that operator. The currently available memory resource is then divided according to the current memory resource size of the computer device and the memory size occupied by each operator's operation, and a group of parallel memory units is configured for each operator. During program compilation, the available state of each parallel memory unit is judged/detected in real time according to the current memory resource size, and parallel pipelined program compilation is supported based on the currently available parallel memory units; the parallel memory units are used to support parallel operation/compilation of the operation modules in the corresponding operator.
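The partitioning step above can be sketched as follows. The fair-share policy and the `(offset, size)` bookkeeping are assumptions for illustration, not the patent's exact division scheme:

```python
# Hedged sketch of the partitioning step: divide the free memory so each
# operator gets a group of equally sized parallel memory units. The
# fair-share policy is an assumption, not the patent's exact scheme.
def allocate_parallel_units(free_memory, operators):
    """operators: dict mapping operator name -> per-run memory footprint.
    Returns operator name -> list of (offset, size) parallel units."""
    allocation, offset = {}, 0
    budget = free_memory // len(operators)   # equal share per operator
    for name, unit_size in operators.items():
        count = max(1, budget // unit_size)  # how many units fit the share
        allocation[name] = [(offset + i * unit_size, unit_size)
                            for i in range(count)]
        offset += count * unit_size
    return allocation
```

An operator with a smaller per-run footprint thus receives more parallel units from the same share, allowing more of its operation modules to run in parallel.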
Based on the group of parallel memory units determined for each operator in the above steps, indexes/arrangements are established for the operation storage units of each operator's operation modules to obtain the program to be compiled. The program to be compiled is input into the operator optimization model, which outputs the target operation modules corresponding to each operator; the original operators in the program to be compiled are then replaced with their corresponding target operation modules to obtain the target program, as shown in fig. 8.
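The replacement step can be illustrated with a toy dataflow-graph rewrite. The dict-of-lists graph representation and the module-chaining scheme are assumptions for the sketch:

```python
# Toy dataflow-graph rewrite illustrating the replacement step: each
# operator node is replaced by the chain of target operation modules the
# optimizer produced for it, while data dependencies are preserved.
def rewrite_program(graph, target_modules):
    """graph: operator -> list of downstream operators;
    target_modules: operator -> ordered list of replacement modules."""
    chain = {op: target_modules.get(op, [op]) for op in graph}
    entry = {op: chain[op][0] for op in graph}   # where incoming edges land
    rewritten = {}
    for op, successors in graph.items():
        mods = chain[op]
        for a, b in zip(mods, mods[1:]):         # pipeline within operator
            rewritten[a] = [b]
        rewritten[mods[-1]] = [entry[s] for s in successors]
    return rewritten
```

For example, replacing a `matmul` operator with a two-module `tile`/`gemm` chain keeps its downstream edge to `relu` intact.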
The operator optimization model is trained on sample data consisting of input operators of various types and their corresponding conversion operators. A conversion operator completes the same algorithmic function as its input operator but differs in operation performance and algorithmic complexity, with the conversion operator's operation performance being higher than that of the input operator; each conversion operator comprises the operation modules corresponding to its input operator.
When the training sample data for the operator optimization model is constructed, a subset of reference operation modules is randomly selected from all operation modules of the corresponding operators, and each reference operation module is run to measure its operation performance. A target evaluation model is then trained on sample data built from the reference operation modules and their measured performance, and the operation performance of the massive number of operation modules corresponding to each operator is evaluated by this model, so that the operation performance of all operation modules corresponding to each operator can be determined rapidly.
In this embodiment, each operator in the initial program and the data dependency relations between operators are combed, and corresponding parallel memory units are allocated to each operator in turn. The operation storage units corresponding to each operator's operation modules are arranged based on these parallel memory units to obtain the program to be compiled, so that the operation modules within operators can be compiled in a parallel pipelined manner during program compilation, which effectively improves compilation efficiency. The program to be compiled is input into the operator optimization model, which reasonably splits and converts each target operator of the current program to be compiled according to the hardware resources of the program compiler, yielding a target operation module for each target operator. This completes the conversion of each target operator in the program, and thus the conversion of the initial program; when the converted program is compiled, each operation module in each operator can be compiled in a parallel pipelined manner, effectively improving the compilation efficiency of the program.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages; these are not necessarily performed at the same moment, may be performed at different moments, and are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a program conversion device. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the program conversion device provided below may refer to the limitation of the program conversion method hereinabove, and will not be repeated herein.
As shown in fig. 9, there is provided a program conversion apparatus including:
an obtaining module 902, configured to obtain an initial program, where the initial program includes each operator and a data dependency relationship between each operator;
A processing module 904, configured to determine, for a current processing operator in each operator, each parallel memory unit corresponding to the current processing operator based on a current memory resource and a size of a memory unit occupied by an operation corresponding to the current processing operator; each parallel memory unit is a memory unit supporting the parallel operation of an operation module in the current processing operator;
An arrangement unit 906, configured to respectively arrange the operation storage units corresponding to the operation modules of the operators based on the parallel memory units corresponding to the operators, so as to obtain a program to be compiled;
The conversion module 908 is configured to input a program to be compiled into a trained operator optimization model for processing, output a target operation module corresponding to a target operator that can be compiled in parallel in the program to be compiled, and convert the program to be compiled based on the target operation module, so as to obtain a target program; the trained operator optimization model is used for determining a target operation module with maximum operation performance corresponding to the target operator according to hardware resources of the program compiler.
In some embodiments, the obtaining module 902 is further configured to:
Acquiring a resource occupation list, wherein the resource occupation list is used for representing the relation between operators of all operator types and memory occupation resources; and determining the size of the operation occupied memory unit corresponding to the current processing operator based on the operator type and the resource occupancy list corresponding to the current processing operator.
In some embodiments, in determining each parallel memory unit corresponding to the current processing operator based on the current memory resource and the size of the memory unit occupied by the operation corresponding to the current processing operator, the processing module 904 is specifically configured to:
determining the number of corresponding parallel memory units and the size of a single parallel memory unit based on the size of the memory units occupied by the operation corresponding to the current processing operator;
And dividing the current memory resource according to the number of the parallel memory units and the size of the single parallel memory unit to obtain each parallel memory unit corresponding to the current processing operator.
In some embodiments, in the aspect of obtaining the program to be compiled by respectively arranging the operation storage units corresponding to the operation modules of the operators based on the parallel memory units corresponding to the operators, the arrangement unit 906 is specifically configured to:
Determining corresponding associated operation modules based on the association relations among the operation modules of each operator; the associated operation modules are the operation modules, among each operator's operation modules, between which a data dependency relationship exists;
Binding operation storage units corresponding to each operator association operation module respectively to obtain association storage units corresponding to each operator; the data flow direction corresponding to the associated storage unit is fixed;
And arranging the operation storage units corresponding to the operation modules of each operator based on the parallel memory units and the associated storage units corresponding to each operator respectively to obtain a program to be compiled.
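This binding step can be sketched as follows: modules linked by a data dependency share one storage unit, so the producer writes the same buffer the consumer reads and the data flow direction is fixed. The one-buffer-per-edge scheme is an assumption for illustration:

```python
# Hedged sketch of the binding step: each producer-consumer dependency
# between operation modules is given one shared buffer id, so the
# producer's output unit is the consumer's input unit.
def bind_associated_units(dependencies):
    """dependencies: list of (producer, consumer) module pairs.
    Returns (out_buf, in_buf): module -> shared buffer id."""
    out_buf, in_buf = {}, {}
    for buf_id, (producer, consumer) in enumerate(dependencies):
        out_buf[producer] = buf_id   # producer writes this buffer
        in_buf[consumer] = buf_id    # consumer reads the same buffer
    return out_buf, in_buf
```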
In some embodiments, in the aspect of obtaining the program to be compiled by respectively arranging the operation storage units corresponding to the operation modules of the operators based on the parallel memory units corresponding to the operators, the arrangement unit 906 is specifically configured to:
For each operator, matching the data variable corresponding to each operation module of the operator with the corresponding parallel memory unit to obtain the memory index corresponding to each data variable;
and arranging the operation storage units corresponding to the operation modules of each operator based on the memory index corresponding to each operator to obtain the program to be compiled.
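The variable-to-unit matching can be sketched as a simple round-robin assignment; the policy is an assumption, since the text does not specify how data variables are matched to parallel memory units:

```python
# Minimal sketch of the matching step: each data variable of an operator's
# operation modules is assigned an index into one of the operator's
# parallel memory units. Round-robin assignment is an assumed policy.
def build_memory_index(variables, parallel_units):
    """variables: ordered data variable names; parallel_units: unit ids.
    Returns variable -> memory index (the unit it is placed in)."""
    return {var: parallel_units[i % len(parallel_units)]
            for i, var in enumerate(variables)}
```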
In some embodiments, the conversion module 908 is further to:
Obtaining a model training sample, wherein the model training sample comprises input operators of various types and corresponding conversion operators, and the conversion operators comprise various operation modules corresponding to the input operators; wherein, each operation module corresponds to a performance parameter, and the performance parameter is used for representing the operation performance of the corresponding operation module;
Inputting the model training sample into a current operator optimizing model for processing, outputting a current conversion operator, and calculating a difference value between the operation performance and the target performance corresponding to the current conversion operator; the operation performance corresponding to the current conversion operator is obtained by fusion based on the operation performance of each operation module corresponding to the current conversion operator;
If the difference value is smaller than the preset threshold value, stopping model training to obtain a trained operator optimization model; or
And if the difference value is greater than the preset threshold value, adjusting corresponding parameters of the current operator optimization model based on the difference value, and returning to the step of inputting the model training sample into the current operator optimization model for processing and outputting the current conversion operator.
In some embodiments, the conversion module 908 is further to:
Determining each reference operation module from each operation module corresponding to each type of operator; respectively operating each reference operation module, and determining the corresponding reference operation performance of each reference operation module based on the operation result; based on the feature vectors and the reference operation performance corresponding to each reference operation module, constructing and obtaining performance sample data; the feature vector is used for representing attribute features of the corresponding reference operation module; training the initial evaluation model based on the performance sample data to obtain a target evaluation model; obtaining target feature vectors corresponding to the operation modules to be evaluated, inputting the target feature vectors into a target evaluation model for processing, and outputting the operation performance of the corresponding operation modules to be evaluated; the operation module to be evaluated is other operation modules except for each reference operation module.
Each module in the above-described program conversion apparatus may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer equipment is used for storing the corresponding conversion relation data of each operator and the operation module. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement the steps of the program conversion method described above.
In some embodiments, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, an input/output interface (I/O), a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected via a system bus, and the communication interface, the input device, and the display unit are connected to the system bus via the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through Wi-Fi, a mobile cellular network, NFC (Near Field Communication), or other technologies. The computer program is executed by the processor to implement the steps of the program conversion method described above. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device. The display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 10 and 11 are merely block diagrams of portions of structures associated with aspects of the present application and are not intended to limit the computer device to which aspects of the present application may be applied, and that a particular computer device may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
In some embodiments, there is also provided a computer device comprising a memory storing a computer program and a processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer-readable storage medium 1200 is provided, on which a computer program 1202 is stored; when executed by a processor, the computer program 1202 implements the steps of the method embodiments described above. Its internal structure diagram may be as shown in fig. 12.
In some embodiments, a computer program product is provided, which comprises a computer program, which when executed by a processor implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may include the steps of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as there is no contradiction in a combination of technical features, it should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A program conversion method, characterized by comprising:
acquiring an initial program, wherein the initial program comprises operators and data dependency relations among the operators;
aiming at the current processing operator in each operator, determining each parallel memory unit corresponding to the current processing operator based on the current memory resource and the size of the memory unit occupied by the operation corresponding to the current processing operator; each parallel memory unit is a memory unit supporting the parallel operation of an operation module in the current processing operator;
Based on each parallel memory unit corresponding to each operator, respectively arranging operation memory units corresponding to operation modules of each operator to obtain a program to be compiled;
Inputting the to-be-compiled program into a trained operator optimization model for processing, outputting a target operation module corresponding to a target operator which can be compiled in parallel in the to-be-compiled program, and converting the to-be-compiled program based on the target operation module to obtain a target program; the trained operator optimization model is used for determining a target operation module with the maximum operation performance corresponding to the target operator according to hardware resources of a program compiler.
2. The method according to claim 1, wherein the method further comprises:
acquiring a resource occupation list, wherein the resource occupation list is used for representing the relation between operators of all operator types and memory occupation resources;
and determining the size of the operation occupied memory unit corresponding to the current processing operator based on the operator type corresponding to the current processing operator and the resource occupancy list.
3. The method of claim 1, wherein determining each parallel memory unit corresponding to the current processing operator based on the current memory resource and the operation occupied memory unit size corresponding to the current processing operator comprises:
Determining the number of corresponding parallel memory units and the size of a single parallel memory unit based on the size of the memory unit occupied by the operation corresponding to the current processing operator;
And dividing the current memory resource according to the number of the parallel memory units and the size of the single parallel memory unit to obtain each parallel memory unit corresponding to the current processing operator.
4. The method according to claim 1, wherein the arranging the operation storage units corresponding to the operation modules of the operators based on the parallel memory units corresponding to the operators to obtain the program to be compiled includes:
Determining corresponding association operation modules based on association relations among the operation modules of each operator respectively; the associated operation module is an operation module with data dependency relationship in the corresponding operation modules;
Binding operation storage units corresponding to each operator association operation module respectively to obtain association storage units corresponding to each operator; the data flow direction corresponding to the associated storage unit is fixed;
And arranging the operation storage units corresponding to the operation modules of each operator based on the parallel memory units and the associated storage units corresponding to each operator respectively to obtain a program to be compiled.
5. A method according to any one of claims 1 to 3, wherein the arranging, based on each parallel memory unit corresponding to each operator, the operation storage units corresponding to the operation modules of each operator respectively, to obtain the program to be compiled includes:
For each operator, matching the data variable corresponding to each operation module of the operator with the corresponding parallel memory unit to obtain the memory index corresponding to each data variable;
And arranging the operation storage units corresponding to the operation modules of the operators based on the memory index corresponding to each operator to obtain a program to be compiled.
6. The method according to claim 1, wherein before the inputting of the program to be compiled into the trained operator optimization model for processing, the outputting of the target operation module corresponding to the target operator that can be compiled in parallel in the program to be compiled, and the converting of the program to be compiled based on the target operation module to obtain the target program, the method further comprises:
obtaining a model training sample, wherein the model training sample comprises input operators of various types and corresponding conversion operators, and the conversion operators comprise operation modules corresponding to the input operators; wherein, each operation module corresponds to a performance parameter, and the performance parameter is used for representing the operation performance of the corresponding operation module;
inputting the model training sample into a current operator optimization model for processing, outputting a current conversion operator, and calculating a difference value between the operation performance and the target performance corresponding to the current conversion operator; the operation performance corresponding to the current conversion operator is obtained by fusion based on the operation performance of each operation module corresponding to the current conversion operator;
if the difference value is smaller than a preset threshold value, stopping model training to obtain the trained operator optimization model; or
And if the difference value is larger than a preset threshold value, adjusting corresponding parameters of the current operator optimization model based on the difference value, and returning to the step of inputting the model training sample into the current operator optimization model for processing and outputting the current conversion operator.
7. The method of claim 6, wherein the method further comprises:
Determining each reference operation module from each operation module corresponding to each type of operator;
respectively operating each reference operation module, and determining the corresponding reference operation performance of each reference operation module based on an operation result;
Based on the feature vectors and the reference operation performance corresponding to the reference operation modules, constructing and obtaining performance sample data; the feature vector is used for representing attribute features of the corresponding reference operation module;
training an initial evaluation model based on the performance sample data to obtain a target evaluation model;
Obtaining target feature vectors corresponding to all the operation modules to be evaluated, inputting the target feature vectors into the target evaluation model for processing, and outputting the operation performance of the corresponding operation modules to be evaluated; the operation module to be evaluated is other operation modules except the reference operation modules.
8. A program conversion apparatus, comprising:
The acquisition module is used for acquiring an initial program, wherein the initial program comprises operators and data dependency relations among the operators;
The processing module is used for determining, for the current processing operator in each operator, each parallel memory unit corresponding to the current processing operator based on the current memory resource and the size of the memory unit occupied by the operation corresponding to the current processing operator; each parallel memory unit is a memory unit supporting the parallel operation of an operation module in the current processing operator;
The arrangement unit is used for respectively arranging the operation storage units corresponding to the operation modules of the operators based on the parallel memory units corresponding to the operators to obtain a program to be compiled;
The conversion module is used for inputting the to-be-compiled program into the trained operator optimization model for processing, outputting a target operation module corresponding to a target operator which can be compiled in parallel in the to-be-compiled program, and converting the to-be-compiled program based on the target operation module to obtain a target program; the trained operator optimization model is used for determining a target operation module with the maximum operation performance corresponding to the target operator according to hardware resources of a program compiler.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
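The processing module of claim 8 determines how many parallel memory units an operator can use from the current memory resource and the per-operation memory-unit size. A minimal sketch of one plausible policy is below; the function name, the floor-division allocation, and the cap on the operator's parallel operation modules are illustrative assumptions, not the claimed method.

```python
# Hypothetical sketch: given the currently free memory and the size of the
# memory unit one operation of the operator occupies, compute how many memory
# units can back operation modules running in parallel, capped by the number
# of operation modules the operator actually has.
def parallel_memory_units(current_memory_bytes: int,
                          unit_size_bytes: int,
                          max_parallel_modules: int) -> int:
    """Number of memory units that fit in the current memory resource,
    capped by the operator's parallelizable operation-module count."""
    if unit_size_bytes <= 0:
        raise ValueError("memory-unit size must be positive")
    fit = current_memory_bytes // unit_size_bytes  # whole units that fit
    return min(fit, max_parallel_modules)


# Example: 1 MiB free, each operation occupies 200 KiB, operator has
# 8 operation modules -> floor(1048576 / 204800) = 5 parallel units.
units = parallel_memory_units(1 << 20, 200 * 1024, 8)
```

The arrangement module would then lay out one operation memory unit per selected operation module before the program to be compiled is handed to the operator optimization model.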
CN202410126367.1A 2024-01-29 2024-01-29 Program conversion method, program conversion device, computer device, and computer-readable storage medium Pending CN117992061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410126367.1A CN117992061A (en) 2024-01-29 2024-01-29 Program conversion method, program conversion device, computer device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN117992061A true CN117992061A (en) 2024-05-07

Family

ID=90900876

Country Status (1)

Country Link
CN (1) CN117992061A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination