CN116663618B - Operator optimization method and device, storage medium and electronic equipment

Info

Publication number: CN116663618B (application number CN202310941263.1A)
Authority: CN (China)
Prior art keywords: solution, operator, determining, candidate, evaluation
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116663618A
Inventors: 王鹏程, 吕波, 孙红江, 陈晨, 李勇, 胡陈枢, 曾令仿, 陈�光, 程稳
Current assignee: Zhejiang Lab
Original assignee: Zhejiang Lab
Events: application filed by Zhejiang Lab; priority to CN202310941263.1A; publication of CN116663618A; application granted; publication of CN116663618B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means

Abstract

The specification discloses an operator optimization method and apparatus, a storage medium, and an electronic device. In the operator optimization method provided by the specification, a target neural network model is obtained and its computational graph is determined; for each operator in the computational graph, a search space containing all feasible solutions of the operator is determined; a number of feasible solutions are selected from the search space as candidate solutions, an evaluation value is determined for each candidate solution, and the candidate solution with the highest evaluation value is taken as the solution to be determined; the running time of the solution to be determined on the target hardware is measured, and the iteration count is incremented; when the running time is less than the current optimal time, or no current optimal time exists, the running time is determined as the current optimal time and the solution to be determined as the current optimal solution; when the iteration count is less than a specified number of times, a specified number of previously unselected feasible solutions are selected again from the operator's search space as candidate solutions; and when the iteration count is not less than the specified number of times, the current optimal solution is determined as the optimal solution of the operator.

Description

Operator optimization method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an operator optimization method, an operator optimization device, a storage medium, and an electronic device.
Background
In recent years, with the continuous development of deep learning, neural network models have grown ever larger in parameter count and ever more complex in structure, raising the requirements on model deployment accordingly. However, building and optimizing a high-performance computing library demands an enormous amount of engineering, and neural network architectures change from day to day, which creates great engineering pressure.
To address this, schemes that map the computational graph corresponding to a neural network model onto different hardware and generate hardware-executable code through compilation techniques have achieved remarkable results. In this process, optimizing the operators in the computational graph to improve how well the neural network model is adapted to the hardware is a crucial step.
In existing methods, operator optimization tests, on the hardware, the implementations corresponding to all of an operator's different feasible solutions, and takes the feasible solution with the smallest running time in the test results as the optimal solution finally adopted. However, the time consumed by operator optimization under such a scheme is often very long, and the cost is high.
Therefore, how to improve the efficiency of operator optimization is a problem in urgent need of a solution.
Disclosure of Invention
The present specification provides an operator optimization method and apparatus, a storage medium, and an electronic device, to at least partially solve the above problems of the prior art.
The technical scheme adopted in the specification is as follows:
This specification provides an operator optimization method, comprising the following steps:
acquiring a target neural network model, and determining a computational graph of the target neural network model;
for each operator in the computational graph, determining a search space of the operator, wherein the search space contains all feasible solutions of the operator;
selecting a specified number of feasible solutions in the search space of the operator as candidate solutions, determining an evaluation value of each candidate solution, and taking the candidate solution with the highest evaluation value as the solution to be determined;
determining the running time taken by the target hardware to run the solution to be determined, and incrementing the iteration count;
when the running time is less than the current optimal time, or no current optimal time exists, determining the running time as the current optimal time, and determining the solution to be determined as the current optimal solution;
when the iteration count is less than a specified number of times, reselecting a specified number of previously unselected feasible solutions in the search space of the operator as candidate solutions;
and when the iteration count is not less than the specified number of times, determining the current optimal solution as the optimal solution of the operator.
Optionally, selecting a specified number of feasible solutions in the search space of the operator as candidate solutions specifically includes:
selecting a designated solution from the feasible solutions of the operator as a candidate solution;
and determining, among the feasible solutions of the operator, a neighborhood solution of the designated solution, and re-determining the neighborhood solution as the designated solution, until the number of determined candidate solutions reaches the specified number.
Optionally, determining the evaluation value of each candidate solution specifically includes:
for each candidate solution, inputting the candidate solution into a pre-trained evaluation model to obtain an evaluation result of the candidate solution output by the evaluation model;
and determining the evaluation value of the candidate solution according to the evaluation result.
Optionally, the evaluation model comprises several prediction models;
inputting the candidate solution into the pre-trained evaluation model to obtain the evaluation result of the candidate solution output by the evaluation model specifically includes:
inputting the candidate solution into each prediction model in the pre-trained evaluation model to obtain the predicted running time output by each prediction model;
and determining the evaluation result of the candidate solution according to the predicted running times.
Optionally, the evaluation result includes an average running time and a standard deviation;
determining the evaluation result of the candidate solution according to the predicted running times specifically includes:
determining the average running time and the standard deviation of the candidate solution according to the predicted running times.
Optionally, determining the evaluation value of the candidate solution according to the evaluation result specifically includes:
determining the evaluation value of the candidate solution according to the evaluation result and the current optimal time.
Optionally, pre-training the evaluation model specifically includes:
acquiring a sample operator;
determining a sample feasible solution of the sample operator and the true running time of the sample feasible solution on the target hardware;
inputting the sample feasible solution into each prediction model to obtain the to-be-optimized predicted running time output by that prediction model, wherein the sample operators to which the sample feasible solutions input into different prediction models belong are different;
and training each prediction model with the optimization target of minimizing the difference between the to-be-optimized predicted running time and the true running time, on the target hardware, of the sample feasible solution input into that prediction model.
An operator optimization apparatus provided in this specification, the apparatus comprising:
an acquisition module, configured to acquire a target neural network model and determine a computational graph of the target neural network model;
a determination module, configured to determine, for each operator in the computational graph, a search space of the operator, wherein the search space contains all feasible solutions of the operator;
a selection module, configured to select a specified number of feasible solutions in the search space of the operator as candidate solutions, determine an evaluation value of each candidate solution, and take the candidate solution with the highest evaluation value as the solution to be determined;
an incrementing module, configured to determine the running time taken by the target hardware to run the solution to be determined, and increment the iteration count;
an updating module, configured to, when the running time is less than the current optimal time or no current optimal time exists, determine the running time as the current optimal time and the solution to be determined as the current optimal solution;
an iteration module, configured to, when the iteration count is less than a specified number of times, reselect a specified number of previously unselected feasible solutions in the search space of the operator as candidate solutions;
and an optimization module, configured to, when the iteration count is not less than the specified number of times, determine the current optimal solution as the optimal solution of the operator.
This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the operator optimization method described above.
This specification provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the operator optimization method described above when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the operator optimization method provided by this specification, a target neural network model is obtained and its computational graph is determined; for each operator in the computational graph, a search space containing all feasible solutions of the operator is determined; a number of feasible solutions are selected from the search space as candidate solutions, an evaluation value is determined for each candidate solution, and the candidate solution with the highest evaluation value is taken as the solution to be determined; the running time of the solution to be determined on the target hardware is measured, and the iteration count is incremented; when the running time is less than the current optimal time, or no current optimal time exists, the running time is determined as the current optimal time and the solution to be determined as the current optimal solution; when the iteration count is less than a specified number of times, a specified number of previously unselected feasible solutions are selected again from the operator's search space as candidate solutions; and when the iteration count is not less than the specified number of times, the current optimal solution is determined as the optimal solution of the operator.
When the operator optimization method provided by this specification is used to optimize the operators in the computational graph of a target neural network model, a specified number of candidate solutions can be selected anew in the operator's search space on every iteration of the loop and a locally optimal solution found among them; a globally optimal solution is then sought among the several locally optimal solutions and taken as the operator's optimal solution. Compared with traditional methods, the operator optimization method provided by this specification is based on the idea of Bayesian optimization and searches for potentially optimal solutions through pre-evaluation, which greatly reduces the number of times different feasible solutions of an operator must be run on the target hardware, and thus markedly reduces the time and monetary cost of operator optimization.
Drawings
The accompanying drawings described here are provided for further understanding of this specification and constitute a part of it; the exemplary embodiments of this specification and their description serve to explain this specification and do not unduly limit it. In the drawings:
FIG. 1 is a flow diagram of an operator optimization method in the present specification;
FIG. 2 is a schematic diagram of an operator optimizing apparatus provided herein;
FIG. 3 is a schematic diagram of an electronic device corresponding to FIG. 1, provided in this specification.
Detailed Description
Many neural network models can now be exported in the form of computational graphs, for example models trained with deep learning frameworks such as TensorFlow or PyTorch. At deployment time, the operators in the computational graph are mapped to the computation libraries provided by the hardware vendor for execution.
Currently, a reasonably mature deep learning compiler typically follows this flow: first, the computational graph corresponding to the neural network model is imported as a high-level intermediate representation (Intermediate Representation, IR) and graph-level optimization is performed; the optimized high-level IR is then lowered to a low-level IR, on which operator-level optimization is performed; finally, code generation converts the low-level IR into executable code for the target hardware. Deep learning compilers such as TVM and Halide adopt schemes similar to the above.
However, the operator optimization stage of this flow is still not mature enough, and its time consumption and cost remain high. To address this, the present specification provides a more efficient operator optimization method.
For the purposes of making the objects, technical solutions, and advantages of this specification clearer, the technical solutions of this specification will be described clearly and completely below with reference to specific embodiments of this specification and the corresponding drawings. Apparently, the described embodiments are only some, not all, of the embodiments of this specification. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an operator optimization method in the present specification, specifically including the following steps:
s100: and obtaining a target neural network model and determining a calculation map of the target neural network model.
All steps in the operator optimization method provided in the present specification may be implemented by any electronic device having a computing function, such as a terminal, a server, or the like.
The operator optimization method provided by this specification is intended to optimize, with higher efficiency, the operators in the computational graph exported from a neural network model, so as to increase the speed at which the operators of the neural network model run on hardware. To this end, in this step the target neural network model can first be acquired, and its computational graph then determined.
A computational graph is a data structure for representing a computation process; like general graph data, it is composed of nodes and edges. In a computational graph exported from a neural network model, nodes represent network layers: each distinct node in the computational graph corresponds to a different network layer in the neural network model. Edges represent data flow, that is, data passing between nodes (network layers). Each node contains several operators, and each operator represents one operation, such as a matrix multiplication. Different operators represent different operations, and the operators contained in a node together implement the function of the network layer corresponding to that node.
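To make this structure concrete, the following is a minimal Python sketch of one way such a computational graph could be represented; the class and field names are illustrative assumptions, not structures defined by this specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Operator:
    """One operation inside a node, e.g. a matrix multiplication."""
    name: str       # the kind of operation, e.g. "matmul"
    attrs: tuple    # operation attributes, e.g. input shapes

@dataclass
class Node:
    """A node of the computational graph, corresponding to one network layer."""
    layer: str
    operators: List[Operator] = field(default_factory=list)  # ops realizing the layer
    inputs: List["Node"] = field(default_factory=list)        # edges: data flowing in
```

Making Operator immutable (frozen) lets it serve as a dictionary key, which is convenient when merging duplicate operators as described later.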
S102: for each operator in the computational graph, determining a search space for the operator, wherein the search space contains all feasible solutions of the operator.
In the computational graph of the neural network model determined in step S100, there are several operators. The goal of this method is to ensure, to the greatest extent possible, that every operator runs on the hardware at the fastest possible speed, and the speed at which an operator runs on hardware generally depends on how well the operator's feasible solution is adapted to that hardware. A feasible solution of an operator is one feasible way of computing the operator. In general, an operator has many different feasible solutions, and each feasible solution may be adapted to the hardware to a different degree: the better a feasible solution of an operator is adapted to the hardware, the faster the hardware runs the operator under that feasible solution; conversely, the worse the adaptation, the slower the hardware runs the operator under it.
Thus, in this step, the search space of each operator can first be determined, where the search space of an operator contains all feasible solutions of that operator. All feasible solutions of each operator are determined so that the subsequent steps can find the feasible solutions better adapted to the hardware.
Preferably, before the search space of each operator is determined, identical operators can first be merged, further improving operator optimization efficiency. Typically, the nodes of a computational graph are likely to contain duplicate operators: the same operator may be applied repeatedly to implement the function of the network layer corresponding to one node, or the same operator may be used when implementing the network layers corresponding to different nodes. Both situations are common in practice. On the other hand, how well an operator is adapted to given hardware, and hence its running efficiency, is fixed and independent of the node in which the operator appears. Consequently, an operator that appears multiple times in the computational graph only needs to be optimized once. The identical operators in the computational graph can therefore be merged before the search spaces are determined, so that only one instance of each operator is retained, and the search space of each remaining operator is then determined.
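A minimal sketch of this merging step; the (name, attributes) signature used as the dictionary key is an assumption, and any canonical operator signature would serve.

```python
from typing import Dict, Iterable, List, Tuple

# An operator is identified here by an assumed signature: (name, attributes).
OpSignature = Tuple[str, tuple]

def unique_operators(per_node_ops: Iterable[List[OpSignature]]) -> List[OpSignature]:
    """Keep one instance of each distinct operator across all nodes: an operator's
    optimal solution depends only on the operator and the target hardware, not on
    the node in which it appears, so each operator needs to be tuned only once."""
    seen: Dict[OpSignature, None] = {}
    for ops in per_node_ops:
        for op in ops:
            seen.setdefault(op)   # first occurrence wins; duplicates are dropped
    return list(seen)
```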
S104: selecting a specified number of feasible solutions in the search space of the operator as candidate solutions, determining the evaluation value of each candidate solution, and taking the candidate solution with the highest evaluation value as the solution to be determined.
For each operator, a specified number of feasible solutions can be selected from the operator's search space as candidate solutions. The evaluation value of each candidate solution is then determined, and the candidate solution with the highest evaluation value is taken as the solution to be determined. The specified number can be set according to specific requirements.
The evaluation value of a candidate solution of an operator characterizes the speed at which the target hardware runs the operator under that candidate solution: the higher the evaluation value, the faster the target hardware runs the corresponding operator under the candidate solution. It should be noted that the evaluation value is a value predicted within the method to assist the subsequent steps; it is not a measured result.
In the operator optimization method provided by this specification, steps S104 to S112 are executed in a loop: until a given condition is met, they are executed round after round to find the optimal solution of the operator. Step S104 is the start of the loop, and at the start of each round a specified number of feasible solutions must be selected anew as candidate solutions. Moreover, to cover as large a portion of the search space as possible, each selection should pick feasible solutions that were not selected in previous rounds.
Preferably, in the process of determining candidate solutions, neighborhood solutions can be determined following the idea of Bayesian optimization, so that the candidate solutions determined within one round are relatively close to one another while the candidate solutions of different rounds differ as much as possible. Specifically, a designated solution can be selected from the feasible solutions of the operator as a candidate solution; a neighborhood solution of the designated solution is then determined among the operator's feasible solutions and re-determined as the designated solution, and this repeats until the number of determined candidate solutions reaches the specified number.
A neighborhood solution of the designated solution is a feasible solution adjacent to it. For example, consider a matrix multiplication operator with two different dimensional transforms, where the first transform has 4 possibilities and the second has 55 different possibilities; the operator then has 4 × 55 = 220 different feasible solutions. A feasible solution can be written as a pair (i, j), where i denotes the choice of the first dimensional transform and j the choice of the second, with 1 ≤ i ≤ 4, 1 ≤ j ≤ 55, and i and j integers. For any two feasible solutions of the same operator, if there is a difference in exactly one dimensional transform between them, the two feasible solutions are neighborhood solutions of each other. For instance, for the designated solution (4, 20) in this example, the neighborhood solutions include (3, 20), (1, 20), (4, 25), (4, 10), and so on: as long as two feasible solutions differ in only one dimensional transform, they are neighborhood solutions of each other.
In this method, for each operator, a designated solution can first be selected from the operator's feasible solutions and determined as one of the candidate solutions; a neighborhood solution of the designated solution is then determined, replaces the original as the new designated solution, and is in turn determined as one of the candidate solutions. This process repeats until the number of determined candidate solutions reaches the specified number, at which point the initially designated solution and every visited neighborhood solution have all become candidate solutions. Candidate solutions determined this way are close to one another in their transform configurations, which makes the locally optimal solution to be determined easier to find; repeating the determination of candidate solutions and solutions to be determined over several rounds and combining the local optima then yields the globally optimal feasible solution more accurately.
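The following sketch illustrates this candidate-selection walk on the (i, j) matrix multiplication example above. The random choice of the next neighborhood solution is an assumption made for the illustration; the specification leaves the selection rule open.

```python
import random
from typing import List, Set, Tuple

Solution = Tuple[int, int]   # (first-dimension choice, second-dimension choice)

def neighbors(sol: Solution, dims: Tuple[int, int] = (4, 55)) -> List[Solution]:
    """All feasible solutions that differ from sol in exactly one dimension."""
    i, j = sol
    return ([(a, j) for a in range(1, dims[0] + 1) if a != i]
            + [(i, b) for b in range(1, dims[1] + 1) if b != j])

def select_candidates(designated: Solution, count: int,
                      visited: Set[Solution]) -> List[Solution]:
    """Collect `count` previously unselected candidate solutions by repeatedly
    taking a neighborhood solution of the designated solution and re-determining
    it as the new designated solution."""
    candidates: List[Solution] = []
    while len(candidates) < count:
        if designated not in visited:
            visited.add(designated)
            candidates.append(designated)
        unseen = [n for n in neighbors(designated) if n not in visited]
        if not unseen:
            break        # this neighborhood is exhausted; stop early
        designated = random.choice(unseen)
    return candidates
```

Because visited persists across rounds of the loop, each round selects feasible solutions that were not selected before.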
More preferably, there can be a number of different ways of determining the evaluation value of each candidate solution; this specification provides one example for reference. Specifically, for each candidate solution, the candidate solution is input into a pre-trained evaluation model to obtain an evaluation result of the candidate solution output by the evaluation model, and the evaluation value of the candidate solution is then determined according to that evaluation result.
This approach evaluates each candidate solution with the help of a pre-trained evaluation model. The specific structure and parameters of the evaluation model can be set as required, and the way the evaluation model is trained changes with its structure; this specification does not limit either, and provides one embodiment for common reference. Specifically, the evaluation model can include several prediction models; when the evaluation value of a candidate solution is to be determined, the candidate solution is input into each prediction model of the pre-trained evaluation model to obtain the predicted running time output by each prediction model, and the evaluation result of the candidate solution is then determined from these predicted running times.
In this embodiment, the evaluation model borrows the idea of a random forest model and is composed of several different prediction models. These prediction models may differ in both structure and parameters, or share a structure while differing in parameters; this specification does not limit this, provided the models are guaranteed to give non-identical outputs for the same input.
All prediction models serve the same function: given a candidate solution of an operator, output the predicted running time of the operator under that candidate solution. Since the prediction models differ, however, there are necessarily differences among the results they give. The evaluation result of the candidate solution is then determined from the predicted running times given by the individual prediction models.
Under the random forest idea, when several prediction models make up the evaluation model, the evaluation result of the evaluation model is likewise derived from the predicted running times obtained by the prediction models. Specifically, the evaluation result can include an average running time and a standard deviation; when the evaluation result of a candidate solution is determined from the predicted running times output by the prediction models, the average running time and the standard deviation of the candidate solution can be computed from those predicted running times.
The average running time is the mean of the predicted running times, and the standard deviation is their standard deviation; together, these two values objectively and accurately reflect how fast the operator runs on the hardware under the candidate solution.
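As a sketch, assuming each prediction model exposes a scikit-learn-style predict method on a feature vector:

```python
import numpy as np

def evaluate(candidate_features, prediction_models):
    """Query every prediction model in the evaluation ensemble and summarize the
    predicted running times by their mean and standard deviation."""
    features = np.asarray(candidate_features).reshape(1, -1)
    times = np.array([m.predict(features)[0] for m in prediction_models])
    return times.mean(), times.std()   # (average running time, standard deviation)
```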
Further, since the evaluation results of the candidate solutions cannot be compared with one another directly, they need to be converted into comparable evaluation values first. Specifically, the evaluation value of a candidate solution can be determined according to its evaluation result and the current optimal time.
The current optimal time is the shortest running time on the target hardware among the feasible solutions whose running times are known at the current moment. Since steps S104-S112 are executed in a loop and repeated several times, the running times of some feasible solutions on the target hardware may already be known when these steps are executed; the smallest of these known running times is taken as the current optimal time.
Additionally, there can be a number of different ways of determining the evaluation value of a candidate solution from its evaluation result and the current optimal time. This specification provides one specific embodiment for reference. Specifically, the evaluation value of the candidate solution can be determined according to the following formula:

$$\mathrm{EI}(x) = \bigl(y^{*} - \mu(x)\bigr)\,\Phi(Z) + \sigma(x)\,\phi(Z), \qquad Z = \frac{y^{*} - \mu(x)}{\sigma(x)}$$

where $\mathrm{EI}$ denotes the evaluation value, $x$ the candidate solution, $\mu(x)$ the average running time in the evaluation result, $\sigma(x)$ the standard deviation in the evaluation result, and $y^{*}$ the current optimal time; $\Phi$ and $\phi$ denote, respectively, the cumulative distribution function and the probability density function of the standard normal distribution. The evaluation value of a candidate solution is obtained by the above formula.
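A direct transcription of the formula above, as a sketch; scipy supplies the standard normal distribution functions, and the guard for a near-zero standard deviation is an added assumption for numerical safety.

```python
from scipy.stats import norm

def expected_improvement(mu: float, sigma: float, best_time: float,
                         eps: float = 1e-12) -> float:
    """Evaluation value EI(x) for a candidate whose predicted running time has
    mean mu and standard deviation sigma, given the current optimal time."""
    if sigma < eps:                   # models agree exactly: no uncertainty bonus
        return max(best_time - mu, 0.0)
    z = (best_time - mu) / sigma
    return (best_time - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```

A candidate is thus rewarded both for a small predicted mean running time and for high uncertainty, which is what steers the search toward promising yet unexplored feasible solutions.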
When the evaluation values of all of an operator's selected candidate solutions have been determined via the evaluation model, the candidate solution with the highest evaluation value can be selected as the operator's solution to be determined, for use in the subsequent steps.
S106: determining the running time taken by the target hardware to run the solution to be determined, and incrementing the iteration count.
In this step, for each operator in the computational graph, the running time of that operator's solution to be determined on the target hardware can be measured, and the iteration count incremented at the same time. Specifically, the program corresponding to the operator under the solution to be determined, that is, the code implementing the operator with that solution, is deployed on the target hardware, and the time the target hardware needs to execute this code is determined as the running time of the operator's solution to be determined on the target hardware.
In the operator optimization method provided by this specification, the iteration count is the number of times the loop of steps S104 to S112 has been executed. Normally, the iteration count starts at 0 when the method begins, and each pass through the loop increments it by 1 in step S106.
S108: when the running time is less than the current optimal time, or no current optimal time exists, determining the running time as the current optimal time, and determining the solution to be determined as the current optimal solution.
Step S106 determines the running time of the operator's solution to be determined on the target hardware, while the ultimate aim of operator optimization is to find, for each operator in the computational graph, a feasible solution whose running time is as short as possible. The method therefore treats the solution to be determined differently depending on how its running time compares with the current optimal time.
When the running time of the operator's solution to be determined on the target hardware is less than the current optimal time, or no current optimal time exists, that running time can be determined as the current optimal time, and the solution to be determined as the operator's current optimal solution. A running time below the current optimal time indicates that the solution to be determined adapts to the target hardware better than the existing current optimal solution, so its running time replaces the current optimal time as the new current optimal time, and the solution itself replaces the current optimal solution as the new current optimal solution. The case in which no current optimal time exists generally arises only in the first round of the loop, before any feasible solution has been run on the target hardware; the running time of the solution to be determined is then taken directly as the current optimal time, and the solution to be determined as the current optimal solution.
In the other case, when the running time of the operator's solution to be determined is not less than the current optimal time, the solution to be determined is not better than the current optimal solution, so it can simply be discarded, with no further processing, and the subsequent steps executed directly.
S110: when the iteration count is less than the specified number of times, reselecting a specified number of unselected feasible solutions in the search space of the operator as candidate solutions.
To guarantee the efficiency of the operator optimization method provided by this specification, the number of loop iterations of steps S104 to S112 is limited: the iteration count may not exceed a specified number of times, which can be set according to specific requirements.
When the iteration count has not reached this upper limit, that is, when it is less than the specified number of times, the method returns to step S104, reselects a specified number of previously unselected feasible solutions in the operator's search space as candidate solutions, and executes steps S104-S112 again.
S112: when the iteration count is not less than the specified number of times, determining the current optimal solution as the optimal solution of the operator.
For each operator, when the iteration count reaches the upper limit, that is, when it is not less than the specified number of times, the current optimal solution finally obtained after the loop iterations can be determined as the operator's optimal solution, which completes the optimization of the operators in the computational graph. In application, the operator optimal solutions determined in this step are applied to the target neural network model, so that the model achieves the best adaptation when realized on the target hardware and the operations within it execute with higher efficiency.
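Putting steps S104-S112 together for one operator, the loop can be sketched as follows. Here random_solution, measure_on_hardware, and featurize are assumed interfaces; select_candidates, evaluate, and expected_improvement are the helpers sketched earlier; and the first-round fallback to the smallest predicted mean is an assumption, since no current optimal time exists yet in that round.

```python
def optimize_operator(random_solution, measure_on_hardware, prediction_models,
                      featurize, num_candidates, max_iterations):
    """Iterate steps S104-S112: select candidates, pick the solution to be
    determined by evaluation value, measure it on the target hardware, keep the best."""
    best_time, best_solution = None, None
    visited = set()
    for _ in range(max_iterations):                       # bound on the iteration count
        candidates = select_candidates(random_solution(), num_candidates, visited)
        if not candidates:
            break                                         # search space exhausted
        if best_time is None:                             # first round: no optimal time yet
            pending = min(candidates,
                          key=lambda c: evaluate(featurize(c), prediction_models)[0])
        else:                                             # step S104: highest evaluation value
            pending = max(candidates,
                          key=lambda c: expected_improvement(
                              *evaluate(featurize(c), prediction_models), best_time))
        run_time = measure_on_hardware(pending)           # step S106
        if best_time is None or run_time < best_time:     # step S108
            best_time, best_solution = run_time, pending
    return best_solution, best_time                       # step S112
```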
When the operator optimization method provided by this specification is used to optimize the operators in the computational graph of a target neural network model, a specified number of candidate solutions can be selected anew in the operator's search space on every iteration of the loop and a locally optimal solution found among them; a globally optimal solution is then sought among the several locally optimal solutions and taken as the operator's optimal solution. Compared with traditional methods, the operator optimization method provided by this specification is based on the idea of Bayesian optimization and searches for potentially optimal solutions through pre-evaluation, which greatly reduces the number of times different feasible solutions of an operator must be run on the target hardware, and thus markedly reduces the time and monetary cost of operator optimization.
Additionally, the evaluation model employed in this specification can be pre-trained. Specifically, a sample operator can be acquired; a sample feasible solution of the sample operator is determined, together with the true running time of that sample feasible solution on the target hardware; the sample feasible solution is input into each prediction model to obtain the to-be-optimized predicted running time output by that prediction model, where the sample operators to which the sample feasible solutions input into different prediction models belong are different; and each prediction model is trained with the optimization target of minimizing the difference between its to-be-optimized predicted running time and the true running time, on the target hardware, of the sample feasible solution input into it.
When the evaluation model is built on the idea of a random forest model, training the evaluation model in fact reduces to training each prediction model it contains. The prediction models contained in the evaluation model cannot be identical: they need different parameters so that each gives a distinct output when faced with the same input. Therefore, although every prediction model can be trained in the same way, they must be trained with different sample operators, so that the trained prediction models give different outputs for the same input. In the ideal case, of course, even prediction models trained with different sample operators could end up with identical parameters; in practice this does not happen, so the outputs the prediction models produce for the same input may be similar but are rarely the same.
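A sketch of this training scheme, assuming scikit-learn-style regressors with a fit method, an assumed featurize feature extractor, and a measure_on_hardware callable returning the true running time of a feasible solution on the target hardware:

```python
def train_evaluation_model(prediction_models, sample_operators, feasible_solutions,
                           measure_on_hardware, featurize):
    """Fit each prediction model on the feasible solutions of a *different* sample
    operator, with the true running time on the target hardware as the regression
    target, so that the trained models give non-identical outputs."""
    for model, op in zip(prediction_models, sample_operators):
        solutions = feasible_solutions(op)
        X = [featurize(s) for s in solutions]
        y = [measure_on_hardware(op, s) for s in solutions]   # true running times
        model.fit(X, y)   # minimize prediction error against the true running time
    return prediction_models
```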
More preferably, to further improve the effect of operator optimization, the solution to be determined obtained in step S104 can be used as one of the sample feasible solutions, and the evaluation model can be trained further using the running time of that solution on the target hardware determined in step S106. That is, during practical application, optimization and adjustment of the evaluation model are continually interleaved with the search, so that the best possible operator optimization effect is achieved.
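One possible way to interleave this adjustment into the search is sketched below. The shared replay buffer and the refresh interval are assumptions; for simplicity the sketch re-fits the whole ensemble on the shared buffer, whereas the training described above keeps the per-model sample operators distinct.

```python
def refine_online(prediction_models, pending_solution, measured_time,
                  replay_buffer, featurize, refit_every=32):
    """Fold each measured solution to be determined back into the training data
    and periodically re-fit the ensemble on the accumulated measurements."""
    replay_buffer.append((featurize(pending_solution), measured_time))
    if len(replay_buffer) % refit_every == 0:
        X, y = zip(*replay_buffer)
        for model in prediction_models:
            model.fit(list(X), list(y))
    return prediction_models
```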
Based on the same idea as the operator optimization method described above, this specification further provides a corresponding operator optimization apparatus, as shown in FIG. 2.
FIG. 2 is a schematic diagram of an operator optimization apparatus provided in this specification, the apparatus specifically including:
an acquisition module 200, configured to acquire a target neural network model and determine a computational graph of the target neural network model;
a determination module 202, configured to determine, for each operator in the computational graph, a search space of the operator, wherein the search space contains all feasible solutions of the operator;
a selection module 204, configured to select a specified number of feasible solutions in the search space of the operator as candidate solutions, determine an evaluation value of each candidate solution, and take the candidate solution with the highest evaluation value as the solution to be determined;
an incrementing module 206, configured to determine the running time taken by the target hardware to run the solution to be determined, and increment the iteration count;
an updating module 208, configured to, when the running time is less than the current optimal time or no current optimal time exists, determine the running time as the current optimal time and the solution to be determined as the current optimal solution;
an iteration module 210, configured to, when the iteration count is less than a specified number of times, reselect a specified number of previously unselected feasible solutions in the search space of the operator as candidate solutions;
and an optimization module 212, configured to, when the iteration count is not less than the specified number of times, determine the current optimal solution as the optimal solution of the operator.
Optionally, the selection module 204 is specifically configured to select a designated solution from the feasible solutions of the operator as a candidate solution; and to determine, among the feasible solutions of the operator, a neighborhood solution of the designated solution and re-determine it as the designated solution, until the number of determined candidate solutions reaches the specified number.
Optionally, the selection module 204 is specifically configured to input, for each candidate solution, the candidate solution into a pre-trained evaluation model to obtain an evaluation result of the candidate solution output by the evaluation model; and to determine the evaluation value of the candidate solution according to the evaluation result.
Optionally, the evaluation model comprises several prediction models;
the selection module 204 is specifically configured to input the candidate solution into each prediction model in the pre-trained evaluation model to obtain the predicted running time output by each prediction model; and to determine the evaluation result of the candidate solution according to the predicted running times.
Optionally, the evaluation result includes an average running time and a standard deviation;
the selection module 204 is specifically configured to determine the average running time and the standard deviation of the candidate solution according to the predicted running times.
Optionally, the selection module 204 is specifically configured to determine the evaluation value of the candidate solution according to the evaluation result and the current optimal time.
Optionally, the apparatus further comprises a training module 214, specifically configured to acquire a sample operator; determine a sample feasible solution of the sample operator and the true running time of the sample feasible solution on the target hardware; input the sample feasible solution into each prediction model to obtain the to-be-optimized predicted running time output by that prediction model, wherein the sample operators to which the sample feasible solutions input into different prediction models belong are different; and train each prediction model with the optimization target of minimizing the difference between the to-be-optimized predicted running time and the true running time, on the target hardware, of the sample feasible solution input into that prediction model.
This specification also provides a computer-readable storage medium storing a computer program, the computer program being operable to execute the operator optimization method provided in FIG. 1 above.
This specification also provides a schematic structural diagram of the electronic device shown in FIG. 3. At the hardware level, as shown in FIG. 3, the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it, implementing the operator optimization method described above with respect to FIG. 1. Of course, apart from software implementations, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flow below is not limited to logic units and may also be hardware or logic devices.
Improvements to a technology could once be clearly distinguished as improvements in hardware (for example, improvements to circuit structures such as diodes, transistors, and switches) or improvements in software (improvements to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (such as a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used. It should also be clear to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logic-programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, it is entirely possible to logic-program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component; or the means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (7)

1. An operator optimization method, comprising:
acquiring a target neural network model, and determining a computational graph of the target neural network model;
determining, for each operator in the computational graph, a search space of the operator, wherein the search space contains all feasible solutions of the operator;
selecting a specified number of feasible solutions in the search space of the operator as candidate solutions, determining an evaluation value of each candidate solution, and taking the candidate solution with the highest evaluation value as a solution to be determined;
determining the running time of the target hardware for running the solution to be determined, and incrementing the iteration count;
when the running time is smaller than the current optimal time, or when no current optimal time exists, determining the running time as the current optimal time, and determining the solution to be determined as the current optimal solution;
when the iteration count is smaller than a specified count, reselecting a specified number of unselected candidate solutions in the search space of the operator;
when the iteration count is not smaller than the specified count, determining the current optimal solution as the optimal solution of the operator;
wherein determining the evaluation value of each candidate solution specifically comprises:
for each candidate solution, inputting the candidate solution into a pre-trained evaluation model to obtain an evaluation result of the candidate solution output by the evaluation model;
determining the evaluation value of the candidate solution according to the evaluation result;
wherein the evaluation model comprises a plurality of prediction models;
inputting the candidate solution into the pre-trained evaluation model to obtain the evaluation result of the candidate solution output by the evaluation model specifically comprises:
inputting the candidate solution into each prediction model in the pre-trained evaluation model to obtain a predicted running time output by each prediction model;
determining the evaluation result of the candidate solution according to each predicted running time;
wherein the evaluation result comprises an average running time and a standard deviation;
determining the evaluation result of the candidate solution according to each predicted running time specifically comprises:
determining the average running time and the standard deviation of the candidate solution according to each predicted running time.
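For illustration only, the search loop of claim 1 can be sketched in Python. This is a minimal sketch under assumptions, not the patented implementation: the helper names predict, measure_on_hardware, and score are hypothetical, and the way unselected candidates are drawn is a simple stand-in (claim 2 specifies a neighborhood walk).

    import statistics

    def optimize_operator(search_space, models, measure_on_hardware,
                          num_candidates, max_iterations, score):
        # search_space: all feasible solutions of one operator.
        # models: pre-trained prediction models, each exposing predict(solution).
        # measure_on_hardware: runs a solution on the target hardware, returns time.
        # score: maps (mean, std, current best time) to an evaluation value.
        best_time, best_solution = None, None
        unselected = list(search_space)
        for _ in range(max_iterations):
            # Select a specified number of not-yet-selected candidate solutions.
            candidates, unselected = unselected[:num_candidates], unselected[num_candidates:]
            if not candidates:
                break

            def evaluate(solution):
                # Evaluation result: mean and standard deviation of the running
                # times predicted by the ensemble of prediction models.
                times = [model.predict(solution) for model in models]
                return score(statistics.mean(times), statistics.pstdev(times), best_time)

            pending = max(candidates, key=evaluate)    # solution to be determined
            run_time = measure_on_hardware(pending)    # measured on target hardware
            if best_time is None or run_time < best_time:
                best_time, best_solution = run_time, pending
        return best_solution, best_time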
2. The method of claim 1, wherein selecting a specified number of feasible solutions in the search space of the operator as candidate solutions specifically comprises:
selecting a designated solution from the feasible solutions of the operator as a candidate solution; and
determining a neighborhood solution of the designated solution among the feasible solutions of the operator, and re-determining the neighborhood solution as the designated solution, until the number of determined candidate solutions reaches the specified number.
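One possible reading of this neighborhood walk, again as an illustrative Python sketch; the helper name neighbors_of is hypothetical, since the claim does not fix how a neighborhood is computed:

    import random

    def select_candidates(feasible_solutions, neighbors_of, specified_number):
        # Pick an initial designated solution among the feasible solutions.
        designated = random.choice(feasible_solutions)
        candidates = [designated]
        while len(candidates) < specified_number:
            # Neighborhood solutions of the designated solution that are
            # still unselected.
            neighborhood = [s for s in neighbors_of(designated) if s not in candidates]
            if not neighborhood:
                break
            # Re-determine a neighborhood solution as the new designated solution.
            designated = random.choice(neighborhood)
            candidates.append(designated)
        return candidates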
3. The method of claim 1, wherein determining the evaluation value of the candidate solution according to the evaluation result specifically comprises:
determining the evaluation value of the candidate solution according to the evaluation result and the current optimal time.
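Claim 3 does not fix the scoring formula. One choice consistent with the mean and standard deviation of claim 1, offered purely as an assumption, is the probability under a normal model that the candidate beats the current optimal time:

    import math

    def evaluation_value(mean, std, current_best_time):
        # With no current optimal time yet, prefer a low predicted mean.
        if current_best_time is None:
            return -mean
        if std == 0.0:
            return 1.0 if mean < current_best_time else 0.0
        # Probability that a Normal(mean, std) running time beats the best time.
        z = (current_best_time - mean) / std
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))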
4. The method of claim 1, wherein pre-training the evaluation model specifically comprises:
acquiring sample operators;
determining a sample feasible solution of each sample operator and a true running time of the sample feasible solution on the target hardware;
inputting the sample feasible solution into each prediction model to obtain a to-be-optimized predicted running time output by the prediction model, wherein the sample operators to which the sample feasible solutions input into different prediction models belong are different; and
training each prediction model with the optimization target of minimizing the difference between the to-be-optimized predicted running time and the true running time, on the target hardware, of the sample feasible solution input into that prediction model.
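A training loop in this spirit might look as follows; this illustrative PyTorch sketch assumes each prediction model is a torch.nn.Module over some feature encoding of a feasible solution, which the claim does not specify:

    import torch

    def train_prediction_models(models, samples_per_model, epochs=100, lr=1e-3):
        # models: list of torch.nn.Module regressors (one prediction model each).
        # samples_per_model: one dataset per model; each dataset is a list of
        # (feature_tensor, true_running_time) pairs, and the datasets are drawn
        # from different sample operators, as the claim requires.
        for model, samples in zip(models, samples_per_model):
            optimizer = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = torch.nn.MSELoss()
            for _ in range(epochs):
                for features, true_time in samples:
                    optimizer.zero_grad()
                    predicted = model(features).squeeze()  # to-be-optimized prediction
                    target = torch.as_tensor(true_time, dtype=predicted.dtype)
                    # Optimization target: minimize the difference between the
                    # predicted running time and the running time measured on
                    # the target hardware.
                    loss = loss_fn(predicted, target)
                    loss.backward()
                    optimizer.step()
        return models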
5. An operator optimization apparatus, comprising:
an acquisition module, configured to acquire a target neural network model and determine a computational graph of the target neural network model;
a determining module, configured to determine, for each operator in the computational graph, a search space of the operator, wherein the search space contains all feasible solutions of the operator;
a selection module, configured to select a specified number of feasible solutions in the search space of the operator as candidate solutions, determine an evaluation value of each candidate solution, and take the candidate solution with the highest evaluation value as a solution to be determined;
an incrementing module, configured to determine the running time of the target hardware for running the solution to be determined, and increment the iteration count;
an updating module, configured to, when the running time is smaller than the current optimal time or no current optimal time exists, determine the running time as the current optimal time and determine the solution to be determined as the current optimal solution;
an iteration module, configured to reselect a specified number of unselected candidate solutions in the search space of the operator when the iteration count is smaller than a specified count;
an optimization module, configured to determine the current optimal solution as the optimal solution of the operator when the iteration count is not smaller than the specified count;
wherein the selection module is specifically configured to, for each candidate solution, input the candidate solution into a pre-trained evaluation model to obtain an evaluation result of the candidate solution output by the evaluation model, and determine the evaluation value of the candidate solution according to the evaluation result;
the evaluation model comprises a plurality of prediction models;
the selection module is specifically configured to input the candidate solution into each prediction model in the pre-trained evaluation model to obtain a predicted running time output by each prediction model, and determine the evaluation result of the candidate solution according to each predicted running time;
the evaluation result comprises an average running time and a standard deviation;
the selection module is specifically configured to determine the average running time and the standard deviation of the candidate solution according to each predicted running time.
6. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-4.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1-4 when executing the program.
CN202310941263.1A 2023-07-28 2023-07-28 Operator optimization method and device, storage medium and electronic equipment Active CN116663618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310941263.1A CN116663618B (en) 2023-07-28 2023-07-28 Operator optimization method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116663618A (en) 2023-08-29
CN116663618B (en) 2023-12-05

Family

ID=87717473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310941263.1A Active CN116663618B (en) 2023-07-28 2023-07-28 Operator optimization method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116663618B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116860259B * 2023-09-05 2023-12-19 Zhejiang Lab Method, device and equipment for model training and automatic optimization of compiler
CN117075918B * 2023-10-13 2024-01-09 Zhejiang Lab Model deployment method and device, storage medium and electronic equipment
CN117171577B * 2023-11-02 2024-03-22 Zhejiang Lab Dynamic decision method and device for high-performance operator selection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7003753B2 * 2018-03-14 2022-01-21 Fujitsu Limited Search point determination program, search point determination method and search point determination device
US11120014B2 (en) * 2018-11-23 2021-09-14 International Business Machines Corporation Enhanced search construction and deployment
US20220035878A1 (en) * 2021-10-19 2022-02-03 Intel Corporation Framework for optimization of machine learning architectures

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986470A * 2018-08-20 2018-12-11 South China University of Technology The Travel Time Estimation Method of particle swarm algorithm optimization LSTM neural network
CN110598842A * 2019-07-17 2019-12-20 Shenzhen University Deep neural network hyper-parameter optimization method, electronic device and storage medium
WO2021175058A1 * 2020-03-05 2021-09-10 Huawei Technologies Co., Ltd. Neural network architecture search method and apparatus, device and medium
WO2023287239A1 * 2021-07-16 2023-01-19 Seoul National University R&DB Foundation Function optimization method and apparatus
CN116484906A * 2023-04-23 2023-07-25 Beihang University Method and device for searching graph neural network architecture
CN116306856A * 2023-05-17 2023-06-23 Zhejiang Lab Deep learning model deployment method and device based on search
CN116301904A * 2023-05-18 2023-06-23 Zhejiang Lab Operator optimization acceleration method and device for deep learning compiler

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search; Linnan Wang et al.; arXiv[cs.LG]; pp. 1-15 *
A Survey of Bayesian Optimization Methods and Applications; Cui Jiaxu et al.; Journal of Software; Vol. 29, No. 10; pp. 3068-3090 *

Also Published As

Publication number Publication date
CN116663618A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN116663618B (en) Operator optimization method and device, storage medium and electronic equipment
CN116304720B (en) Cost model training method and device, storage medium and electronic equipment
CN116306856B (en) Deep learning model deployment method and device based on search
CN115981870B (en) Data processing method and device, storage medium and electronic equipment
CN116502679B (en) Model construction method and device, storage medium and electronic equipment
CN116185532B (en) Task execution system, method, storage medium and electronic equipment
CN110826894A (en) Hyper-parameter determination method and device and electronic equipment
CN116502633A (en) Method and device for executing service, storage medium and electronic equipment
CN116860259B (en) Method, device and equipment for model training and automatic optimization of compiler
CN116049761A (en) Data processing method, device and equipment
CN116151355B (en) Method, device, medium and equipment for model training and service execution
CN115543945B (en) Model compression method and device, storage medium and electronic equipment
CN116150563A (en) Service execution method and device, storage medium and electronic equipment
CN113205377A (en) Information recommendation method and device
CN116991388B (en) Graph optimization sequence generation method and device of deep learning compiler
CN116415103B (en) Data processing method, device, storage medium and electronic equipment
CN116434787B (en) Voice emotion recognition method and device, storage medium and electronic equipment
CN116186272B (en) Combined training method and device, storage medium and electronic equipment
CN115862675B (en) Emotion recognition method, device, equipment and storage medium
CN117075918B (en) Model deployment method and device, storage medium and electronic equipment
CN117171577B (en) Dynamic decision method and device for high-performance operator selection
CN117009729B (en) Data processing method and device based on softmax
CN116340469B (en) Synonym mining method and device, storage medium and electronic equipment
CN116931955B (en) Compiler automatic optimization method and device based on artificial intelligence
CN117828360A (en) Model training method, model training device, model code generating device, storage medium and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant