CN116450344A - Task execution method and device, storage medium and electronic equipment - Google Patents

Task execution method and device, storage medium and electronic equipment

Info

Publication number
CN116450344A
CN116450344A (application CN202310269682.5A)
Authority
CN
China
Prior art keywords
training
frame
target
target model
candidate training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310269682.5A
Other languages
Chinese (zh)
Inventor
卢政先 (Lu Zhengxian)
谢学说 (Xie Xueshuo)
李涛 (Li Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Zhejiang Lab
Original Assignee
Nankai University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University, Zhejiang Lab filed Critical Nankai University
Priority to CN202310269682.5A priority Critical patent/CN116450344A/en
Publication of CN116450344A publication Critical patent/CN116450344A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification discloses a task execution method, a task execution device, a storage medium and electronic equipment. The task execution method comprises the following steps: acquiring a target model and candidate training frameworks according to a first task instruction; adjusting the candidate training frameworks, with the goal of keeping equivalent at least one of the specified parameters involved when different candidate training frameworks train the target model, the operators called by the different candidate training frameworks together with the dependency relationships among those operators, and the manner in which the different candidate training frameworks update the target model, to obtain adjusted frameworks; determining, for each adjusted framework, the operation duration during which the terminal device deploying the target model executes the operations of the target model; determining the priority of the adjusted framework according to the operation duration; determining a target training framework from the candidate training frameworks according to the priorities of the adjusted frameworks; and, when a second task instruction is received, executing a model training task through the target training framework.

Description

Task execution method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a task execution method, a task execution device, a storage medium, and an electronic device.
Background
In recent years, deep learning has been widely used in various fields such as image recognition, natural language processing, information recommendation, etc., and a training framework which is convenient to use and capable of completing model training tasks in a reasonable time is indispensable for efficient construction and training of a deep learning model.
The deep learning framework provides a platform that shields the user from the underlying computing environment: the user only needs to focus on constructing the model on the platform and need not care how the model is computed on the underlying hardware, which lightens the user's burden when training a model.
However, there are numerous training frameworks on the market, so a user cannot intuitively judge the differences and relative merits between different training frameworks, and because no mature method for evaluating different training frameworks exists at present, it is not easy for a user to select, from the many training frameworks, one that meets his or her expectations.
Therefore, how to evaluate different training frameworks accurately, so that a user can select a training framework that meets expectations according to the evaluation result, is a problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a task execution method, a task execution device, a storage medium, and an electronic device, so as to partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a task execution method, which comprises the following steps:
receiving a first task instruction;
acquiring a target model and each candidate training frame according to the first task instruction;
adjusting each candidate training frame, with the goal of keeping equivalent at least one of the following: the specified parameters involved when different candidate training frames train the target model, the operators called by the different candidate training frames and the dependency relationships among the operators, and the updating modes used when the different candidate training frames update the target model, so as to obtain each adjusted frame;
for each adjusted frame, determining the duration during which the terminal device deploying the target model executes the operation of the target model based on the adjusted frame, and taking the duration as the operation duration;
determining the priority corresponding to the adjusted frame according to the operation duration;
determining a target training frame from the candidate training frames according to the priority corresponding to each adjusted frame;
and when a second task instruction is received, executing a model training task for the target model through the target training frame.
Optionally, the specified parameters include: at least one of a super parameter involved in preprocessing the input data of the target model, a super parameter corresponding to an operator called by each candidate training frame, a super parameter involved in updating the weight of the target model, and a super parameter related to training performance.
Optionally, the updating mode includes: at least one of a transformation function, a weight update function, and a regularization function applied to the gradient.
Optionally, before adjusting each candidate training frame to obtain each adjusted frame, the method further includes:
judging whether the operator called by each candidate training frame and the dependency relationship between the operators are the same or not;
if not, adjusting each candidate training frame with the goal of keeping the operators called by each candidate training frame and the dependency relationships among the operators equivalent.
Optionally, determining whether the operator invoked by each candidate training frame and the dependency relationship between the operators are the same specifically includes:
deploying the target model in each candidate training frame, inputting the same data and setting the same parameters for the target model under each candidate training frame, judging whether the target model generates the same output in each candidate training frame, and if so, determining that the operators called by each candidate training frame and the dependency relationships among the operators are the same.
Optionally, inputting the same data for the target model under each candidate training frame and setting the same parameters, and judging whether the target model generates the same output in each candidate training frame or not, which specifically includes:
selecting one of the candidate training frames to train the target model;
after at least one iteration, exporting the model parameters of the target model, converting them into a designated parameter format, and loading the parameters in the designated parameter format into the target models of the other candidate training frames;
and exporting the target model in each candidate training frame into a designated model format, inputting the same data into the target models in the designated model format, and judging whether the target models in the designated model format generate the same output.
Optionally, for each adjusted framework, determining a duration when the terminal device deploying the target model performs the operation of the target model based on the adjusted framework, where the duration is taken as an operation duration, and specifically includes:
sampling the running state of the terminal equipment according to a preset sampling period;
and determining the duration of the terminal equipment for deploying the target model in the sampling period when the terminal equipment executes the operation of the target model based on the adjusted framework, and taking the duration as the operation duration.
Optionally, determining the priority corresponding to the adjusted frame according to the operation duration specifically includes:
according to the operation duration, determining the equipment utilization rate and the calculation efficiency of the adjusted framework on the terminal equipment;
and determining the priority according to the equipment utilization rate and the computing efficiency.
Optionally, determining the device utilization rate according to the operation duration specifically includes:
and determining the equipment utilization rate according to the operation time length and the sampling time length corresponding to the sampling period, wherein the equipment utilization rate and the operation time length are in positive correlation.
Optionally, the longer the operation duration, the lower the calculation efficiency.
Optionally, the training framework includes: deep learning framework.
The present specification provides a task execution device including:
the receiving module receives a first task instruction;
the acquisition module acquires a target model and each candidate training frame according to the first task instruction;
the adjustment module is used for adjusting each candidate training frame, with the goal of keeping equivalent at least one of the specified parameters involved when different candidate training frames train the target model, the operators called by the different candidate training frames and the dependency relationships among the operators, and the updating modes used when the different candidate training frames update the target model, so as to obtain each adjusted frame;
the first determining module is used for determining the duration of the terminal equipment deploying the target model when the terminal equipment executes the operation of the target model based on the adjusted frames as the operation duration aiming at each adjusted frame;
the second determining module is used for determining the priority corresponding to the adjusted frame according to the operation duration;
the third determining module determines a target training frame from the candidate training frames according to the priority corresponding to each adjusted frame;
and the execution module is used for executing a model training task for the target model through the target training frame when a second task instruction is received.
Optionally, the first determining module is specifically configured to sample an operation state of the terminal device according to a preset sampling period; and determining the duration of the terminal equipment for deploying the target model in the sampling period when the terminal equipment executes the operation of the target model based on the adjusted framework, and taking the duration as the operation duration.
Optionally, the second determining module is specifically configured to determine, according to the operation duration, a device utilization rate and a computing efficiency of the adjusted frame on the terminal device; and determining the priority according to the equipment utilization rate and the computing efficiency.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the task execution method described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the task execution method described above when executing the program.
At least one of the above technical solutions adopted in the present specification can achieve the following beneficial effects:
in the task execution method provided in the present specification, a target model and candidate training frameworks are acquired according to a first task instruction; each candidate training framework is adjusted, with the goal of keeping equivalent at least one of the specified parameters involved when different candidate training frameworks train the target model, the operators called by the different candidate training frameworks and the dependency relationships among the operators, and the updating modes used when the different candidate training frameworks update the target model, so as to obtain adjusted frameworks; for each adjusted framework, the operation duration during which the terminal device deploying the target model executes the operations of the target model based on the adjusted framework is determined; the priority corresponding to the adjusted framework is determined according to the operation duration; a target training framework is determined from the candidate training frameworks according to the priority of each adjusted framework; and when a second task instruction is received, the model training task is executed through the target training framework.
It can be seen from the above method that the priority corresponding to each training framework can be determined on the premise that, when different frameworks train the target model, at least one of the specified parameters, the operators called and the dependency relationships among them, and the mode of updating the target model is kept equivalent, so that the target training framework can be selected according to the priorities to train the model to be trained. Compared with existing methods, this makes it possible to evaluate the underlying logic of different training frameworks while controlling the relevant variables, so that a target training framework that meets expectations can be selected according to the evaluation result to execute the model training task.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a schematic flow chart of a task execution method provided in the present specification;
FIG. 2 is a schematic illustration of an iterative process for one model provided in the present specification;
FIG. 3 is a schematic diagram of an evaluation method of a training framework provided in the present specification;
FIG. 4 is a schematic diagram of a task performing device provided in the present specification;
fig. 5 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
In the prior art, the merits of training frameworks are generally compared through measurement dimensions such as the training time of a model under different training frameworks, the data throughput, and the quality (accuracy) of the trained model. However, these methods cannot effectively control the variables involved in the measurement process, so the final measurement results are inaccurate.
Moreover, because many factors affect these evaluation dimensions, they are too macroscopic: it is difficult to analyse a specific technique or implementation of a training framework from them, and they say little about the algorithms and techniques a training framework uses. Since these evaluation dimensions mainly capture information about the application or the hardware, they cannot intuitively reflect the strengths and weaknesses of the training framework itself.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a task execution method provided in the present specification, including the following steps:
s101: and receiving a first task instruction.
S102: and acquiring a target model and each candidate training frame according to the first task instruction.
In general, the process of model training can be seen as an iterative process over multiple training rounds (epochs), in which the entire training data set is used once in each epoch. In each epoch, the whole data set is randomly divided into equally sized pieces of data, each of which is referred to as a mini-batch. Each mini-batch goes through one complete training process, which is called a training step. Thus, the training step is repeated multiple times within one epoch until every mini-batch has gone through one training process.
In each training step, the training samples first need to be preprocessed, for example by data augmentation and normalization. The preprocessed data is then taken as the input of the model for one forward propagation, and the loss function is computed. Since the model training process is a process of continuously adjusting the model parameters to minimize the loss function, the back propagation process finds the gradient of the loss function with respect to the model parameters according to the chain rule. Finally, the model parameters are updated along the direction opposite to the gradient, so that the loss function decreases, and one training step is completed. The process of model training is largely defined by the user in code form; only the back propagation process is derived by the framework's automatic differentiation.
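For ease of understanding, the following is a minimal sketch of such a training step in PyTorch-style Python; the model, optimizer, loss function and normalization statistics named here are illustrative placeholders rather than part of the claimed method.

```python
import torch

def training_step(model, batch, optimizer, loss_fn, mean, std):
    # Data preprocessing: normalize the raw inputs (a hyperparameter-dependent step).
    inputs, labels = batch
    inputs = (inputs - mean) / std

    # Forward computation: one forward pass that produces the loss.
    outputs = model(inputs)
    loss = loss_fn(outputs, labels)

    # Back propagation: gradients of the loss w.r.t. the model parameters (chain rule).
    optimizer.zero_grad()
    loss.backward()

    # Parameter update: move the weights along the negative gradient direction.
    optimizer.step()
    return loss.item()
```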
In model training, the training framework needs to construct, either explicitly or implicitly, a computation graph covering data preprocessing, forward computation, back propagation and gradient update from the training step defined in the user code.
According to the constructed computation graph, the training framework can complete device management and memory allocation and release, and invoke computing devices to execute computing functions (kernel functions), among other things. When invoking the terminal device to perform the computation, the framework maps the nodes in the one or more computation graphs into one or more computing functions and executes the computing functions on the terminal device. Different training frameworks may map the nodes in the computation graph to the same computing function, for example by means of a deep learning computing library such as cuDNN, or to different computing functions. In addition, their device utilization is also inconsistent: if the framework cannot dispatch the next computing function before the terminal device finishes its current computing task, the terminal device stays idle for a period of time. Together, these factors lead to performance gaps between frameworks.
Based on this, the present specification provides a task execution method, in which a training frame is used as a scheduler for scheduling a specified computation graph to a terminal device for computation, and on the premise of ensuring parameter consistency, operator and dependency consistency and update mode consistency under different training frames, performance of the training frame is evaluated through two evaluation dimensions of high efficiency (computation efficiency) and sufficiency (device utilization), so that a target training frame is selected according to an evaluation result.
When the server receives the first task instruction, the target model and each candidate training framework can be acquired, where, in this specification, the first task instruction may be an instruction for evaluating the quality of each candidate training framework or for determining a target training framework used to train a model to be trained.
In addition, the target model may be obtained from a known model library, and the training framework may be a deep learning framework, such as Caffe, TensorFlow, Microsoft Cognitive Toolkit (CNTK), MXNet, PyTorch, PaddlePaddle, OneFlow, etc. Of course, the training framework in the present specification may also be another kind of training framework, such as a reinforcement learning framework, which is not specifically limited in the present specification.
It should be noted that, each candidate training frame in the present specification may be a different training frame, or may be different versions of the same training frame.
In this specification, the execution body for implementing the task execution method may be a designated device such as a server, and for convenience of description, only the server is taken as an execution body in this specification, and one task execution method provided in this specification will be described.
S103: adjusting each candidate training framework, with the goal of keeping equivalent at least one of the specified parameters involved when different candidate training frameworks train the target model, the operators called by the different candidate training frameworks and the dependency relationships among the operators, and the updating mode used when the different candidate training frameworks update the target model, so as to obtain each adjusted framework.
In order to effectively control the variables of each candidate training framework and guarantee the objectivity and accuracy of the evaluation results, the server needs to ensure that, during training, the specified parameters involved when different candidate training frameworks train the target model are equivalent, the operators called by different candidate training frameworks and the dependency relationships among them are equivalent (model equivalence), and the modes of updating the model are equivalent (training equivalence), so that the running mechanisms of different candidate training frameworks (such as the underlying logic of the algorithms and techniques they adopt) can be compared effectively.
Specifically, in the process of realizing parameter equivalence, the server can ensure that the specified parameters involved in training the target model are equivalent. The specified parameters include:
the hyperparameters involved in preprocessing the input data of the target model, such as the mean, variance and random scaling factor used to normalize the data;
the hyperparameters corresponding to the operators called by each training framework, such as whether a convolution contains a bias term, the eps value and momentum value of Batch Normalization layers, and the weight initialization method of the operators called by each training framework;
the hyperparameters involved in updating the weights of the target model, such as the learning rate and regularization factors;
the hyperparameters related to training performance, such as the number of threads that perform data preprocessing in parallel.
It should be noted that the above hyperparameters are supported by all the training frameworks, and by setting them explicitly, performance can be improved relative to the default state. In practice, these parameters, which govern how the model is adjusted and how the data is processed during training, may take different values under different training frameworks. Taking the eps value as an example: it represents a floating-point relative precision; in some training frameworks the default eps is 1e-6, while in others it is 1e-3 or 1e-5, and different eps values control different floating-point relative precision for the target model during training.
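As an illustration only, the following sketch shows one way of pinning such specified parameters to explicit, shared values in a PyTorch-style framework, so that they no longer depend on framework defaults; the concrete values and names are assumptions, not prescriptions of this specification.

```python
import torch.nn as nn

# Shared hyperparameter settings that would otherwise default differently
# across training frameworks (the values here are illustrative).
ALIGNED = {
    "bn_eps": 1e-5,       # floating-point relative precision of Batch Normalization
    "bn_momentum": 0.1,   # running-statistics momentum of Batch Normalization
    "conv_bias": False,   # whether convolutions carry a bias term
    "lr": 0.01,           # learning rate used by the weight update
    "num_workers": 4,     # threads performing data preprocessing in parallel
}

def make_block(in_channels, out_channels, cfg=ALIGNED):
    # Build a convolution + BatchNorm + ReLU block with every relevant default
    # overridden explicitly, so the same dictionary can be replayed when the
    # equivalent model is built under another candidate training framework.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1,
                  bias=cfg["conv_bias"]),
        nn.BatchNorm2d(out_channels, eps=cfg["bn_eps"], momentum=cfg["bn_momentum"]),
        nn.ReLU(inplace=True),
    )
```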
In the process of realizing model equivalence, the server can firstly judge whether operators called by different training frameworks and the dependency relationship between the operators are the same or not.
In general, the computation graph constructed by a training framework directly reflects the operators it calls and the dependency relationships among them. In this specification, the computation graph of the whole training process is denoted G_g, and the computation graph of the model's forward propagation is denoted G_m; that is, G_m is a subgraph of G_g. For the evaluation result to be meaningful, the user code should be implemented fairly for each framework, i.e. the frameworks should construct the same G_g. The training process can be regarded as an iterative process of training steps, and a single training step contains data preprocessing, forward computation, back propagation and parameter updating. For ease of understanding, the present specification provides a schematic diagram of one iteration of a model, as shown in fig. 2.
Fig. 2 is a schematic diagram of an iterative process of a model provided in the present specification.
The single training step of the model comprises four stages of data preprocessing, forward calculation, back propagation and parameter updating.
The forward computation executes the forward pass of each operator in the topological order of G_m, finally obtaining the output of each operator and the loss function. Similarly, the back propagation process executes the backward computation of the operators in the reverse topological order of G_m, and obtains the gradients of the weights of the target model according to the chain rule. The weight update then updates the weights from the weights and gradients according to a certain update method.
The operator may include an operator of the target model itself, such as a convolution operator, a ReLU operator, and the like, or may include other operators provided by the candidate training framework, such as vector multiplication and addition involved in updating parameters of the target model.
In practical applications, the G_m constructed by a training framework cannot be obtained directly. The server can instead check whether the models implemented by different candidate training frameworks produce the same output when given the same input and the same parameters, thereby indirectly verifying whether model equivalence is achieved, i.e. whether the operators called by the different training frameworks and the dependency relationships among the operators are equivalent.
The server can deploy the target model in each candidate training framework, input the same data and set the same parameters for the target model under each candidate training framework, and judge whether the target model generates the same output in each candidate training framework. If so, it is determined that the operators called by the different candidate training frameworks and the dependency relationships among them are the same; otherwise, model equivalence is not achieved.
If the specified parameters are not kept equivalent, the server may adjust each candidate training frame to obtain an adjusted frame so that the specified parameters related to each candidate training frame are kept equivalent.
For example, the server may take the mean, median or mode of the specified parameters corresponding to the candidate training frameworks as the specified parameter of each adjusted framework.
Furthermore, since the forward inference of the target model uses floating-point computation, the results can differ slightly depending on the order of computation, and these errors accumulate along the forward computation process, so that the outputs of different candidate training frameworks are difficult to compare directly. Therefore, the server can select one of the candidate training frameworks to train the target model and, after at least one iteration, export the model parameters of the target model and convert them into a designated parameter format (such as the NumPy format) so that they can be loaded into the target models of the other candidate training frameworks.
The server may then export the target model in each candidate training framework into a designated model format, such as Open Neural Network Exchange (ONNX), input the same data into each exported target model, run the target models with ONNX, and judge whether the target models in the designated format produce the same output.
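A possible sketch of this export-and-compare check is given below, assuming a PyTorch-style source framework, the NumPy parameter format and ONNX Runtime; the function names, tolerance and file paths are illustrative, and loading the exported parameters into the model of another framework depends on that framework's own API, so it is omitted here.

```python
import numpy as np
import torch
import onnxruntime as ort

def export_params_as_numpy(model, path="params.npz"):
    # Export the trained weights to a framework-neutral parameter format (NumPy),
    # so they can be loaded into the equivalent model of another framework.
    arrays = {k: v.detach().cpu().numpy() for k, v in model.state_dict().items()}
    np.savez(path, **arrays)

def export_to_onnx(model, sample_input, path):
    # Export the model to the designated model format (ONNX).
    model.eval()
    torch.onnx.export(model, sample_input, path)

def outputs_match(onnx_path_a, onnx_path_b, test_input, atol=1e-5):
    # Run the two exported models on the same input and compare the outputs;
    # matching outputs indicate equivalent operators and dependencies.
    outputs = []
    for path in (onnx_path_a, onnx_path_b):
        session = ort.InferenceSession(path)
        input_name = session.get_inputs()[0].name
        outputs.append(session.run(None, {input_name: test_input})[0])
    return np.allclose(outputs[0], outputs[1], atol=atol)
```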
If the operators called by the candidate training frameworks and the dependency relationships among the operators are not equivalent, the server can adjust the parameters of each candidate training framework so as to obtain adjusted frameworks in which the called operators and the dependency relationships among the operators are kept equivalent.
In the process of realizing training equivalence, the server can keep the updating modes of updating the target model by different training frameworks consistent, and the updating modes can comprise: a transformation function, a weight update function, a regularization function, etc. applied to the gradient.
Taking the weight update function as an example, a widely adopted weight update function (also called an optimizer) is the momentum optimizer (Momentum), which adds a momentum term on top of the stochastic gradient descent optimizer so that the model converges faster and is less likely to get stuck in local minima. However, the momentum optimizers implemented by different training frameworks may differ.
Let w_t denote the model parameters at the t-th iteration, ε the learning rate, μ the momentum factor, and g_t the gradient at the t-th iteration; then one momentum optimizer can be expressed as:
v_{t+1} = μ·v_t + ε·g_{t+1}
w_{t+1} = w_t − v_{t+1}
while another momentum optimizer may be expressed as:
v_{t+1} = μ·v_t + g_{t+1}
w_{t+1} = w_t − ε·v_{t+1}
In this case, with the goal of keeping the mode of updating the target model equivalent across the different training frameworks, the server may adjust the second momentum optimizer so that the two momentum optimizers are equivalent; the adjusted second momentum optimizer may be expressed as:
w_{t+1} = w_t − ε_{t+1}·v_{t+1}
By adjusting the transformation function, the weight update function and the regularization function applied to the gradient, the server can make different candidate training frameworks update the target model in a fair, equivalent way, thereby obtaining the adjusted frameworks.
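For ease of understanding, the following NumPy sketch reproduces the two momentum formulations above and checks numerically that, under a constant learning rate, they trace the same weight trajectory; the hyperparameter values and array sizes are arbitrary illustrations.

```python
import numpy as np

def momentum_a(w0, grads, lr=0.1, mu=0.9):
    # v_{t+1} = mu * v_t + lr * g_{t+1};  w_{t+1} = w_t - v_{t+1}
    w, v = w0.copy(), np.zeros_like(w0)
    for g in grads:
        v = mu * v + lr * g
        w = w - v
    return w

def momentum_b(w0, grads, lr=0.1, mu=0.9):
    # v_{t+1} = mu * v_t + g_{t+1};  w_{t+1} = w_t - lr * v_{t+1}
    w, v = w0.copy(), np.zeros_like(w0)
    for g in grads:
        v = mu * v + g
        w = w - lr * v
    return w

rng = np.random.default_rng(0)
w0 = rng.normal(size=5)
grads = [rng.normal(size=5) for _ in range(20)]
# With a constant learning rate the two formulations update the weights identically.
assert np.allclose(momentum_a(w0, grads), momentum_b(w0, grads))
```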
The server may adjust each candidate training framework with the goal of keeping equivalent one or more of: the specified parameters involved when different candidate training frameworks train the target model, the operators called by the different candidate training frameworks and the dependency relationships among the operators, and the updating mode used when the different candidate training frameworks update the target model, so as to obtain each adjusted framework. Of course, it may also adjust each training framework with all of these conditions kept equivalent as the goal.
S104: for each adjusted framework, determining the duration during which the terminal device deploying the target model executes the operations of the target model based on the adjusted framework, as the operation duration.
S105: and determining the priority corresponding to the adjusted framework according to the operation duration.
S106: and determining a target training frame from the candidate training frames according to the priority corresponding to each adjusted frame.
During model training, the terminal device on which the target model is deployed is always in either the idle state or the operating state (busy state). The operating state means that the terminal device is executing one or more computing functions; the idle state means that the terminal device is idle and waiting because computing functions have not been dispatched to it in time.
If the upper-layer application is regarded as a workload with interdependencies and the terminal device as a service facility that carries out the workload, then the training framework can be regarded as a scheduler that schedules the workload between the two. In model training, the training framework needs to map the workload to computing functions that can be executed on the service facility and then invoke the terminal device to perform the computation.
For a fixed upper layer application, the shorter the time the terminal equipment is in an operation state, the more efficient the use of the terminal equipment by the framework is, and the shorter the time the terminal equipment is in an idle state, the more sufficient the use of the terminal equipment by the framework is. Thus, the server can employ an evaluation system of both efficiency and sufficiency to evaluate the performance of the framework.
A two-dimensional evaluation system of efficiency and sufficiency better connects the evaluation results to the underlying techniques. In practical applications, to obtain a shorter training time, a training framework needs to be optimized in two directions. First, by designing more efficient computing functions and adopting operator fusion techniques, the execution time of the computing functions is reduced as much as possible, which improves the efficiency of the training framework. Second, the framework schedules resources reasonably, hides the time cost of Input/Output (I/O) and central processing unit (CPU) control flow behind the time the terminal device spends executing computing functions, and cancels unnecessary host-device (Host-Device) synchronization operations, so that the terminal device is kept out of the idle state, which improves the sufficiency of the training framework.
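As a purely illustrative example of the second optimization direction, the following PyTorch-style sketch avoids per-step host-device synchronization (such as reading the loss back to the host on every iteration), so that computing functions keep the terminal device busy; the CUDA device, logging interval and variable names are assumptions.

```python
import torch

def train_epoch(model, loader, optimizer, loss_fn, log_every=100):
    # Keep the running loss on the device so that no per-step read-back is needed.
    running_loss = torch.zeros((), device="cuda")
    for step, (x, y) in enumerate(loader):
        x = x.to("cuda", non_blocking=True)
        y = y.to("cuda", non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        running_loss += loss.detach()
        # Calling loss.item() on every step would force a Host-Device synchronization
        # and leave the device idle; synchronizing only occasionally keeps it busy.
        if (step + 1) % log_every == 0:
            print(f"step {step + 1}: mean loss {(running_loss / (step + 1)).item():.4f}")
```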
Therefore, the server can determine the priority corresponding to each candidate training framework through the device utilization rate and the computing efficiency corresponding to the different adjusted frameworks.
Specifically, for efficiency, the server may measure the operation duration during which the terminal device deploying the target model executes the operations of the target model based on each adjusted framework. The server may sample the running state of the terminal device according to a preset sampling period, so as to measure the total time overhead of the terminal device executing computing functions within the sampling period, i.e. the duration for which the terminal device is in the operating state within the sampling period. The preset sampling period may be set according to the actual situation, which is not specifically limited in this specification. For each adjusted framework, the operation duration DCT of the terminal device under that adjusted framework can be expressed as:
DCT = ∫_{t_s}^{t_e} D_active(t) dt
where t_s denotes the start time of the sampling period, t_e denotes the end time of the sampling period, and D_active(t) denotes the state of the terminal device at time t, which can be expressed as:
D_active(t) = 1 if the terminal device is in the operating state at time t, and D_active(t) = 0 otherwise.
The server can determine the computing efficiency corresponding to each adjusted framework according to its operation duration DCT, so that the efficiency of different training frameworks is expressed through the computing efficiency: the shorter the operation duration of the terminal device within the period, the higher the corresponding computing efficiency, and the more efficiently the training framework uses the terminal device. For example, for the same upper-layer application, when a training framework uses a computing function that takes less time to compute a certain convolution layer, the computation time on the terminal device is shorter and the computing efficiency of the training framework is higher; when a framework performs some computation repeatedly, the computation time on the terminal device is longer and the computing efficiency of the training framework is lower.
Of course, the server may also determine a time length for training the target model once through each adjusted frame, and determine the computing efficiency corresponding to each adjusted frame according to the time length.
For sufficiency, the server may determine the device utilization rate of the adjusted framework as the ratio of the duration for which the terminal device is in the busy state within the sampling period to the sampling duration, so that the sufficiency of different training frameworks is expressed through the device utilization rate. The device utilization rate DOR can be expressed as:
DOR = DCT / (t_e − t_s)
The theoretical range of DOR is between 0 and 1: the longer the terminal device is idle, the closer DOR is to 0, and when the terminal device is busy throughout the sampling period, DOR reaches its maximum value of 1. A larger device utilization rate means that the training framework uses the terminal device more fully. For example, when a training framework cancels unnecessary synchronization operations, computing functions can be dispatched to the terminal device earlier, the device utilization rate rises, and the sufficiency of the framework is higher.
Of course, in this specification, the server may determine the operation duration without sampling the state of the terminal device according to a preset sampling period, and instead determine the duration in the operation state when the terminal device deploying the target model trains the target model, as the operation duration.
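One possible measurement sketch is shown below; it assumes an NVIDIA device monitored through the pynvml bindings and approximates the integral defining DCT by polling the device state at a fixed interval, then derives DOR as the ratio of DCT to the sampling duration. The sampling interval, the busy criterion and the choice of library are assumptions, not requirements of this specification.

```python
import time
import pynvml  # NVIDIA management library bindings; assumed to be available

def measure_dct_dor(sampling_seconds=10.0, interval=0.01):
    # Approximate DCT = ∫ D_active(t) dt over one sampling period and DOR = DCT / (t_e - t_s).
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    t_s = time.monotonic()
    busy_time = 0.0
    while time.monotonic() - t_s < sampling_seconds:
        # D_active(t): 1 while a kernel is executing, 0 while the device is idle.
        if pynvml.nvmlDeviceGetUtilizationRates(handle).gpu > 0:
            busy_time += interval          # rectangle-rule approximation of the integral
        time.sleep(interval)
    t_e = time.monotonic()
    pynvml.nvmlShutdown()
    dct = busy_time                        # operation duration within the sampling period
    dor = dct / (t_e - t_s)                # device utilization rate, in [0, 1]
    return dct, dor
```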
For easy understanding, the present disclosure provides a schematic diagram of an evaluation method of a training frame, as shown in fig. 3.
Fig. 3 is a schematic diagram of an evaluation method of a training frame provided in the present specification.
The server needs to determine the calculation efficiency and the equipment utilization rate of each adjusted frame on the premise of ensuring the consistency of parameters, consistency of models and consistency of training of each training frame, and takes the calculation efficiency and the equipment utilization rate of each training frame as the corresponding evaluation result of each training frame.
The server can determine the utilization rate and the calculation efficiency of the equipment corresponding to each adjusted frame through the method, so that the priority corresponding to each adjusted frame is determined according to the utilization rate and the calculation efficiency of the equipment corresponding to each adjusted frame, and the target training frame is determined from the candidate training frames according to the priority.
For example, the server may determine, according to model information such as the type or structure of the model to be trained, whether the model should be trained with an efficiency-oriented training framework (one with higher computing efficiency) or a sufficiency-oriented training framework (one with higher device utilization), so as to select a training framework matching the model to be trained as the target training framework.
In addition, the server may determine which type of training frame the user prefers to select according to the user's (model developer's) settings, and then select the target training frame according to the user's settings.
Of course, the server may also display the device utilization rate and training efficiency of each training frame as the evaluation result to the user, so that the user selects a target training frame according with the evaluation result.
In addition, the evaluation result can be displayed to a frame developer, so that the frame developer can find out the defects of the frame which is researched and developed by the frame developer according to the equipment utilization rate and the training efficiency of each training frame, and therefore the training frames are adjusted and optimized, and the breakthrough of the frame performance is completed.
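As one illustrative way of turning the two evaluation dimensions into a priority, the following sketch normalizes the computing efficiency from the measured operation durations, combines it with the device utilization rate under an assumed weighting, and picks the highest-priority framework; the weighting scheme and the example numbers are hypothetical.

```python
def rank_frameworks(results, efficiency_weight=0.5):
    # results: {framework_name: (dct_seconds, dor)} measured under equivalent settings.
    # A shorter DCT means higher computing efficiency; a higher DOR means fuller device use.
    min_dct = min(dct for dct, _ in results.values())
    priorities = {}
    for name, (dct, dor) in results.items():
        efficiency = min_dct / dct  # 1.0 for the framework with the shortest operation duration
        priorities[name] = efficiency_weight * efficiency + (1 - efficiency_weight) * dor
    return sorted(priorities.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical measurements: the highest-priority entry is taken as the target framework.
ranked = rank_frameworks({"framework_a": (6.2, 0.91), "framework_b": (5.4, 0.78)})
target_framework = ranked[0][0]
```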
S107: and when a second task instruction is received, executing a model training task aiming at the target model through the target training framework.
When the server receives the second task instruction (e.g. an instruction to perform a training task for the target model), the server may perform the training task for the target model through the determined target training framework. It should be noted that the target model for which the training task is executed may be the model used for determining the priority of each adjusted framework, or it may be another model that needs to be trained.
With the above method, a two-dimensional evaluation system (sufficiency and efficiency) can be established; since the two evaluation dimensions follow the two optimization directions of a training framework, the techniques and implementations used by a framework are tied more closely to the evaluation results. Through the device utilization rate and the computing efficiency of each training framework on the terminal device, the efficiency and sufficiency dimensions can be quantified well, which is more effective than the previous indices.
In addition, the solution also provides a method for ensuring fairness among the different training frameworks used for the evaluation. Compared with previous methods that merely ensure the same training configuration, this fairness method abstracts the computational load of a framework into a computation graph and, through the three equivalence steps, fully covers the data preprocessing, forward computation, back propagation and weight update processes of training, so that unfair factors in the implementations of different frameworks can be found and removed effectively, making the evaluation results more reliable.
The above is a method for implementing task execution for one or more embodiments of the present disclosure, and based on the same concept, the present disclosure further provides a corresponding task execution device, as shown in fig. 4.
Fig. 4 is a schematic diagram of a task execution device provided in the present specification, including:
a receiving module 401, configured to receive a first task instruction;
an obtaining module 402, configured to obtain a target model and each candidate training frame according to the first task instruction;
an adjustment module 403, configured to adjust each candidate training frame, with the goal of keeping equivalent at least one of the specified parameters involved when the different candidate training frames train the target model, the operators called by the different candidate training frames and the dependency relationships among the operators, and the update manner used when the different candidate training frames update the target model, so as to obtain each adjusted frame;
a first determining module 404, configured to determine, for each adjusted frame, a duration when the terminal device deploying the target model performs an operation of the target model based on the adjusted frame, as an operation duration;
a second determining module 405, configured to determine, according to the operation duration, a priority corresponding to the adjusted frame;
a third determining module 406, configured to determine a target training frame from the candidate training frames according to the priority corresponding to each adjusted frame;
and an execution module 407, configured to execute the model training task for the target model through the target training framework when the second task instruction is received.
Optionally, the specified parameters include: at least one of a super parameter involved in preprocessing the input data of the target model, a super parameter corresponding to an operator called by each candidate training frame, a super parameter involved in updating the weight of the target model, and a super parameter related to training performance.
Optionally, the updating mode includes: at least one of a transformation function, a weight update function, and a regularization function applied to the gradient.
Optionally, before adjusting each candidate training frame to obtain each adjusted frame, the adjusting module 403 is further configured to determine whether the operator called by each candidate training frame and the dependency relationship between each operator are the same; if not, the candidate training frameworks are adjusted by taking the operator called by each candidate training framework and the dependency relationship between the operators as the target of maintaining the equivalence.
Optionally, the adjusting module 403 is specifically configured to deploy the target model in each candidate training frame, input the same data for the target model under each candidate training frame and set the same parameters, determine whether the target model generates the same output in each candidate training frame, and if so, determine that the operator invoked by each candidate training frame is the same as the dependency relationship between each operator.
Optionally, the adjusting module 403 is specifically configured to select one of the candidate training frames to train the target model; after at least one iteration, the model parameters of the target model are exported and converted into a designated parameter format, and the designated parameter format is loaded into the target models of other candidate training frameworks; and exporting the target model in each candidate training frame into a designated model format, inputting the same data into the target models of the designated model formats, and judging whether the target models of the designated model formats generate the same output.
Optionally, the first determining module 404 is specifically configured to sample the operation state of the terminal device according to a preset sampling period; and determining the duration of the terminal equipment for deploying the target model in the sampling period when the terminal equipment executes the operation of the target model based on the adjusted framework, and taking the duration as the operation duration.
Optionally, the second determining module 405 is specifically configured to determine, according to the operation duration, a device utilization rate and a computing efficiency of the adjusted frame on the terminal device; and determining the priority according to the equipment utilization rate and the computing efficiency.
Optionally, the second determining module 405 is specifically configured to determine the device utilization rate according to the operation duration and a sampling duration corresponding to the sampling period, where the device utilization rate and the operation duration have a positive correlation.
Optionally, the longer the operation duration, the lower the calculation efficiency.
Optionally, the training framework includes: deep learning framework.
The present specification also provides a computer readable storage medium storing a computer program operable to perform a task execution method as provided in fig. 1 above.
The present specification also provides a schematic structural diagram of an electronic device corresponding to fig. 1 shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in fig. 5, although other hardware required by other services may be included. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to implement the task execution method described in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
For an improvement of a technology, it can be clearly distinguished whether the improvement is in hardware (for example, improvements to circuit structures such as diodes, transistors and switches) or in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g. a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manufacturing integrated circuit chips manually, this programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL), of which there is not just one but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It should also be clear to those skilled in the art that a hardware circuit implementing the logic method flow can easily be obtained merely by slightly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when implementing the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, Random Access Memory (RAM) and/or nonvolatile memory in a computer-readable medium, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a(n) ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the corresponding description of the method embodiments.
The foregoing is merely an embodiment of the present specification and is not intended to limit it. Various modifications and variations of the present specification will occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present specification shall be included within the scope of the claims of the present specification.

Claims (16)

1. A method of performing a task, comprising:
receiving a first task instruction;
acquiring a target model and candidate training frameworks according to the first task instruction;
adjusting each candidate training framework with the objective of keeping equivalent, across the different candidate training frameworks, at least one of the following: the specified parameters involved in training the target model, the operators invoked and the dependency relationships among those operators, and the manner in which the target model is updated, so as to obtain adjusted frameworks;
for each adjusted framework, determining the duration for which the terminal device on which the target model is deployed executes operations of the target model based on that adjusted framework, and taking this duration as an operation duration;
determining a priority corresponding to the adjusted framework according to the operation duration;
determining a target training framework from the candidate training frameworks according to the priority corresponding to each adjusted framework;
and, when a second task instruction is received, executing a model training task for the target model through the target training framework.
2. The method of claim 1, wherein the specified parameters comprise at least one of the following: a hyperparameter involved in preprocessing the input data of the target model, a hyperparameter corresponding to an operator invoked by each candidate training framework, a hyperparameter involved in updating the weights of the target model, and a hyperparameter related to training performance.
3. The method of claim 1, wherein the update manner comprises at least one of the following: a transformation function, a weight update function, and a regularization function applied to the gradient.
4. The method of claim 1, wherein, before adjusting each candidate training framework to obtain the adjusted frameworks, the method further comprises:
determining whether the operators invoked by the candidate training frameworks and the dependency relationships among the operators are the same;
and, if not, adjusting the candidate training frameworks with the objective of keeping the operators invoked by each candidate training framework and the dependency relationships among the operators equivalent.
5. The method of claim 4, wherein determining whether the operators invoked by the candidate training frameworks and the dependency relationships among the operators are the same specifically comprises:
deploying the target model in each candidate training framework, inputting the same data and setting the same parameters for the target model under each candidate training framework, determining whether the target model produces the same output in each candidate training framework, and, if so, determining that the operators invoked by the candidate training frameworks and the dependency relationships among the operators are the same.
6. The method of claim 5, wherein inputting the same data and setting the same parameters for the target model under each candidate training framework and determining whether the target model produces the same output in each candidate training framework specifically comprises:
selecting one of the candidate training frameworks to train the target model;
after at least one iteration, exporting the model parameters of the target model, converting them into a specified parameter format, and loading them into the target models of the other candidate training frameworks;
and exporting the target model in each candidate training framework into a specified model format, inputting the same data into the target models in the specified model format, and determining whether the target models in the specified model format produce the same output (an illustrative sketch of this check follows the claims).
7. The method of claim 1, wherein, for each adjusted framework, determining the duration for which the terminal device on which the target model is deployed executes operations of the target model based on the adjusted framework, as the operation duration, specifically comprises:
sampling the running state of the terminal device according to a preset sampling period;
and determining, as the operation duration, the duration within the sampling period for which the terminal device on which the target model is deployed executes operations of the target model based on the adjusted framework.
8. The method of claim 7, wherein determining the priority corresponding to the adjusted framework according to the operation duration specifically comprises:
determining, according to the operation duration, the device utilization and the computational efficiency of the adjusted framework on the terminal device;
and determining the priority according to the device utilization and the computational efficiency.
9. The method of claim 8, wherein determining the device utilization according to the operation duration specifically comprises:
determining the device utilization according to the operation duration and the sampling duration corresponding to the sampling period, wherein the device utilization is positively correlated with the operation duration.
10. The method of claim 8, wherein the longer the operation duration, the lower the computational efficiency (an illustrative sketch of the utilization, efficiency and priority computation follows the claims).
11. The method of claim 1, wherein the training framework comprises a deep learning framework.
12. A task execution device, comprising:
a receiving module, configured to receive a first task instruction;
an acquisition module, configured to acquire a target model and candidate training frameworks according to the first task instruction;
an adjustment module, configured to adjust each candidate training framework with the objective of keeping equivalent, across the different candidate training frameworks, at least one of the following: the specified parameters involved in training the target model, the operators invoked and the dependency relationships among those operators, and the manner in which the target model is updated, so as to obtain adjusted frameworks;
a first determining module, configured to determine, for each adjusted framework, the duration for which the terminal device on which the target model is deployed executes operations of the target model based on that adjusted framework, as the operation duration;
a second determining module, configured to determine the priority corresponding to the adjusted framework according to the operation duration;
a third determining module, configured to determine a target training framework from the candidate training frameworks according to the priority corresponding to each adjusted framework;
and an execution module, configured to execute a model training task for the target model through the target training framework when a second task instruction is received.
13. The device of claim 12, wherein the first determining module is specifically configured to sample the running state of the terminal device according to a preset sampling period, and to determine, as the operation duration, the duration within the sampling period for which the terminal device on which the target model is deployed executes operations of the target model based on the adjusted framework.
14. The device of claim 13, wherein the second determining module is specifically configured to determine, according to the operation duration, the device utilization and the computational efficiency of the adjusted framework on the terminal device, and to determine the priority according to the device utilization and the computational efficiency.
15. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-11.
16. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-11 when executing the program.
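
The following is a minimal Python sketch of the cross-framework equivalence check described in claims 5 and 6. It is an illustration only: the FrameworkAdapter interface, its method names, and the use of numpy.allclose as the output comparison are assumptions introduced here, not part of the claims or of any particular deep learning library, and the "specified" parameter and model formats are deliberately left abstract.

from typing import Protocol, Sequence

import numpy as np


class FrameworkAdapter(Protocol):
    """Assumed minimal wrapper around one candidate training framework."""

    name: str

    def train(self, steps: int) -> None:
        """Run `steps` training iterations of the target model."""

    def export_parameters(self) -> dict:
        """Dump the model weights in the specified (common) parameter format."""

    def load_parameters(self, params: dict) -> None:
        """Load weights exported from another framework."""

    def run_exported_model(self, batch: np.ndarray) -> np.ndarray:
        """Export the model to the specified model format and run it on `batch`."""


def frameworks_equivalent(frameworks: Sequence[FrameworkAdapter],
                          batch: np.ndarray,
                          atol: float = 1e-5) -> bool:
    """Claims 5-6: train one framework for at least one iteration, copy its
    weights to the other frameworks via the specified parameter format, then
    check that every framework produces the same output on the same input."""
    reference, *others = frameworks
    reference.train(steps=1)
    params = reference.export_parameters()
    for framework in others:
        framework.load_parameters(params)

    outputs = [framework.run_exported_model(batch) for framework in frameworks]
    baseline = outputs[0]
    # Identical outputs are taken as evidence that the invoked operators and
    # their dependency relationships are the same across frameworks (claim 5).
    return all(np.allclose(baseline, out, atol=atol) for out in outputs[1:])

In practice the specified model format would be whatever interchange representation all candidate frameworks can export to; the claims leave that choice open, so the sketch keeps it behind the adapter interface.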
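Still as an illustration only, the sketch below shows one way the operation duration sampled in claim 7 could be turned into the device utilization of claim 9, the computational efficiency of claim 10 and the priority of claim 8, and then used to pick the target training framework of claim 1. The reciprocal used for efficiency, the equal weighting of the two quantities and the example numbers are assumptions made for the sketch; the claims only require that utilization rise, and efficiency fall, as the operation duration grows.

from dataclasses import dataclass


@dataclass
class AdjustedFrameworkProfile:
    """Measurements collected for one adjusted candidate framework (claim 7)."""
    name: str
    operation_duration: float  # seconds the terminal device spent executing model operations
    sampling_duration: float   # length of the preset sampling period, in seconds


def priority(profile: AdjustedFrameworkProfile, alpha: float = 0.5) -> float:
    # Claim 9: device utilization is positively correlated with the operation duration.
    utilization = profile.operation_duration / profile.sampling_duration
    # Claim 10: the longer the operation duration, the lower the computational
    # efficiency; a reciprocal is one simple (assumed) way to express that.
    efficiency = 1.0 / max(profile.operation_duration, 1e-9)
    # Claim 8: the priority is determined from both quantities; the weighting is assumed.
    return alpha * utilization + (1.0 - alpha) * efficiency


def select_target_framework(profiles: list[AdjustedFrameworkProfile]) -> str:
    """Claim 1: the adjusted framework with the highest priority becomes the
    target training framework used once the second task instruction arrives."""
    return max(profiles, key=priority).name


if __name__ == "__main__":
    candidates = [
        AdjustedFrameworkProfile("framework_a", operation_duration=2.0, sampling_duration=10.0),
        AdjustedFrameworkProfile("framework_b", operation_duration=4.0, sampling_duration=10.0),
    ]
    print(select_target_framework(candidates))  # prints the name with the higher assumed priority

With the example weighting, framework_a (shorter operation duration, hence higher assumed efficiency) is selected; raising alpha shifts the balance toward utilization, which favours framework_b.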
CN202310269682.5A 2023-03-13 2023-03-13 Task execution method and device, storage medium and electronic equipment Pending CN116450344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310269682.5A CN116450344A (en) 2023-03-13 2023-03-13 Task execution method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310269682.5A CN116450344A (en) 2023-03-13 2023-03-13 Task execution method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116450344A true CN116450344A (en) 2023-07-18

Family

ID=87131190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310269682.5A Pending CN116450344A (en) 2023-03-13 2023-03-13 Task execution method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116450344A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842060A (en) * 2023-08-30 2023-10-03 之江实验室 Reasoning query optimization method and device based on agent model rearrangement technology
CN116842060B (en) * 2023-08-30 2024-01-09 之江实验室 Reasoning query optimization method and device based on agent model rearrangement technology
CN117406982A (en) * 2023-10-12 2024-01-16 之江实验室 Integrated storage and calculation application generation system and method, storage medium and equipment
CN117406982B (en) * 2023-10-12 2024-05-10 之江实验室 Integrated storage and calculation application generation system and method, storage medium and equipment
CN117171577A (en) * 2023-11-02 2023-12-05 之江实验室 Dynamic decision method and device for high-performance operator selection
CN117171577B (en) * 2023-11-02 2024-03-22 之江实验室 Dynamic decision method and device for high-performance operator selection

Similar Documents

Publication Publication Date Title
CN116450344A (en) Task execution method and device, storage medium and electronic equipment
Dastgeer et al. Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems
KR20210032266A (en) Electronic device and Method for controlling the electronic device thereof
CN116167463B (en) Distributed model training container scheduling method and device for intelligent computing
CN116304720B (en) Cost model training method and device, storage medium and electronic equipment
CN116521380A (en) Resource self-adaptive collaborative model training acceleration method, device and equipment
CN116432778B (en) Data processing method and device, storage medium and electronic equipment
CN114936085A (en) ETL scheduling method and device based on deep learning algorithm
CN116932092B (en) Method, device, medium and equipment for automatically generating operator calling code
CN116185532B (en) Task execution system, method, storage medium and electronic equipment
CN116225669B (en) Task execution method and device, storage medium and electronic equipment
CN116167431B (en) Service processing method and device based on hybrid precision model acceleration
CN116932175B (en) Heterogeneous chip task scheduling method and device based on sequence generation
CN116091895B (en) Model training method and device oriented to multitask knowledge fusion
CN116341642B (en) Data processing method and device, storage medium and electronic equipment
CN116403097A (en) Target detection method and device, storage medium and electronic equipment
CN116384505A (en) Data processing method and device, storage medium and electronic equipment
CN114021733A (en) Model training optimization method and device, computer equipment and storage medium
CN116755862B (en) Training method, device, medium and equipment for operator optimized scheduling model
CN117455015B (en) Model optimization method and device, storage medium and electronic equipment
CN117075918B (en) Model deployment method and device, storage medium and electronic equipment
CN117171577B (en) Dynamic decision method and device for high-performance operator selection
CN117726760B (en) Training method and device for three-dimensional human body reconstruction model of video
CN117173321B (en) Method and device for selecting three-dimensional reconstruction texture view
CN117348999B (en) Service execution system and service execution method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination