CN113469322B - Method, device, equipment and storage medium for determining executable program of model - Google Patents


Info

Publication number
CN113469322B
CN113469322B CN202010247043.5A
Authority
CN
China
Prior art keywords
model
executable program
operators
operator
determining
Prior art date
Legal status
Active
Application number
CN202010247043.5A
Other languages
Chinese (zh)
Other versions
CN113469322A (en)
Inventor
刘伟良
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202010247043.5A priority Critical patent/CN113469322B/en
Publication of CN113469322A publication Critical patent/CN113469322A/en
Application granted granted Critical
Publication of CN113469322B publication Critical patent/CN113469322B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a method, an apparatus, a device and a storage medium for determining an executable program of a model, belonging to the field of computer technology. The method comprises the following steps: obtaining model architecture information that indicates the dependency relationships among a plurality of operators of a model; determining an executable program of the model based on the code optimization modes of the plurality of operators included in a search space, the model architecture information and an operator logic library, where each operator corresponds to at least one code optimization mode; each time an executable program of the model is determined, performing one model training iteration based on the currently determined executable program and determining the training iteration duration corresponding to that executable program; and, when the current model training iteration meets the stop-optimizing condition, selecting the executable program with the shortest training iteration duration from all determined executable programs as the target executable program of the model. By combining the determination of executable programs with the model training iterations in this way, the training of the model is accelerated.

Description

Method, device, equipment and storage medium for determining executable program of model
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an executable program of a model.
Background
To solve a problem with a machine learning model, a user first needs to select a model and then train it on training data until the model meets the user's requirements.
In the related art, manually written, general-purpose running code is typically compiled into an executable program; training data is then used as the input of the executable program, and the model's parameters are continuously adjusted by running it, thereby training the model. Because the executable program is compiled from general-purpose running code, it is not optimized for any specific model, which increases the time consumed by a single iteration. Since model training usually requires many iterations, the total time needed becomes long, sometimes days or even months, which reduces the efficiency of model training.
Disclosure of Invention
The application provides a method, an apparatus, a device and a storage medium for determining an executable program of a model, which can solve the problem in the related art that model training efficiency is reduced because the time required for training is too long.
The technical scheme is as follows:
In one aspect, a method of determining an executable program of a model is provided, the method comprising:
obtaining model architecture information of a model, wherein the model architecture information is used for indicating the dependency relationship among a plurality of operators of the model;
determining an executable program of the model based on a plurality of operator optimization modes included in a search space, the model architecture information and an operator logic library, wherein the plurality of operator optimization modes comprise the code optimization modes of a plurality of operators, and each operator corresponds to at least one code optimization mode;
each time an executable program of the model is determined, performing one model training iteration based on the currently determined executable program, and determining the training iteration duration corresponding to the currently determined executable program;
and, when the current model training iteration meets the stop-optimizing condition, selecting the executable program with the shortest training iteration duration from all determined executable programs as the target executable program of the model.
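The steps above can be sketched as a single search loop. The sketch below is a minimal illustration; all names (`generate_executable`, `run_training_iteration`, the trial budget used as the stop condition) are illustrative assumptions, not taken from the patent, and the two stand-in functions are placeholders for a real compiler and a real training step.

```python
import random
import time

def generate_executable(combination, architecture_info, operator_logic):
    """Stand-in for generating an executable program from one combination of
    per-operator code optimization modes (a real system would compile code)."""
    return {"combination": combination, "arch": architecture_info}

def run_training_iteration(executable):
    """Stand-in for one model training iteration; returns its duration."""
    start = time.monotonic()
    # ... a real implementation runs one forward/backward pass here ...
    return time.monotonic() - start

def search_target_executable(search_space, architecture_info,
                             operator_logic, trial_budget=8):
    """Each trial determines one executable, performs one training iteration
    with it, and records the duration; when the stop condition (here, a trial
    budget) is met, the fastest executable seen is the target program."""
    best_exe, best_duration = None, float("inf")
    for _ in range(trial_budget):
        combination = {op: random.choice(modes)
                       for op, modes in search_space.items()}
        exe = generate_executable(combination, architecture_info, operator_logic)
        duration = run_training_iteration(exe)
        if duration < best_duration:
            best_exe, best_duration = exe, duration
    return best_exe, best_duration
```

The key point of the method is visible in the loop body: determining an executable and running a training iteration share the same pass, so no extra iterations are spent on tuning alone.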
In one possible implementation of the present application, determining the executable program of the model based on the plurality of operator optimization modes included in the search space, the model architecture information and the operator logic library includes:
Acquiring operation logic description codes of the operators from the operator logic library;
determining a combination of code optimization modes of a plurality of operators as a model optimization mode based on a plurality of operator optimization modes included in the search space;
and generating an executable program of the model based on the model optimization mode, the model architecture information and the running logic description codes of the operators.
In one possible implementation manner of the present application, the current model training iteration meets a stop-optimizing condition, including:
counting the number of combination determinations, where this number refers to how many times a combination of the code optimization modes of the plurality of operators has been determined;
and when the number of combination determinations reaches a threshold, determining that the current model training iteration meets the stop-optimizing condition.
In one possible implementation manner of the present application, the current model training iteration meets a stop-optimizing condition, including:
detecting whether a new combination of code optimization modes of the operators exists in the search space or not, wherein the new combination is different from a combination determined before the current time;
and when no new combination of the code optimization modes of the operators exists in the search space, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation manner of the present application, the current model training iteration meets a stop-optimizing condition, including:
and if the currently determined training iteration duration is smaller than a training iteration duration threshold, determining that the current model training iteration meets the stop optimizing condition.
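The three stop-optimizing conditions listed above (determination count reaches a threshold, no new combination remains in the search space, or the latest iteration is already fast enough) can be combined in one check. This is a hypothetical sketch; the function name and parameters are assumptions for illustration.

```python
def meets_stop_condition(trials_done, trial_budget,
                         untried_combinations, last_duration,
                         duration_threshold):
    """Return True if any of the three stop-optimizing conditions holds:
    the number of determined combinations reaches the threshold, no new
    combination remains in the search space, or the latest training
    iteration duration is already below the duration threshold."""
    if trials_done >= trial_budget:
        return True
    if not untried_combinations:
        return True
    if last_duration is not None and last_duration < duration_threshold:
        return True
    return False
```

In a real tuner these conditions trade off against each other: the budget bounds worst-case tuning time, while the duration threshold lets the search exit early once a good enough program is found.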
In one possible implementation manner of the present application, the obtaining the running logic description code of the plurality of operators from the operator logic library includes:
acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
acquiring operation logic description codes of a plurality of operators corresponding to the back propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
and acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and operation logic description codes of a plurality of operators corresponding to the backward propagation process from the operator logic library to obtain the operation logic description codes of the operators.
In another aspect, there is provided an apparatus for determining an executable program of a model, the apparatus comprising:
an acquisition module, configured to acquire model architecture information of a model, the model architecture information being used to indicate the dependency relationships among a plurality of operators of the model;
the first determining module is used for determining an executable program of the model based on a plurality of operator optimization modes included in the search space, the model architecture information and an operator logic library, wherein the plurality of operator optimization modes include code optimization modes of the plurality of operators, and each operator corresponds to at least one code optimization mode;
the second determining module is used for performing one model training iteration based on the currently determined executable program when determining one executable program of the model, and determining the training iteration duration corresponding to the currently determined executable program;
and the selection module is used for selecting the executable program with the shortest training iteration duration from all the determined executable programs as the target executable program of the model when the current model training iteration meets the stop optimizing condition.
In one possible implementation manner of the present application, the first determining module is configured to:
acquiring operation logic description codes of the operators from the operator logic library;
Determining a combination of code optimization modes of a plurality of operators as a model optimization mode based on a plurality of operator optimization modes included in the search space;
and generating an executable program of the model based on the model optimization mode, the model architecture information and the running logic description codes of the operators.
In one possible implementation manner of the present application, the selecting module is configured to:
counting the number of combination determinations, where this number refers to how many times a combination of the code optimization modes of the plurality of operators has been determined;
and when the number of combination determinations reaches a threshold, determining that the current model training iteration meets the stop-optimizing condition.
In one possible implementation manner of the present application, the selecting module is configured to:
detecting whether a new combination of code optimization modes of the operators exists in the search space or not, wherein the new combination is different from a combination determined before the current time;
and when no new combination of the code optimization modes of the operators exists in the search space, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation manner of the present application, the selecting module is configured to:
And if the currently determined training iteration duration is smaller than a training iteration duration threshold, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation manner of the present application, the first determining module is configured to:
acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
acquiring operation logic description codes of a plurality of operators corresponding to the back propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
and acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and operation logic description codes of a plurality of operators corresponding to the backward propagation process from the operator logic library to obtain the operation logic description codes of the operators.
In another aspect, a device is provided, the device comprising a memory for storing a computer program and a processor for executing the computer program stored in the memory to implement the steps of the above method of determining an executable program of a model.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method of determining an executable program of a model.
In another aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the steps of the above method of determining an executable program of a model.
The technical solutions provided by the present application can bring at least the following beneficial effects:
the method comprises the steps of obtaining model architecture information of a model, wherein the model architecture information is used for indicating the dependency relationship among a plurality of operators of the model, and determining an executable program of the model according to the model architecture information, a plurality of operator optimization modes included in a search space and an operator logic library, wherein the plurality of operator optimization modes comprise code optimization modes of the plurality of operators, and each operator corresponds to at least one code optimization mode. Since the model architecture information of different models is different, the executable program determined by the model architecture information of the model has stronger pertinence and is more suitable for the model. And then, when one executable program of the model is determined, carrying out model training iteration based on the currently determined executable program, determining the training iteration time of each time, and when the current model training iteration meets the condition of stopping optimizing, selecting the executable program with the shortest training iteration time from all the determined executable programs as the target executable program of the model. Therefore, when the target executable program of the model is determined, the model architecture information of the model is fully considered, so that the executable program has pertinence, the model training iteration and the determination executable program are combined in a single iteration process, two operations are performed in the time of one iteration, the training speed of the model is further improved, and the training efficiency of the model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment, shown in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of determining an executable program of a model, according to an example embodiment;
FIG. 3 is a schematic diagram illustrating a method of determining an executable program of a neural network model, according to an example embodiment;
FIG. 4 is a schematic diagram illustrating a method of determining an executable program of a model, according to another exemplary embodiment;
FIG. 5 is a schematic diagram illustrating an apparatus for determining an executable program of a model according to an exemplary embodiment;
fig. 6 is a schematic diagram illustrating a structure of an apparatus according to an exemplary embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the method for determining the executable program of the model provided by the embodiment of the application in detail, the application scenario and the implementation environment provided by the embodiment of the application are described.
First, an application scenario provided in the embodiment of the present application is described.
When a model is trained, a specific model architecture is generally defined first, and the architecture contains a plurality of model parameters; the aim of training is to adjust these parameters until the model meets the user's requirements. Specifically, the model parameters are initialized to random numbers, and the general-purpose running code written manually by the user is compiled into an executable program comprising a model execution program and a model parameter adjustment program. Training data is then used as input to the model execution program, which is run to obtain an output; this output, together with the expected output included in the training data, is fed to the model parameter adjustment program to obtain parameter adjustment data, so that the model parameters are adjusted. This process is repeated for thousands of iterations over thousands of training samples until the accuracy of the model reaches a certain requirement, at which point the parameters are considered tuned, i.e., the model is trained. However, the executable program obtained from the general-purpose running code may not be the one best suited to training this model, so the time required for training is long and training efficiency is reduced.
In practical implementations, certain optimizations are sometimes added manually to make the executable program run faster. However, because the user does not know the model's hyperparameters when writing the general-purpose running code, the code cannot be optimized for those hyperparameters and must be written in a general form that can handle arbitrary hyperparameter values. The resulting executable program is therefore not determined for a specific model, so executing it takes long and, in turn, model training takes long. Hyperparameters are parameters of the model that are not obtained by training; they are usually set before the model is trained. For example, in a neural network model, the output size and the convolution kernel size of a convolutional layer are hyperparameters of that layer.
To this end, the present application proposes a method of determining an executable program of a model. The method can generate an executable program aiming at the model in the model training process, and determines the executable program with the highest running speed as a target executable program in the iterative training process, so that the problems can be solved, and the specific implementation modes can be seen in the following embodiments.
Next, an implementation environment provided by the embodiments of the present application will be described.
The method for determining the executable program of a model provided by the present application may be implemented by a terminal.
As an example, the terminal may include a GPU (graphics processing unit) or a CPU (central processing unit). The training of the model may run on the GPU hardware, while the generation of executable programs may run on the CPU hardware.
As shown in fig. 1, the terminal may include a search space module 101, an executable program generation module 102, and a model training module 103. The search space module 101 may store a code optimization manner of a plurality of operators, for providing a code optimization manner for each operator according to a certain policy, the executable program generating module 102 is configured to receive the code optimization manner provided by the search space module 101, generate an executable program according to the code optimization manner, and the model training module 103 is configured to perform model training, and record a running time of the executable program, that is, a training iteration duration of the model training.
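The division of labor among the three modules can be sketched as follows. This is an illustrative skeleton only: the class names mirror the modules of fig. 1, but the random provision policy, the dictionary-based "executable" and the placeholder timing are assumptions made for the sketch.

```python
import random

class SearchSpaceModule:
    """Stores each operator's candidate code optimization modes and provides
    one mode per operator according to a policy (random, in this sketch)."""
    def __init__(self, modes_per_operator):
        self.modes = modes_per_operator

    def provide(self):
        return {op: random.choice(m) for op, m in self.modes.items()}

class ExecutableProgramGenerator:
    """Receives a combination of code optimization modes and generates an
    executable program (represented here by a plain dictionary)."""
    def generate(self, combination, architecture_info):
        return {"program": combination, "arch": architecture_info}

class ModelTrainingModule:
    """Performs model training and records the running time of each
    executable program, i.e. the training iteration duration."""
    def __init__(self):
        self.records = []

    def train_once(self, executable):
        duration = 0.001  # placeholder: a real module times the iteration
        self.records.append((executable, duration))
        return duration
```

Splitting the pipeline this way matches the hardware split described above: the generator can run on the CPU while the training module occupies the GPU.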
As an example, the terminal may be any electronic product that can interact with a user through one or more of a keyboard, touch pad, touch screen, remote control, voice interaction or handwriting device, such as a PC (personal computer), a mobile phone, a smartphone, a PDA (personal digital assistant), a wearable device, a pocket PC (PPC), a tablet computer, a smart in-vehicle device, a smart television or a smart speaker.
Those skilled in the art will appreciate that the above terminals are only examples; other terminals or servers that exist now or may appear in the future, insofar as they are applicable to the present application, are also included within the scope of protection of the present application and are incorporated herein by reference.
After the application scenario and the implementation environment provided by the embodiments of the present application are introduced, a method for determining the executable program of the model provided by the embodiments of the present application is explained in detail below.
Fig. 2 is a flowchart illustrating a method for determining an executable program of a model according to an exemplary embodiment, which is described by way of example in the above-described implementation environment. Referring to fig. 2, the method may include the steps of:
step 201: model architecture information of a model is obtained, the model architecture information being used to indicate dependencies between a plurality of operators of the model.
The model architecture is the composition structure of the model; the model architecture information comprises a plurality of operators and the connection order among the operators, and each operator contains a number of hyperparameters.
Wherein the operator is a calculation unit in the model. For example, in a neural network model, a convolutional layer, a pooling layer, a fully-connected layer, and the like are operators.
As an example, the model architecture information of the model may be stored in the terminal in the form of a file, and the model architecture information of the model may be obtained by reading the file.
Step 202: and determining an executable program of the model based on a plurality of operator optimization modes, model architecture information and operator logic libraries included in the search space.
The plurality of operator optimization modes comprise code optimization modes of a plurality of operators, and each operator corresponds to at least one code optimization mode.
As an example, the code optimization modes of the plurality of operators included in the search space may be preset by the user according to the hyperparameters of each operator in the model architecture information so as to suit the model, or all possible code optimization modes of each operator may be generated automatically by the terminal; the embodiments of the present application do not limit this.
As an example, the code optimization modes may include: data segmentation optimization, DMA (direct memory access) data transfer optimization, parallelization of data transfer and computation, binding core computations to the platform's compute cores, general loop optimizations, and so on. Moreover, once a mode such as tensor data segmentation is selected, the details of the segmentation still need to be determined, for example the unit size into which the tensor data is split.
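As a concrete illustration of the data segmentation knob mentioned above, the helper below splits a flat sequence of tensor elements into unit blocks of a chosen size; the function name is a hypothetical one introduced for this sketch.

```python
def split_into_units(data, unit_size):
    """Split a flat sequence of tensor elements into blocks of `unit_size`;
    the block size is exactly the kind of detail a data segmentation
    optimization mode has to fix."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
```

Different unit sizes change cache and DMA behavior without changing the computed result, which is why each such choice becomes one point in the search space.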
In some embodiments, based on the plurality of operator optimizations, model architecture information, and operator logic libraries included in the search space, determining the executable program for the model may include the steps of:
(1) The method comprises the steps of obtaining operation logic description codes of a plurality of operators from an operator logic library.
An operator's operation logic description code is a special code used to describe the operator's operation logic, as distinct from the running code that is compiled into the executable program.
As an example, the operator logic library stores the operation logic description codes of all operators in the model, and each operator's identification information may be stored in correspondence with its operation logic description code. When the operation logic description codes are to be obtained, the operators required for model training can be determined from the model architecture information, and the operation logic description code corresponding to each operator's identification information is then fetched from the operator logic library.
As another example, when the model is a neural network model, its training includes both forward propagation and back propagation, and the operation logic description code of an operator for forward propagation differs from that for back propagation. Therefore, when the training process includes both, it is necessary to decide whether the running code of the forward propagation process, of the back propagation process, or of both is to be optimized, and accordingly how the operation logic description codes of the plurality of operators are obtained.
In one possible implementation, obtaining the run logic description code for the plurality of operators from the operator logic library may include: and acquiring the operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model from the operator logic library to obtain the operation logic description codes of the operators. Or, obtaining the operation logic description codes of a plurality of operators corresponding to the back propagation process of the model from the operator logic library to obtain the operation logic description codes of the operators. Or, obtaining the operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and the operation logic description codes of a plurality of operators corresponding to the backward propagation process from the operator logic library to obtain the operation logic description codes of the plurality of operators.
That is, only the operation logic description codes of the plurality of operators corresponding to the forward propagation process, or only the operation logic description codes of the plurality of operators corresponding to the backward propagation process, or the operation logic description codes of the plurality of operators corresponding to the forward propagation process and the backward propagation process, respectively, may be acquired.
In the operator logic library, the identification information of each operator may be stored in correspondence with its operation logic description codes, with different marks used to distinguish the codes for forward propagation from those for back propagation. For example, the operation logic description code of an operator for forward propagation is marked A, and that for back propagation is marked B.
For example, when obtaining the operation logic description codes, the operators corresponding to the forward propagation process of model training may be determined from the model architecture information, the codes corresponding to their identification information located in the operator logic library, and the codes marked A fetched from them. Alternatively, the operators corresponding to the back propagation process may be determined from the model architecture information, their codes located by identification information, and the codes marked B fetched. Or the operators corresponding to both the forward and the back propagation process may be determined; because the operators used in forward and back propagation are the same, their operation logic description codes can be obtained from the operator logic library using the identification information of the operators for either direction.
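The library layout described above can be modeled as a mapping keyed by identification information plus a mark. This is a hypothetical sketch: the operator names, the string-valued "codes" and the `fetch_description_codes` helper are all assumptions made for illustration.

```python
# Hypothetical operator logic library: each entry maps an operator's
# identification information plus a mark ("A" = forward propagation,
# "B" = back propagation) to its operation logic description code.
OPERATOR_LOGIC_LIBRARY = {
    ("conv", "A"): "conv forward description code",
    ("conv", "B"): "conv backward description code",
    ("pool", "A"): "pool forward description code",
    ("pool", "B"): "pool backward description code",
}

def fetch_description_codes(operator_ids, marks=("A",)):
    """Fetch the description codes of the given operators for forward
    propagation only, back propagation only, or both, per the marks given."""
    return {(op, mark): OPERATOR_LOGIC_LIBRARY[(op, mark)]
            for op in operator_ids for mark in marks}
```

Passing `marks=("A",)`, `("B",)` or `("A", "B")` corresponds to the three acquisition cases enumerated in the text.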
It should be noted that the foregoing description is only given by taking a neural network model as an example, and in other embodiments, the method is applicable to any model that includes two steps of model execution and model parameter adjustment when model training is performed, which is not limited in this embodiment of the present application.
(2) Determine, based on the plurality of operator optimization modes included in the search space, a combination of the code optimization modes of the plurality of operators as the model optimization mode.
In some embodiments, the search space may include the code optimization modes of multiple operators, with each operator corresponding to at least one code optimization mode, and the code optimization modes of the plurality of operators can be determined from the search space.
As an example, when determining the model optimization mode, code optimization modes corresponding to a plurality of operators may be obtained from the search space at random or in a certain order each time, and the obtained code optimization modes of the plurality of operators are combined to determine the model optimization mode.
As another example, an optimization mode determination model can be trained in advance on pairs of code optimization modes and training iteration durations. The code optimization modes of the plurality of operators can be combined arbitrarily to obtain multiple code optimization mode combinations; these combinations are input into the optimization mode determination model to predict their training iteration durations, and the combination with the shortest predicted training iteration duration is determined as the model optimization mode.
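The two selection strategies above, drawing a combination at random versus enumerating all combinations, can be sketched as follows; the search space contents are hypothetical:

```python
import itertools
import random

# Hypothetical search space: each operator id maps to its candidate
# code optimization modes (all names invented for illustration).
SEARCH_SPACE = {
    "conv2d": ["tile_8", "tile_16", "unroll"],
    "relu": ["vectorize", "default"],
}

def random_combination(search_space, rng=None):
    """Pick one code optimization mode per operator at random,
    yielding one candidate model optimization mode."""
    rng = rng or random.Random()
    return {op: rng.choice(modes) for op, modes in search_space.items()}

def all_combinations(search_space):
    """Enumerate every combination of per-operator code optimization
    modes (the full set a later stop condition checks against)."""
    ops = sorted(search_space)
    for modes in itertools.product(*(search_space[op] for op in ops)):
        yield dict(zip(ops, modes))

combos = list(all_combinations(SEARCH_SPACE))  # 3 * 2 = 6 combinations
```

A trained optimization mode determination model would then rank these candidate combinations by predicted training iteration duration instead of trying them blindly.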
In other embodiments, each operator may correspond to its own search space, which includes at least one code optimization mode for that operator.
As an example, when determining the model optimization mode, the code optimization mode of each operator can be randomly obtained from that operator's search space each time, and the code optimization modes of the operators are then combined to determine the model optimization mode.
As another example, an optimization mode determination model may be trained in advance on pairs of code optimization modes and training iteration durations. For any operator, the operator's at least one code optimization mode may be input into the optimization mode determination model to predict its training iteration duration, and the code optimization mode with the shortest predicted duration is selected as the target code optimization mode of that operator. Performing this operation for each of the plurality of operators yields the target code optimization modes of the plurality of operators, and the combination of these target code optimization modes is determined as the model optimization mode.
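A minimal sketch of the per-operator selection just described, with a fixed lookup table standing in for the trained optimization mode determination model (all operator names and predicted durations are invented for illustration):

```python
# Stand-in for the trained cost model: hypothetical predicted
# training iteration time (ms) per (operator, code optimization mode).
PREDICTED_TIME_MS = {
    ("conv2d", "tile_8"): 4.1,
    ("conv2d", "tile_16"): 3.2,
    ("relu", "vectorize"): 0.6,
    ("relu", "default"): 0.9,
}

def select_target_modes(per_operator_search_spaces):
    """For each operator, pick the candidate mode with the shortest
    predicted training iteration time, then combine the picks into
    one model optimization mode."""
    return {
        op: min(modes, key=lambda m: PREDICTED_TIME_MS[(op, m)])
        for op, modes in per_operator_search_spaces.items()
    }

model_optimization_mode = select_target_modes(
    {"conv2d": ["tile_8", "tile_16"], "relu": ["vectorize", "default"]}
)
```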
(3) Generate an executable program of the model based on the model optimization mode, the model architecture information, and the running logic description codes of the plurality of operators.
As an example, a computational graph representing the connection relationships between operators can be obtained from the model architecture information, and the computational graph is then optimized, for example by operator fusion or operator deletion. According to the optimized computational graph, the running logic description codes of the plurality of operators are combined into a complete running logic description code, which is converted into an intermediate representation code. The intermediate representation code is optimized according to the model optimization mode, the optimized intermediate representation code is converted into optimized running code, and the optimized running code is compiled to generate the executable program.
As another example, the optimized running code and the executable program may be generated directly through the TVM framework.
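The pipeline in (3) can be sketched with toy stand-ins for each stage; real lowering and compilation (for example via a framework such as TVM) is far more involved, and every function and string here is illustrative:

```python
def build_executable(graph_order, op_codes, optimization_mode):
    """Toy end-to-end sketch: merge per-operator codes along the
    (already optimized) computational graph, lower to an intermediate
    representation, apply the chosen optimization mode, and 'compile'."""
    # Merge operator codes in graph order into one complete code.
    complete_code = "\n".join(op_codes[op] for op in graph_order)
    # Lower to a trivial intermediate representation (one stmt per line).
    ir = [("stmt", line) for line in complete_code.splitlines()]
    # Apply the model optimization mode to each IR statement.
    optimized_ir = [(kind, f"{optimization_mode}:{line}") for kind, line in ir]
    # 'Compile' the optimized IR into an immutable artifact.
    return tuple(optimized_ir)

program = build_executable(
    graph_order=["conv2d", "relu"],
    op_codes={"conv2d": "conv2d_kernel()", "relu": "relu_kernel()"},
    optimization_mode="tile_16",
)
```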
When determining the executable program of the model, the method can determine the optimized running code of the model by referring not only to the specific type of each operator of the model but also to the specific hyperparameters of each operator, and can further determine the optimized running code of the model based on the hardware characteristics of the terminal.
It should be noted that, in other embodiments, the model optimization mode may be determined without considering combinations: the code optimization mode of at least one operator is determined each time, and the executable program of the model is generated according to the code optimization mode of the at least one operator, the model architecture information, and the running logic description codes of the plurality of operators. The code optimization mode of the at least one operator may be determined randomly, in a certain order, or through an optimization strategy.
Step 203: each time an executable program of the model is determined, perform a model training iteration based on the currently determined executable program, and determine the training iteration duration corresponding to the currently determined executable program.
That is, each time an executable program of the model is determined, a model training iteration is performed based on the executable program, and a training iteration duration of the model training iteration is determined.
In implementation, each time an executable program of the model is determined, the model can be iteratively trained by running the executable program, and the running duration of the executable program, namely the training iteration duration of this iteration of training, is recorded as the training iteration duration corresponding to the currently determined executable program. The shorter the running duration of the executable program, the better its running performance and the shorter the training iteration duration of the model, that is, the faster the model trains. The longer the running duration, the worse the running performance and the longer the training iteration duration, that is, the slower the model trains.
As an example, the terminal may run the executable program, starting a timing module in the terminal when running begins and stopping it when running ends; the total duration from start to stop is determined as the training iteration duration of this model training iteration. The process of model training iteration is in fact a process of continuously adjusting model parameters. The values of the model parameters can be generated randomly for the first model training iteration. Each executable program is generated through a different optimization mode, and running an executable program to perform a model training iteration changes the model parameters, so when testing the training iteration durations of different executable programs, the model parameters used each time are those adjusted after the previous model training iteration.
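The timing procedure above amounts to wrapping one training iteration with a wall-clock timer, roughly as follows (the executable program is stood in for by an arbitrary callable):

```python
import time

def timed_training_iteration(executable):
    """Run one model training iteration and return its wall-clock
    duration in seconds (the 'training iteration duration')."""
    start = time.perf_counter()   # timing module starts when running begins
    executable()                  # one training iteration of the model
    return time.perf_counter() - start  # timing ends when running ends

# Illustrative stand-in for a generated executable program.
duration = timed_training_iteration(lambda: sum(i * i for i in range(10000)))
```

In the scheme described here, this duration is then recorded against the executable program (and, in some embodiments, against each operator's code optimization mode) for later comparison.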
As an example, referring to fig. 3, when the model is a neural network model, if the executable program is generated only from the running logic description codes of the operators corresponding to the forward propagation process, only the training iteration duration required for forward propagation may be determined. Alternatively, if the executable program is generated only from the running logic description codes of the operators corresponding to the backward propagation process, only the training iteration duration required for backward propagation may be determined. Alternatively, if the executable program is generated from the running logic description codes of the operators corresponding to both forward and backward propagation, the training iteration duration required for one complete iteration of training can be recorded directly.
In some embodiments, in the process of running the executable program to perform one model iteration training, training iteration duration corresponding to each operator may be recorded respectively, and the code optimization mode of each operator this time and the training iteration duration corresponding to each operator this time may be recorded correspondingly. For an operator, after performing model iterative training for a plurality of times, the training iteration duration corresponding to the operator in each code optimization mode can be recorded.
When the training iteration duration of the currently determined executable program is determined, the training iteration duration and the currently determined executable program can be input into the optimization mode determination model to train it, making the optimization mode determination model more accurate and better suited to the current model.
The method combines the process of determining the executable program and the process of model training iteration, so that the optimization of the executable program and the model training iteration are performed in the process of one iteration training, the model training speed is increased, and the model training efficiency is improved.
Step 204: when the current model training iteration meets the stop optimizing condition, selecting the executable program with the shortest training iteration duration from all the determined executable programs as a target executable program of the model.
That is, when the current model training iteration satisfies the stop-optimizing condition, an executable program satisfying the condition has been determined; the process of determining further executable programs of the model may be stopped, and the executable program with the shortest training iteration duration among all determined executable programs may be determined as the target executable program of the model.
In some embodiments, determining whether the current model training iteration satisfies the stop-optimizing condition may include the following three implementations:
The first implementation: when the number of combination determinations reaches a count threshold, it is determined that the current model training iteration satisfies the stop-optimizing condition.

The count threshold may be set by the user or set by default on the terminal, which is not limited in the embodiments of the present application.

That is, whether the model optimization mode needs to be determined further may be decided according to the number of combination determinations. A count threshold can be preset. When the number of combination determinations has not reached the threshold, relatively few model optimization modes have been tried, so a model optimization mode continues to be determined and an executable program of the model is determined from it; that is, the current model training iteration does not satisfy the stop-optimizing condition. When the number of combination determinations reaches the threshold, enough attempts have been made and subsequent attempts may be relatively time-consuming, so determination of the model optimization mode may stop; that is, the current model training iteration satisfies the stop-optimizing condition.
The second implementation: when determining whether the current model training iteration satisfies the stop-optimizing condition, it may be detected whether the search space still contains a new combination of the code optimization modes of the plurality of operators, the new combination being different from any combination determined so far. When no such new combination exists in the search space, it is determined that the current model training iteration satisfies the stop-optimizing condition.
That is, the decision may depend on whether all possible model optimization modes have been tried. If not all have been tried, that is, the search space still contains an unused new combination of the code optimization modes of the plurality of operators, a new model optimization mode should still be attempted and an executable program of the model determined based on it; that is, the current model training iteration does not satisfy the stop-optimizing condition. When the search space contains no unused new combination of the code optimization modes of the plurality of operators, all combinations have been tried, and determination of the model optimization mode may stop; that is, the current model training iteration satisfies the stop-optimizing condition.
The third implementation: if the currently determined training iteration duration is less than the training iteration duration threshold, it is determined that the current model training iteration satisfies the stop-optimizing condition.
The training iteration duration threshold may be set by a user, or may be set by a default of the terminal, which is not limited in the embodiment of the present application.
That is, when determining whether the current model training iteration satisfies the stop-optimizing condition, it may be determined whether the currently determined training iteration duration is less than a training iteration duration threshold. When the currently determined training iteration time length is greater than or equal to the time length threshold, determining that an executable program meeting the condition is not found yet, and continuing to determine the model executable program, namely determining that the current model training iteration does not meet the stop optimizing condition. When the currently determined training iteration duration is less than the duration threshold, it may be determined that an executable program that satisfies the condition has been found, and the executable program does not need to be redetermined, i.e., it is determined that the current model training iteration satisfies the stop-optimizing condition.
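The three stop-optimizing conditions can be combined into a single predicate, sketched here with hypothetical parameter names:

```python
def should_stop_optimizing(num_tried, total_combinations,
                           latest_duration, count_threshold,
                           duration_threshold):
    """True when any of the three stop-optimizing conditions holds:
    (1) the number of combination determinations reached the count
    threshold, (2) every combination in the search space has been
    tried, or (3) the latest training iteration duration fell below
    the training iteration duration threshold."""
    return (num_tried >= count_threshold
            or num_tried >= total_combinations
            or latest_duration < duration_threshold)
```

Any one condition suffices; an implementation is free to check only the subset it cares about.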
In other embodiments, when the training iteration duration corresponding to a certain operator during one model training iteration is less than the training iteration duration threshold, the operator's current code optimization mode can be regarded as its target code optimization mode, and this code optimization mode can continue to be used for the operator when subsequently determining executable programs of the model.
The target code optimization mode is the code optimization mode that makes the training iteration duration of a single operator shortest. That is, for any operator, the code optimization mode with the shortest corresponding training iteration duration may be called its target code optimization mode.
In the above case, a specific implementation of determining whether the current model training iteration satisfies the stop-optimizing condition may include: when the training iteration duration corresponding to each operator is less than the training iteration duration threshold, the current code optimization mode of each operator is determined to be its target code optimization mode, that is, the current model training iteration is determined to satisfy the stop-optimizing condition.
As an example, the number of code optimization modes that each operator has used may also be counted. When the number used by a certain operator exceeds a count threshold, determination of that operator's code optimization mode may stop, and the operator's target code optimization mode may be determined according to the training iteration durations corresponding to the operator's code optimization modes. Further, when the number of code optimization modes used by every operator exceeds the count threshold, a large number of attempts have been made for each operator and continuing would waste time, so determination of the operators' code optimization modes may stop; that is, the current model training iteration satisfies the stop-optimizing condition.
In implementation, when the current model training iteration does not satisfy the stop-optimizing condition, executable programs of the model continue to be determined until the number of combination determinations reaches the count threshold, or until the search space contains no new combination of the code optimization modes of the plurality of operators, or until the currently determined training iteration duration is less than the training iteration duration threshold, at which point the current model training iteration can be considered to satisfy the stop-optimizing condition.

As an example, when the current model training iteration does not satisfy the stop-optimizing condition, that is, the number of combination determinations has not reached the count threshold, the step of determining an executable program of the model may be repeated until the number of combination determinations reaches the count threshold; this indicates that many attempts have been made and no further model optimization modes need to be tried, that is, the current model training iteration satisfies the stop-optimizing condition.

As another example, when the current model training iteration does not satisfy the stop-optimizing condition, that is, the search space is detected to still contain a new combination of the code optimization modes of the plurality of operators, the step of determining an executable program of the model may be repeated until the search space no longer contains an unused combination; this indicates that all code optimization modes in the search space have been used and no further model optimization modes need to be tried, that is, the current model training iteration satisfies the stop-optimizing condition.

As yet another example, when the current model training iteration does not satisfy the stop-optimizing condition, that is, the currently determined training iteration duration is greater than or equal to the duration threshold, the step of determining an executable program of the model may be repeated until the currently determined training iteration duration is less than the duration threshold; this indicates that an executable program satisfying the condition has been found and the executable program need not be determined again, that is, the current model training iteration satisfies the stop-optimizing condition.
In some embodiments, when the current model training iteration satisfies the stop-optimizing condition, the executable program with the shortest training iteration duration among all determined executable programs can be determined as the target executable program of the model.
Because the executable program with the shortest training iteration duration has the best running performance and makes model training iterations fastest, it can be determined as the target executable program of the model.
When the currently determined training iteration duration is smaller than the duration threshold, the currently determined training iteration duration is the shortest training iteration duration, and the currently determined executable program can be directly used as the target executable program of the model.
In other embodiments, when the current model training iteration satisfies the stop-optimizing condition, the executable program corresponding to the target code optimization modes of the plurality of operators may be determined and taken as the target executable program of the model. In implementation, since the target code optimization mode is the code optimization mode that makes the training iteration duration corresponding to each operator shortest, the target code optimization modes of the operators can be combined into a model optimization mode, an executable program is generated according to that model optimization mode, and this executable program can be directly determined as the target executable program of the model.
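Selecting the target executable program then reduces to taking the minimum over the recorded (executable program, training iteration duration) pairs, for example:

```python
def select_target_executable(records):
    """Given (executable program, training iteration duration) pairs
    collected across the search, return the executable program with
    the shortest training iteration duration."""
    best_program, _ = min(records, key=lambda rec: rec[1])
    return best_program

# Hypothetical durations in seconds for three candidate programs.
target = select_target_executable(
    [("prog_a", 3.4), ("prog_b", 2.1), ("prog_c", 2.8)]
)
```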
Further, referring to fig. 4, after determining the target executable program of the model, the model may continue to be trained with the executable program, and whether to stop training is determined according to a reference training condition. If the reference training condition is reached, training can be stopped and the target model output; if not, training continues through the executable program until the reference training condition is reached, and the target model is output.
The reference training condition may be a model accuracy or an iteration number.
As an example, when the model accuracy meets the requirements, the output of the model may be considered to have met the user requirements, model training may be considered to have been completed, and the current model may be determined to be the target model.
As another example, when the number of iterations of model training reaches a certain number, the model may be considered to have been trained for a long time, the user needs may be satisfied, the model may be considered to have been trained, and the current model may be determined to be the target model.
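The post-selection training loop with a reference training condition (a target accuracy or an iteration cap) might look like the following sketch, where `step` and the accuracy schedule are purely illustrative:

```python
def train_until_reference_condition(step, max_iterations, target_accuracy):
    """Keep training with the target executable program until the
    reference training condition holds: accuracy reached, or the
    iteration count cap hit. `step` runs one training iteration and
    returns the current model accuracy."""
    accuracy, iterations = 0.0, 0
    while accuracy < target_accuracy and iterations < max_iterations:
        accuracy = step()
        iterations += 1
    return accuracy, iterations

# Illustrative stand-in: accuracy improves by 0.2 per iteration.
state = {"acc": 0.0}
def fake_step():
    state["acc"] = min(1.0, state["acc"] + 0.2)
    return state["acc"]

final_acc, n_iters = train_until_reference_condition(fake_step, 100, 0.9)
```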
According to the method and the device, the model architecture information of the model is fully considered when determining the executable program of the model, optimized running code is generated in a targeted manner, and the training iteration duration of the executable program on the actual terminal is used for judging, so the executable program with the shortest training iteration duration on the specific terminal can be selected more accurately. The executable program with the shortest training iteration duration is determined as the target executable program of the model, and model training iterations are performed using it, which reduces the training iteration duration of each iteration and accelerates model training. In addition, the method and the device combine executable program optimization with model training iterations, saving model training time.
After model training is finished, the target executable program of the model can be output for later use when the problem is solved by the model.
In the embodiment of the application, model architecture information of a model is obtained, the model architecture information is used for indicating a dependency relationship among a plurality of operators of the model, and an executable program of the model can be determined according to the model architecture information, a plurality of operator optimization modes included in a search space and an operator logic library, wherein the plurality of operator optimization modes include code optimization modes of the plurality of operators, and each operator corresponds to at least one code optimization mode. Since the model architecture information of different models is different, the executable program determined by the model architecture information of the model has stronger pertinence and is more suitable for the model. And then, when one executable program of the model is determined, carrying out model training iteration based on the currently determined executable program, determining the training iteration time of each time, and when the current model training iteration meets the condition of stopping optimizing, selecting the executable program with the shortest training iteration time from all the determined executable programs as the target executable program of the model. Therefore, when the target executable program of the model is determined, the model architecture information of the model is fully considered, so that the executable program has pertinence, the model training iteration and the determination executable program are combined in a single iteration process, two operations are performed in the time of one iteration, the training speed of the model is further improved, and the training efficiency of the model is improved.
Fig. 5 is a schematic structural diagram of an apparatus for determining an executable program of a model, which may be implemented as part or all of a device, which may be a terminal shown in an implementation environment, by software, hardware, or a combination of both, according to an exemplary embodiment. Referring to fig. 5, the apparatus includes: an acquisition module 501, a first determination module 502, a second determination module 503, and a selection module 504.
An obtaining module 501, configured to obtain model architecture information of a model, where the model architecture information is used to indicate a dependency relationship between multiple operators of the model;
a first determining module 502, configured to determine an executable program of a model based on a plurality of operator optimization modes, model architecture information, and operator logic libraries included in a search space, where the plurality of operator optimization modes include code optimization modes of a plurality of operators, and each operator corresponds to at least one code optimization mode;
a second determining module 503, configured to perform a model training iteration based on the currently determined executable program every time an executable program of the model is determined, and determine a training iteration duration corresponding to the currently determined executable program;
and the selecting module 504 is configured to select, when the current model training iteration meets the stop optimizing condition, the executable program with the shortest training iteration duration from all the determined executable programs as the target executable program of the model.
In one possible implementation manner of the present application, the first determining module 502 is configured to:
acquiring operation logic description codes of a plurality of operators from an operator logic library;
determining a combination of code optimization modes of a plurality of operators as a model optimization mode based on a plurality of operator optimization modes included in the search space;
and generating an executable program of the model based on the model optimization mode, the model architecture information and the running logic description codes of the operators.
In one possible implementation of the present application, the selection module 504 is configured to:
counting the number of combination determinations, where the number of combination determinations refers to the number of times a combination of code optimization modes of the plurality of operators has been determined;
when the combination determination times reach the times threshold, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation of the present application, the selection module 504 is configured to:
detecting whether a new combination of code optimization modes of a plurality of operators exists in the search space or not, wherein the new combination is different from a combination determined before the current time;
when a new combination of code optimization modes of a plurality of operators does not exist in the search space, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation of the present application, the selection module 504 is configured to:
and if the currently determined training iteration duration is smaller than the training iteration duration threshold, determining that the current model training iteration meets the stop optimizing condition.
In one possible implementation manner of the present application, the first determining module 502 is configured to:
acquiring operation logic description codes of a plurality of operators corresponding to a forward propagation process of a model from an operator logic library to obtain operation logic description codes of the operators; or,
acquiring operation logic description codes of a plurality of operators corresponding to a back propagation process of a model from an operator logic library to obtain operation logic description codes of the operators; or,
and acquiring the operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and the operation logic description codes of a plurality of operators corresponding to the backward propagation process from an operator logic library to obtain the operation logic description codes of the operators.
In the embodiment of the application, model architecture information of a model is obtained, the model architecture information is used for indicating a dependency relationship among a plurality of operators of the model, and an executable program of the model can be determined according to the model architecture information, a plurality of operator optimization modes included in a search space and an operator logic library, wherein the plurality of operator optimization modes include code optimization modes of the plurality of operators, and each operator corresponds to at least one code optimization mode. Since the model architecture information of different models is different, the executable program determined by the model architecture information of the model has stronger pertinence and is more suitable for the model. And then, when one executable program of the model is determined, carrying out model training iteration based on the currently determined executable program, determining the training iteration time of each time, and when the current model training iteration meets the condition of stopping optimizing, selecting the executable program with the shortest training iteration time from all the determined executable programs as the target executable program of the model. Therefore, when the target executable program of the model is determined, the model architecture information of the model is fully considered, so that the executable program has pertinence, the model training iteration and the determination executable program are combined in a single iteration process, two operations are performed in the time of one iteration, the training speed of the model is further improved, and the training efficiency of the model is improved.
It should be noted that: the apparatus for determining an executable program of a model provided in the above embodiment only illustrates the division of the above functional modules when determining an executable program of a model, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. In addition, the device for determining the executable program of the model provided in the above embodiment belongs to the same concept as the method embodiment for determining the executable program of the model, and the specific implementation process is detailed in the method embodiment, which is not described herein again.
Fig. 6 is a block diagram of an apparatus 600 according to an embodiment of the present application. The apparatus 600 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The apparatus 600 may also be called by other names such as user device, portable terminal, laptop terminal, or desktop terminal.
In general, the apparatus 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, for example, a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, the at least one instruction being executed by the processor 601 to implement the method of determining an executable program of a model provided by the method embodiments in the present application.
In some embodiments, the apparatus 600 may further optionally include: a peripheral interface 603, and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 603 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 604, a touch display 605, a camera 606, audio circuitry 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 605 is a touch display, the display 605 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. At this point, the display 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 605, providing the front panel of the device 600; in other embodiments, there may be at least two displays 605, each disposed on a different surface of the device 600 or in a folded configuration; in still other embodiments, the display 605 may be a flexible display disposed on a curved or folded surface of the device 600. The display 605 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 605 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing, or inputting the electric signals to the radio frequency circuit 604 for voice communication. The microphone may be provided in a plurality of different locations of the apparatus 600 for stereo acquisition or noise reduction purposes. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic location of the device 600 to enable navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to power the various components in the device 600. The power supply 609 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast-charge technology.
In some embodiments, the device 600 further includes one or more sensors 610. The one or more sensors 610 include, but are not limited to: acceleration sensor 611, gyroscope sensor 612, pressure sensor 613, fingerprint sensor 614, optical sensor 615, and proximity sensor 616.
The acceleration sensor 611 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the apparatus 600. For example, the acceleration sensor 611 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 601 may control the touch display screen 605 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 611. The acceleration sensor 611 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 612 may detect the body direction and the rotation angle of the apparatus 600, and the gyro sensor 612 may collect the 3D motion of the user on the apparatus 600 in cooperation with the acceleration sensor 611. The processor 601 may implement the following functions based on the data collected by the gyro sensor 612: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
Pressure sensor 613 may be disposed on a side frame of device 600 and/or below touch screen 605. When the pressure sensor 613 is disposed at a side frame of the apparatus 600, a grip signal of the apparatus 600 by a user may be detected, and a left-right hand recognition or a shortcut operation may be performed by the processor 601 according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed at the lower layer of the touch display screen 605, the processor 601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 605. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used for collecting the fingerprint of the user, and the processor 601 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 614 may be provided on the front, back, or side of the device 600. When a physical key or vendor Logo is provided on device 600, fingerprint sensor 614 may be integrated with the physical key or vendor Logo.
The optical sensor 615 is used to collect ambient light intensity. In one embodiment, processor 601 may control the display brightness of touch display 605 based on the intensity of ambient light collected by optical sensor 615. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 605 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 based on the ambient light intensity collected by the optical sensor 615.
A proximity sensor 616, also referred to as a distance sensor, is typically provided on the front panel of the device 600. The proximity sensor 616 is used to capture the distance between the user and the front of the device 600. In one embodiment, when the proximity sensor 616 detects a gradual decrease in the distance between the user and the front of the device 600, the processor 601 controls the touch display 605 to switch from the bright screen state to the off screen state; when the proximity sensor 616 detects that the distance between the user and the front of the device 600 gradually increases, the touch display 605 is controlled by the processor 601 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 6 is not limiting of the device 600 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In some embodiments, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the method of determining an executable program of a model in the above embodiments. For example, the computer readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that the computer readable storage medium mentioned in the present application may be a non-volatile storage medium, in other words, may be a non-transitory storage medium.
It should be understood that all or part of the steps to implement the above-described embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the method of determining an executable program of a model described above.
The above embodiments are not intended to limit the present application; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (11)

1. A method of determining an executable program of a model, the method comprising:
obtaining model architecture information of a model, wherein the model architecture information is used for indicating the dependency relationship among a plurality of operators of the model;
determining an executable program of the model based on a plurality of operator optimization modes, the model architecture information and an operator logic library, wherein the plurality of operator optimization modes comprise code optimization modes of a plurality of operators, and each operator corresponds to at least one code optimization mode;
when one executable program of the model is determined, performing one model training iteration based on the currently determined executable program, and determining the training iteration time corresponding to the currently determined executable program;
when the current model training iteration meets the stop optimizing condition, selecting an executable program with the shortest training iteration duration from all the determined executable programs as a target executable program of the model.
2. The method of claim 1, wherein the determining the executable program of the model based on the plurality of operator optimization modes included in the search space, the model architecture information, and the operator logic library comprises:
acquiring operation logic description codes of the operators from the operator logic library;
determining a combination of code optimization modes of a plurality of operators as a model optimization mode based on a plurality of operator optimization modes included in the search space;
and generating an executable program of the model based on the model optimization mode, the model architecture information and the running logic description codes of the operators.
3. The method of claim 2, wherein the current model training iteration satisfies a stop-optimizing condition, comprising:
counting the number of combination determination times, wherein the number of combination determination times refers to the number of combinations of code optimization modes of the plurality of operators;
and when the combination determination times reach a time threshold, determining that the current model training iteration meets the stop optimizing condition.
4. The method of claim 2, wherein the current model training iteration satisfies a stop-optimizing condition, comprising:
detecting whether a new combination of code optimization modes of the operators exists in the search space, wherein the new combination is different from any combination determined before the current time;
and when no new combination of the code optimization modes of the operators exists in the search space, determining that the current model training iteration meets the stop optimizing condition.
5. The method of claim 1 or 2, wherein the current model training iteration satisfies a stop-optimizing condition, comprising:
and if the currently determined training iteration duration is smaller than a training iteration duration threshold, determining that the current model training iteration meets the stop optimizing condition.
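The three stop-optimizing conditions of claims 3-5 can be sketched as a single check, sketched below under stated assumptions: the function name, argument names, and the way the remaining untried combinations are represented are all hypothetical, introduced only to illustrate how the conditions compose.

```python
def should_stop(combos_tried, count_threshold, untried_combos,
                last_duration, duration_threshold):
    """Return True when the current model training iteration satisfies
    any of the stop-optimizing conditions of claims 3-5 (illustrative)."""
    # Claim 3: the number of determined combinations reaches a threshold.
    if combos_tried >= count_threshold:
        return True
    # Claim 4: no new (previously untried) combination remains in the search space.
    if not untried_combos:
        return True
    # Claim 5: the currently determined training-iteration duration is
    # already below the training-iteration duration threshold.
    if last_duration < duration_threshold:
        return True
    return False
```

Any one condition suffices; the claims present them as alternatives, not as a conjunction.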
6. The method of claim 2, wherein the obtaining the run logic description code for the plurality of operators from the operator logic library comprises:
acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
acquiring operation logic description codes of a plurality of operators corresponding to the back propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
and acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and operation logic description codes of a plurality of operators corresponding to the backward propagation process from the operator logic library to obtain the operation logic description codes of the operators.
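The three alternatives of claim 6 (forward-pass codes only, backward-pass codes only, or both) can be sketched as one lookup routine. This is an assumed shape for the operator logic library, a plain mapping from (operator, pass) to a run-logic description code string; the real library's structure is not specified in the text.

```python
def get_run_logic_codes(operator_logic_library, operators, direction="both"):
    """Fetch run-logic description codes for the given operators from a
    hypothetical operator logic library keyed by (operator name, pass)."""
    passes = {"forward": ["fwd"],
              "backward": ["bwd"],
              "both": ["fwd", "bwd"]}[direction]
    codes = []
    for op in operators:
        for p in passes:
            codes.append(operator_logic_library[(op, p)])
    return codes
```

A caller would choose `direction` depending on whether the executable being generated covers the forward propagation process, the backward propagation process, or both.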
7. An apparatus for determining an executable program of a model, the apparatus comprising:
the system comprises an acquisition module, a calculation module and a calculation module, wherein the acquisition module is used for acquiring model architecture information of a model, and the model architecture information is used for indicating the dependency relationship among a plurality of operators of the model;
the first determining module is used for determining an executable program of the model based on a plurality of operator optimization modes included in the search space, the model architecture information and an operator logic library, wherein the plurality of operator optimization modes include code optimization modes of the plurality of operators, and each operator corresponds to at least one code optimization mode;
the second determining module is used for performing one model training iteration based on the currently determined executable program when determining one executable program of the model, and determining the training iteration duration corresponding to the currently determined executable program;
and the selection module is used for selecting the executable program with the shortest training iteration duration from all the determined executable programs as the target executable program of the model when the current model training iteration meets the stop optimizing condition.
8. The apparatus of claim 7, wherein the first determination module is to:
acquiring operation logic description codes of the operators from the operator logic library;
determining a combination of code optimization modes of a plurality of operators as a model optimization mode based on a plurality of operator optimization modes included in the search space;
and generating an executable program of the model based on the model optimization mode, the model architecture information and the running logic description codes of the operators.
9. The apparatus of claim 8, wherein the first determination module is to:
acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
acquiring operation logic description codes of a plurality of operators corresponding to the back propagation process of the model from the operator logic library to acquire the operation logic description codes of the operators; or,
and acquiring operation logic description codes of a plurality of operators corresponding to the forward propagation process of the model and operation logic description codes of a plurality of operators corresponding to the backward propagation process from the operator logic library to obtain the operation logic description codes of the operators.
10. An apparatus comprising a memory for storing a computer program and a processor for executing the computer program stored on the memory to perform the steps of the method of any of the preceding claims 1-6.
11. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-6.
CN202010247043.5A 2020-03-31 2020-03-31 Method, device, equipment and storage medium for determining executable program of model Active CN113469322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247043.5A CN113469322B (en) 2020-03-31 2020-03-31 Method, device, equipment and storage medium for determining executable program of model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010247043.5A CN113469322B (en) 2020-03-31 2020-03-31 Method, device, equipment and storage medium for determining executable program of model

Publications (2)

Publication Number Publication Date
CN113469322A CN113469322A (en) 2021-10-01
CN113469322B true CN113469322B (en) 2023-07-18

Family

ID=77866177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247043.5A Active CN113469322B (en) 2020-03-31 2020-03-31 Method, device, equipment and storage medium for determining executable program of model

Country Status (1)

Country Link
CN (1) CN113469322B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385182A (en) * 2021-12-17 2022-04-22 飞腾信息技术有限公司 Data processing method, device and equipment and computer storage medium
CN116304720B (en) * 2023-05-18 2023-08-25 之江实验室 Cost model training method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840117A (en) * 2018-06-20 2019-06-04 中兴通讯股份有限公司 Implementation method, equipment and the storage medium of training pattern
CN109857475A (en) * 2018-12-27 2019-06-07 深圳云天励飞技术有限公司 A kind of method and device of frame management
CN110162970A (en) * 2019-01-08 2019-08-23 腾讯科技(深圳)有限公司 A kind of program processing method, device and relevant device
WO2019202216A2 (en) * 2018-04-18 2019-10-24 Meeshkan Oy Method for distributed information processing and distributed information processing system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100299661A1 (en) * 2009-05-25 2010-11-25 International Business Machines Corporation Load-Time Code Optimization In a Computing Environment
US20190095796A1 (en) * 2017-09-22 2019-03-28 Intel Corporation Methods and arrangements to determine physical resource assignments

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019202216A2 (en) * 2018-04-18 2019-10-24 Meeshkan Oy Method for distributed information processing and distributed information processing system
CN109840117A (en) * 2018-06-20 2019-06-04 中兴通讯股份有限公司 Implementation method, equipment and the storage medium of training pattern
CN109857475A (en) * 2018-12-27 2019-06-07 深圳云天励飞技术有限公司 A kind of method and device of frame management
CN110162970A (en) * 2019-01-08 2019-08-23 腾讯科技(深圳)有限公司 A kind of program processing method, device and relevant device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Optimizing CNN Model Inference on CPUs"; Yizhi Liu et al.; 2019 USENIX Annual Technical Conference; full text *
"Differentiable Abstract Machine Hybrid Programming System"; Zhou Peng et al.; Journal of Software; Vol. 30, No. 5; full text *

Also Published As

Publication number Publication date
CN113469322A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN107885533B (en) Method and device for managing component codes
CN109874312B (en) Method and device for playing audio data
CN111127509B (en) Target tracking method, apparatus and computer readable storage medium
CN108132790B (en) Method, apparatus and computer storage medium for detecting a garbage code
CN113127130B (en) Page jump method, device and storage medium
CN111177137B (en) Method, device, equipment and storage medium for data deduplication
CN110288689B (en) Method and device for rendering electronic map
CN111754386B (en) Image area shielding method, device, equipment and storage medium
WO2022134634A1 (en) Video processing method and electronic device
CN109102811B (en) Audio fingerprint generation method and device and storage medium
CN110673944B (en) Method and device for executing task
CN111437600A (en) Plot showing method, plot showing device, plot showing equipment and storage medium
CN113469322B (en) Method, device, equipment and storage medium for determining executable program of model
CN109917988B (en) Selected content display method, device, terminal and computer readable storage medium
CN110677713B (en) Video image processing method and device and storage medium
CN113408989B (en) Automobile data comparison method and device and computer storage medium
CN111666076B (en) Layer adding method, device, terminal and storage medium
CN111797017B (en) Method, device, test equipment and storage medium for storing log
CN111312207B (en) Text-to-audio method, text-to-audio device, computer equipment and storage medium
CN111310526B (en) Parameter determination method and device for target tracking model and storage medium
CN108831423B (en) Method, device, terminal and storage medium for extracting main melody tracks from audio data
CN113301422B (en) Method, terminal and storage medium for acquiring video cover
CN112163677B (en) Method, device and equipment for applying machine learning model
CN113592874B (en) Image display method, device and computer equipment
CN112990421B (en) Method, device and storage medium for optimizing operation process of deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant