CN116843010A - Method and device for predicting neural network model performance and related equipment - Google Patents

Method and device for predicting neural network model performance and related equipment

Info

Publication number
CN116843010A
Authority
CN
China
Prior art keywords
prediction
neural network
hardware platform
model
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210278743.XA
Other languages
Chinese (zh)
Inventor
林菁
袁熙昊
严一超
王兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210278743.XA priority Critical patent/CN116843010A/en
Publication of CN116843010A publication Critical patent/CN116843010A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method for predicting the performance of a neural network model. Specifically, a neural network model and configuration information (such as computing power configuration and memory configuration) of a first hardware platform capable of running the neural network model are obtained, so that a prediction model predicts, according to the configuration information of the first hardware platform, the performance of the neural network model when running on the first hardware platform, and a first prediction result is obtained from the prediction of the prediction model. Because the performance of the neural network model is predicted according to the configuration information of the first hardware platform, the obtained first prediction result better matches the actual performance of the neural network model when it runs on the first hardware platform. Therefore, the accuracy of predicting the performance of the neural network model can be effectively improved, and a user can conveniently select a suitable hardware platform on which to deploy the neural network model according to the prediction result. In addition, the application also provides a corresponding apparatus and related devices.

Description

Method and device for predicting neural network model performance and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for predicting performance of a neural network model, and related devices.
Background
Neural networks (NN) are currently one of the main research directions in the field of artificial intelligence (AI) and are widely applied in numerous scenarios such as computer vision, natural language processing, data search, and information recommendation, where a neural network model can generally be used to output results that meet the user's expectations.
In an actual application scenario, before a neural network model is deployed to a hardware platform, it can be predicted whether the neural network model will run efficiently on that hardware platform. For example, it can be predicted whether the delay with which the neural network model outputs an inference result during operation is lower than a preset threshold, and the neural network model is deployed to the hardware platform only when its inference delay is lower than the threshold.
However, after the neural network model is deployed to the hardware platform, its performance in actual operation often does not meet expectations; for example, the actual inference delay of the neural network model is often greater than the predicted inference delay. In other words, the accuracy of predicting the performance of the neural network model is poor.
Disclosure of Invention
The embodiment of the application provides a method, a device and related equipment for predicting the performance of a neural network model, so as to improve the accuracy of predicting the performance of the neural network model.
In a first aspect, an embodiment of the present application provides a method for predicting the performance of a neural network model. Specifically, a neural network model and configuration information (such as computing power configuration and memory configuration) of a first hardware platform capable of running the neural network model are obtained, so that a prediction model predicts, according to the configuration information of the first hardware platform, the performance of the neural network model running on the first hardware platform, and a first prediction result is obtained according to the prediction of the prediction model.
Because the performance of the neural network model is predicted according to the configuration information of the first hardware platform, the obtained first prediction result better matches the actual performance of the neural network model when it runs on the first hardware platform. Therefore, the accuracy of predicting the performance of the neural network model can be effectively improved, and a user can conveniently select a suitable hardware platform on which to deploy the neural network model according to the prediction result.
In one possible implementation, not only can the performance of the neural network model running on the first hardware platform be predicted, but also the performance of the neural network model running on the second hardware platform can be predicted. Specifically, configuration information of a second hardware platform capable of running the neural network model is obtained, so that the prediction model predicts the running performance of the neural network model on the second hardware platform according to the configuration information of the second hardware platform, and a second prediction result is obtained. Therefore, the prediction model can be used for respectively predicting the running performance of the neural network model on a plurality of different hardware platforms, and the generalization of the performance prediction is improved.
In a possible implementation, when the prediction model predicts the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform, the prediction model may first determine a plurality of operators in the neural network model that are used for predicting the performance, then predict, according to the configuration information of the first hardware platform, the performance of each of the plurality of operators running on the first hardware platform to obtain the prediction results of the plurality of operators, and finally obtain the first prediction result from the prediction results of the plurality of operators. In this way, the first prediction result of the performance of the neural network model running on the first hardware platform is obtained by performing performance prediction at operator granularity.
In this embodiment, the process of determining a plurality of operators for predicting performance and the process of obtaining a first prediction result from the prediction results of the plurality of operators are performed by a prediction model.
In one possible implementation, before the prediction model predicts the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform, the client or the server determines a plurality of operators in the neural network model that are used for predicting the performance. The prediction model then predicts, according to the configuration information of the first hardware platform, the performance of each of the plurality of operators running on the first hardware platform to obtain the prediction results of the plurality of operators, and the client or the server obtains the first prediction result from the prediction results of the plurality of operators. In this way, the first prediction result of the performance of the neural network model running on the first hardware platform is obtained by performing performance prediction at operator granularity.
In this embodiment, the process of determining a plurality of operators for predicting performance and the process of obtaining a first prediction result from the prediction results of the plurality of operators may be performed by a client or a server.
The plurality of operators used for predicting the performance may be all of the operators in the neural network model, or only some of them, for example, the operator with the worst performance (such as the longest computation time or the greatest resource consumption) in each layer of the network structure of the neural network model.
In a possible implementation, the prediction model includes a plurality of sub-prediction models, each corresponding to one operator. When the prediction model predicts the performance of the determined plurality of operators running on the first hardware platform to obtain the prediction results of the plurality of operators, it may first determine the first sub-prediction model corresponding to a first operator among the plurality of operators and the attribute information of the first operator, where the first operator is any one of the plurality of operators; the configuration information of the first hardware platform and the attribute information of the first operator are then input into the first sub-prediction model, and the performance of the first operator is predicted by the first sub-prediction model to obtain the prediction result of the first operator. By performing the same operation on the remaining operators as on the first operator, the prediction results corresponding to all of the plurality of operators can be obtained. In this way, the performance corresponding to each of the plurality of operators can be predicted using the plurality of sub-prediction models.
Optionally, for each operator, the performance of the operator running on the first hardware platform may be predicted by using the prediction model, so that the performance of each operator is respectively predicted by using the prediction model, and a prediction result respectively corresponding to the plurality of operators is obtained.
In one possible implementation, when determining the plurality of operators in the neural network model used for predicting the performance of the neural network model running on the first hardware platform, an operator configuration file corresponding to the first hardware platform may first be obtained. The operator configuration file describes the relationships between operators that can run on the first hardware platform, for example operator identifiers, a fusion rule for fusing a plurality of operators into one operator, and a splitting rule for splitting one operator into a plurality of operators. The plurality of operators used for predicting the performance of the neural network model running on the first hardware platform are then determined according to the operator configuration file. Because the operators obtained from the operator configuration file corresponding to the first hardware platform match the configuration of the first hardware platform, the accuracy of predicting the performance of the neural network model running on the first hardware platform can be further improved.
In one possible implementation, the plurality of sub-prediction models included in the prediction model are trained through multiple iterations, where the second sub-prediction model is any one of the plurality of sub-prediction models. One iteration of training the second sub-prediction model includes: inputting attribute information of a second operator corresponding to the second sub-prediction model, configuration information of at least one hardware platform capable of running the second operator, and performance parameters of the second operator running on the at least one hardware platform into the second sub-prediction model, and training the second sub-prediction model according to these inputs. The configuration information of the hardware platform and the performance parameters of the second operator running on the at least one hardware platform differ between iterations. In this way, training of the plurality of sub-prediction models can be completed in advance, so that the trained sub-prediction models can be used to predict the performance of the neural network model running on different hardware platforms.
In one possible implementation, the predictive model is built based on a convolutional neural network.
Alternatively, the prediction model may be constructed based on other neural networks, such as a recurrent neural network model.
In one possible implementation, the performance of the neural network model includes the time the neural network model takes to output an inference result based on input data, or the time taken to train the neural network model, or the energy the neural network model consumes to output an inference result based on input data. Therefore, performance prediction of the neural network model can be realized in a similar manner for different application scenarios, which improves the generalization of the performance prediction.
In a second aspect, based on the same inventive concept as the method embodiment of the first aspect, an embodiment of the present application provides an apparatus for predicting performance of a neural network model. The apparatus has functions corresponding to the embodiments of the first aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a third aspect, embodiments of the present application provide a computing device comprising a processor and a memory. The memory is configured to store instructions that, when executed by the computing device, cause the computing device to perform the method of predicting neural network model performance in the first aspect or any implementation of the first aspect described above. It should be noted that the memory may be integrated into the processor or may be independent of the processor. The device may also include a bus, with the processor connected to the memory through the bus. The memory may include read-only memory and random access memory, among others.
In a fourth aspect, embodiments of the present application further provide a readable storage medium having stored therein a program or instructions that, when run on a computer, cause the method of predicting neural network model performance in the first aspect or any implementation of the first aspect described above to be performed.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of predicting neural network model performance in any implementation of the first aspect or the first aspect described above.
In addition, for the technical effects brought by any implementation of the second aspect to the fifth aspect, reference may be made to the technical effects brought by the different implementations of the first aspect, which are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings.
FIG. 1 is a schematic diagram of an exemplary system for predicting performance of a neural network model, according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for predicting performance of a neural network model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an exemplary interactive interface provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for predicting performance of a neural network model according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for training a predictive model according to an embodiment of the application;
FIG. 6 is a schematic structural diagram of an apparatus for predicting performance of a neural network model according to an embodiment of the present application;
fig. 7 is a schematic hardware structure of a computing device according to an embodiment of the present application.
Detailed Description
Referring to FIG. 1, a schematic diagram of an exemplary system architecture for predicting the performance of a neural network model is shown. As shown in FIG. 1, the system 100 may include a client 101 and a server 102, and data interaction may be performed between the client 101 and the server 102, for example data communication via the hypertext transfer protocol (HTTP).
The client 101 may be configured to interact with a user, for example to receive a neural network model to be predicted that is provided by the user. The server 102 may provide the client 101 with a prediction model for predicting the performance of the neural network model, and the like. In this way, after receiving the neural network model provided by the user, the client 101 may predict the performance of the neural network model using the prediction model, or the server 102 may predict the performance of the neural network model using the prediction model.

In an actual application scenario, different hardware platforms have different data processing capabilities, so when the same neural network model runs on different hardware platforms, its running performance may differ greatly. For example, when the neural network model runs on hardware platform A, its inference delay may be 0.2 s (seconds), while when it runs on hardware platform B, its inference delay may be 2 s.
Based on the above, the embodiment of the application provides a method for predicting the performance of a neural network model, so as to improve the accuracy of predicting the running performance of the neural network model on a hardware platform. In specific implementation, the client 101 may obtain a neural network model provided by a user and obtain configuration information of a hardware platform capable of running the neural network model, so that the client 101 may predict, through a prediction model, a performance of running the neural network model on the hardware platform according to the neural network model and the configuration information, and obtain a prediction result according to the prediction of the prediction model. Further, the client 101 may present the predicted results to the user.
Because the performance of the neural network model is predicted according to the configuration information of the hardware platform, the obtained prediction result better matches the actual performance of the neural network model when it runs on that hardware platform. Therefore, the accuracy of predicting the performance of the neural network model can be effectively improved, and a user can conveniently select a suitable hardware platform on which to deploy the neural network model according to the prediction result.
When the user wants to obtain prediction results for the performance of the neural network model on different hardware platforms, the user can provide the configuration information of the different hardware platforms to the client 101, so that the client 101 can predict the performance of the neural network model on each hardware platform according to its configuration information and obtain different prediction results corresponding to the different hardware platforms.
In practical applications, the server 102 may be deployed in the cloud, for example, may be deployed in public cloud, edge cloud, or distributed cloud, so that the system 100 may provide cloud services for users to predict performance of the neural network model.
It should be noted that the system architecture shown in FIG. 1 is only an example and does not limit the specific implementation. For example, the system 100 may include multiple clients and servers; alternatively, in other possible system architectures, the system 100 for predicting the performance of a neural network model may be a locally deployed terminal device that provides a local prediction service for the user. This embodiment does not limit the specific implementation or deployment scenario of the system 100.
In order that the above objects, features and advantages of the present application will be more readily understood, a more particular description of various non-limiting embodiments of the application will be rendered by reference to the appended drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to FIG. 2, a flowchart of a method for predicting the performance of a neural network model according to an embodiment of the present application is shown. The method may be applied to the system 100 for predicting the performance of a neural network model shown in FIG. 1 and may be executed by the client 101 or the server 102 in the system 100; in practical applications, the method may also be applied to other suitable systems. For ease of understanding and description, the method is described below by way of example as being performed by the client 101 in the system 100, and may specifically include:
S201: the client 101 acquires a neural network model and configuration information of a hardware platform capable of running the neural network model.
The neural network model may generally include a multi-layer network structure, where each layer includes at least one operator; the operators in each layer perform the corresponding operations on the input data to produce the inference result output by the neural network model, which may be used for image classification, object detection, speech recognition, or the like, and this embodiment is not limited in this regard. The neural network model to be processed may be, for example, a feedforward neural network (FNN) model, a recurrent neural network (RNN) model, a convolutional neural network (CNN) model, a long short-term memory (LSTM) model, or another model, which are not listed here one by one.
In addition, the hardware platform mentioned in step S201 refers to the hardware platform on which the neural network model is to run. The configuration information of the hardware platform acquired by the client 101 may include computing capability attributes, storage capability attributes, and the like of the hardware platform. A computing capability attribute may be, for example, one or more of the processor type and the number of processor cores, or another computing capability attribute; a storage capability attribute may be, for example, one or more of the memory size and the cache size, or another storage capability attribute.
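The following is a minimal sketch, in Python, of how such configuration information might be represented; the class and field names are assumptions introduced here for illustration and are not part of this application.

```python
from dataclasses import dataclass

@dataclass
class HardwarePlatformConfig:
    """Illustrative configuration information for a target hardware platform.

    The field names are hypothetical; a real system may expose different attributes.
    """
    processor_type: str   # computing capability attribute, e.g. "NPU" or "GPU"
    num_cores: int        # computing capability attribute: number of processor cores
    memory_mb: int        # storage capability attribute: memory size
    cache_kb: int         # storage capability attribute: cache size

# Example: configuration for a hypothetical platform provided by the user
platform_a = HardwarePlatformConfig(processor_type="NPU", num_cores=8,
                                    memory_mb=8192, cache_kb=512)
```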
In one possible implementation, the client 101 may present the user with an interactive interface as shown in FIG. 3. The user may click on the "import" button in the interactive interface to import the file of the neural network model to the client 101, so that the client 101 obtains the neural network model to be processed. Moreover, the user can also input configuration information of the hardware platform on the interactive interface; alternatively, the user may input the identifier of the hardware platform on the interactive interface, so that the client 101 obtains, from the network, configuration information of the hardware platform, and the like, according to the identifier of the hardware platform.
In this embodiment, after obtaining the neural network model and the configuration information of the hardware platform, the client 101 may use the prediction model to predict, according to the configuration information of the hardware platform, the performance of the neural network model running on the hardware platform, and obtain the prediction result according to the prediction of the prediction model.
The performance of the neural network model running on the hardware platform may be, for example, the inference delay of the neural network model, that is, the time from when the neural network model receives input data until it outputs an inference result based on that input data; or the time it takes the neural network model to complete model training on the hardware platform; or the energy the neural network model consumes to output an inference result based on the input data. In practical applications, the performance of the neural network model may be one or more of inference delay, training time, and inference energy consumption, or may be other information that can be used to measure the performance of the model, which is not limited in this embodiment.
Because the client 101 predicts the performance of the neural network model on the hardware platform according to the configuration information of the hardware platform, the prediction result obtained by the client 101 is more consistent with the performance of the neural network model when the neural network model actually runs on the hardware platform. Thus, the accuracy of predicting the performance of the neural network model can be effectively improved.
A specific implementation procedure of performance prediction by the client 101 using the prediction model is described below, and as shown in fig. 2, after step S201 is performed, the following steps S202 to S204 may be continuously performed.
S202: the client 101 determines a plurality of operators for predicting performance in the neural network model.
Because the neural network model performs inference by means of its operators, in this embodiment the performance of the neural network model running on the hardware platform can be determined by aggregating the performance of the individual operators. The prediction model used to predict the running performance of each operator may be trained in advance (the training process is described later and is not repeated here).
Specifically, the client 101 may determine the topology of the neural network model by traversing the model file provided by the user, and may then further determine, from the topology, the plurality of operators in the neural network model that are used for predicting the performance of the neural network model running on the hardware platform.
In some examples, the plurality of operators determined by the client 101 may be all operators in the neural network model. For example, when the performance of the neural network model is specifically the energy required for outputting the inference result according to the input data during the operation of the neural network model on the hardware platform, the client 101 may determine all operators included in the neural network model as operators for predicting the performance of the neural network model.
Alternatively, the plurality of operators determined by the client 101 may be only some of the operators in the neural network model. For example, when the performance is specifically the time from receiving input data to outputting an inference result while the neural network model runs on the hardware platform, some operators (such as operators within the same layer of the network structure) may execute their computation during the same time period. In that case, the client 101 may, for each layer of the network structure, determine the operator with the longest computation time as an operator used for predicting the performance of the neural network model, and the other operators in that layer need not participate in the subsequent performance prediction process.
In one possible implementation, the client 101 may refer to a pre-generated operator configuration file when determining the plurality of operators from the topology of the neural network model. Specifically, after determining the hardware platform on which the neural network model is to run, the client 101 may acquire the operator configuration file corresponding to that hardware platform, where the operator configuration file describes the relationships between the operators that the hardware platform can run. In this way, the client 101 may split the topology of the neural network model according to the operator configuration file to obtain the plurality of operators in the neural network model used for predicting its performance. After determining the plurality of operators from the neural network model, the client 101 may also obtain the attribute information of each operator.
As an implementation example, the operator configuration file may include an operator matching rule, which defines the identifiers of a plurality of operators (such as operator names) and how operator identifiers are matched. The client 101 may then match the identifiers of the operators in the topology of the neural network model against the operators in the operator configuration file, and take the matched operators as the operators determined from the topology.
Further, an operator fusion rule may be defined in the operator configuration file; the operator fusion rule defines how a plurality of operators are fused into one operator. After determining the plurality of operators through identifier matching, the client 101 may further fuse some or all of them according to the operator fusion rule. For example, suppose the operators determined by identifier matching include a convolution operator and a batch normalization operator, and the data computed by the convolution operator is used as the input of the batch normalization operator; the client 101 may then fuse the convolution operator and the batch normalization operator into a single operator according to the operator fusion rule. In practical applications, the operator fusion rule in the operator configuration file may be defined according to how data flows between operators.
In addition, because different hardware platforms have different hardware environments, among the plurality of operators determined by the client 101 through identifier matching, the computing capability required by some operators may not match the hardware environment of the hardware platform. For this reason, an operator splitting rule may be defined in the operator configuration file; the operator splitting rule defines how one operator is split into a plurality of operators so that the resulting operators match the hardware environment of the hardware platform. For example, if some operators require 64-bit data to be computed in each operation while the hardware platform only supports processing 32-bit data at a time, the client 101 may split an operator that processes 64-bit data into two operators that each process 32-bit data, so that the 64-bit computation is carried out by the two split operators. In practical applications, the specific content of the operator splitting rule in the operator configuration file can be defined according to the constraints of the hardware environment of the hardware platform.
It should be noted that the operator matching rule, operator fusion rule, and operator splitting rule are only examples of the contents of the operator configuration file; in practical applications the operator configuration file may also include other rules, which is not limited in this embodiment. Moreover, the operator matching rule and the operator fusion rule can serve as general configuration independent of the hardware platform, while the operator splitting rule can serve as configuration specific to the hardware platform.
In this way, the operators obtained by splitting according to the operator configuration file match the hardware processing capability (hardware environment) of the hardware platform. In practical applications, hardware platforms with different hardware environments may have different operator configuration files. Accordingly, when the user wants to predict the performance of the neural network model running on a specific hardware platform, the client 101 can obtain the operator configuration file corresponding to the hardware platform specified by the user, so as to further improve the accuracy of predicting the performance of the neural network model running on that hardware platform.
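As an illustration of how such an operator configuration file might be applied, the following Python sketch expresses a hypothetical configuration file as a dictionary and applies identifier matching, operator fusion, and operator splitting to a model topology; the rule format and operator names are assumptions made only for this example.

```python
# Hypothetical operator configuration file for one hardware platform.
operator_config = {
    "match": ["Conv2D", "BatchNorm", "ReLU", "MatMul64"],   # recognized operator identifiers
    "fuse": [("Conv2D", "BatchNorm", "Conv2D+BN")],         # fuse convolution + batch normalization
    "split": {"MatMul64": ["MatMul32", "MatMul32"]},        # 64-bit operator -> two 32-bit operators
}

def select_operators(model_ops, config):
    """Derive the operators used for performance prediction from the model topology."""
    # 1. identifier matching
    ops = [op for op in model_ops if op in config["match"]]
    # 2. operator fusion: replace adjacent (a, b) pairs by the fused operator
    for a, b, fused in config["fuse"]:
        i = 0
        while i < len(ops) - 1:
            if ops[i] == a and ops[i + 1] == b:
                ops[i:i + 2] = [fused]
            else:
                i += 1
    # 3. operator splitting: replace operators the platform cannot run directly
    result = []
    for op in ops:
        result.extend(config["split"].get(op, [op]))
    return result

print(select_operators(["Conv2D", "BatchNorm", "ReLU", "MatMul64"], operator_config))
# ['Conv2D+BN', 'ReLU', 'MatMul32', 'MatMul32']
```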
S203: the prediction model predicts the running performance of the operators on the hardware platform respectively to obtain prediction results corresponding to the operators respectively.
As an implementation example, the client 101 may use the prediction model to perform performance prediction on the plurality of operators one by one, so as to obtain the prediction result corresponding to each operator. The prediction model may be pre-trained by the server 102 and sent to the client 101. In a specific implementation, the client 101 may input the configuration information of the hardware platform into the prediction model, so that the prediction model outputs a prediction result for the performance of each operator running on the hardware platform. In addition, operators of the same type in different neural network models may have different configurations in a practical application scenario; for example, the input matrix dimension of a convolution operator may be 128 in neural network model A but 64 in neural network model B. Therefore, in a further possible embodiment, when the client 101 predicts the performance of an operator using the prediction model, the attribute information of the operator and the configuration information of the hardware platform may be input into the prediction model together, so that the prediction model can predict the performance of the operator according to both the attribute configuration of the operator and the configuration information of the hardware platform. The attributes of an operator may be, for example, one or more of the input matrix dimension, convolution kernel size, number of input/output channels, stride, and zero-padding mode, or other attribute information, which is not limited in this embodiment.
In still other implementation examples, the prediction model may include a plurality of sub-prediction models, and each sub-prediction model is used to predict the performance of one operator in the neural network model. Specifically, for any one of the plurality of operators determined by the client, hereinafter referred to as the first operator, the client 101 may determine the first sub-prediction model corresponding to the first operator from the sub-prediction models corresponding to the plurality of operators, input the configuration information of the hardware platform into the first sub-prediction model, and have the first sub-prediction model predict the performance of the first operator to obtain the prediction result of the first operator. By performing the same operation on the remaining operators as on the first operator, the prediction results corresponding to all of the operators can be obtained. Further, because operators of the same type in different neural network models may have different configurations in an actual application scenario, the client 101 may input the configuration information of the hardware platform and the attribute information of the first operator into the first sub-prediction model together, so that the first sub-prediction model predicts the performance of the first operator according to the input configuration information and attribute information to obtain the prediction result of the first operator; the sub-prediction models corresponding to the other operators are used in a similar manner.
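A minimal sketch of this sub-prediction-model approach is given below, assuming a hypothetical `predict` interface on each sub-prediction model; the feature layout (platform configuration concatenated with operator attributes) follows the description above, but the exact interface is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Operator:
    name: str
    type: str                                               # e.g. "Conv2D"
    attributes: List[float] = field(default_factory=list)   # e.g. kernel size, channels, stride

def predict_operator_performance(operators, platform_features, sub_models):
    """Predict per-operator performance using one sub-prediction model per operator type.

    `sub_models` maps an operator type to a trained sub-prediction model whose
    `predict` method accepts a single feature vector; this interface is assumed
    for illustration only.
    """
    results = {}
    for op in operators:
        sub_model = sub_models[op.type]                  # e.g. the first sub-prediction model
        features = platform_features + op.attributes     # platform config + operator attributes
        results[op.name] = sub_model.predict(features)   # predicted performance of this operator
    return results
```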
S204: the client 101 obtains a prediction result of the performance of the neural network model running on the hardware platform according to the prediction results of the plurality of operators.
In this embodiment, the client 101 may calculate, from the prediction results corresponding to the individual operators, the prediction result corresponding to the performance of the neural network model running on the hardware platform. For example, when the operator performance indicated by each operator's prediction result is the inference delay of that operator on the hardware platform, the client 101 may add up the inference delays corresponding to the individual operators and take the resulting sum as the inference delay of the neural network model (that is, the performance of the neural network model).
In other embodiments, if some operators in the neural network model (such as the operators within one layer of the network structure) may execute their computation during the same time period, then for each layer of the network structure the client 101 may determine the largest inference delay among the operators included in that layer and take it as the inference delay corresponding to that layer, so that the client 101 can obtain the inference delay of the neural network model by accumulating the inference delays corresponding to the individual layers.
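The following short sketch illustrates this aggregation: the latency of each layer is taken as the largest predicted operator latency in that layer, and the model-level latency is the sum over layers; the numbers are hypothetical.

```python
def aggregate_latency(layer_predictions):
    """Combine per-operator latency predictions into a model-level prediction.

    `layer_predictions` has one entry per network layer; each entry is a list of
    predicted latencies for the operators that run (possibly in parallel) inside
    that layer. Operators within a layer overlap in time, so the layer latency
    is its slowest operator; layers run sequentially, so the model latency is
    the sum over layers.
    """
    return sum(max(op_latencies) for op_latencies in layer_predictions)

# Example with hypothetical predicted latencies (in milliseconds)
print(aggregate_latency([[1.2, 0.8], [3.5], [0.4, 0.9, 0.6]]))  # 1.2 + 3.5 + 0.9 = 5.6
```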
S205: the client 101 presents the predicted results of the performance of the neural network model running on the hardware platform.
Finally, the client 101 may present the calculated prediction to the user, so that the user may determine whether to choose to deploy or train the neural network model on the hardware platform based on the prediction.
It should be noted that this embodiment is described using the example in which the client 101 uses the prediction model to predict the performance of the neural network model running on a single hardware platform. In practical applications, the client 101 may use the prediction model to predict, for the user, the performance of the neural network model running on several different hardware platforms, thereby improving the generalization of the performance prediction. Specifically, the client 101 may receive the neural network model provided by the user and the configuration information of a first hardware platform capable of running the neural network model, use the prediction model to predict, according to the configuration information of the first hardware platform, the performance of the neural network model running on the first hardware platform, obtain a first prediction result, and present the first prediction result to the user. When the user further provides the configuration information of a second hardware platform to the client 101 and instructs the client 101 to predict the performance of the neural network model running on the second hardware platform, the client 101 may use the prediction model to predict, according to the configuration information of the second hardware platform, the performance of the neural network model running on the second hardware platform, obtain a second prediction result, and present the second prediction result to the user.
It should also be noted that the embodiment shown in FIG. 2 is described using the example in which the client 101 predicts the performance of the neural network model. In other embodiments, the client 101 may instead send the neural network model to be processed and the configuration information of the hardware platform to the server 102, and the server 102 uses the prediction model to predict the performance of the neural network model running on the hardware platform; alternatively, the performance of the neural network model running on the hardware platform may be predicted by a local terminal device.
In the embodiment shown in fig. 2, the client 101 determines a plurality of operators for predicting performance in the neural network model, and obtains a prediction result of the performance of the neural network model running on the hardware platform according to the prediction results corresponding to the plurality of operators, and in other embodiments, the process of determining the plurality of operators and generating the prediction result corresponding to the neural network model may be performed by the prediction model.
Specifically, referring to fig. 4, fig. 4 shows a schematic flow chart of another method for predicting the performance of a neural network model. As shown in fig. 4, the method specifically may include:
S401: the client 101 acquires configuration information of a neural network model and a hardware platform capable of running the neural network model.
S402: the client 101 inputs configuration information of the neural network model and the hardware platform to the prediction model.
Wherein the predictive model may be pre-trained by the server 102 and sent to the client 101.
S403: the prediction model determines a plurality of operators for prediction performance in the neural network model.
In specific implementation, the prediction model may obtain an operator configuration file corresponding to the hardware platform, and determine, according to the topology structure of the input neural network model, a plurality of operators for predicting performance in the neural network model, where the operator configuration file includes a relationship between operators that can run on the hardware platform, for example, the operator configuration file includes an operator matching rule, an operator fusion rule, an operator splitting rule, and the like.
S404: the prediction model predicts the running performance of the operators on the hardware platform respectively to obtain prediction results corresponding to the operators respectively.
In specific implementation, the prediction model can predict the performance of a plurality of operators one by one to obtain prediction results corresponding to the operators respectively. Or, the prediction model may include a plurality of sub-prediction models, where each sub-prediction model is configured to predict performance of one of the plurality of operators on the hardware platform, so as to obtain prediction results corresponding to the plurality of operators respectively. And the plurality of sub-prediction models can respectively predict the performances of the plurality of operators in parallel, so that the efficiency of obtaining the prediction results of the plurality of operators can be improved.
S405: and the prediction model obtains the prediction result of the performance of the neural network model running on the hardware platform according to the prediction results of the operators.
In this embodiment, the performance of the neural network model running on one hardware platform is predicted by using the prediction model as an example, and in practical application, the client 101 may predict the performance of the neural network model running on a plurality of different hardware platforms by using the prediction model.
S406: the client 101 presents the predicted results of the performance of the neural network model running on the hardware platform.
In this embodiment, the specific implementation process and technical effects of steps S401 to S406 can be referred to the related descriptions of steps S201 to S205 in the foregoing embodiment, and are not described herein.
The embodiments shown in FIG. 2 and FIG. 4 describe, by way of example, the process in which the client 101 uses the prediction model to predict the performance of the neural network model running on a hardware platform. The following describes, by way of example, the process of constructing and training the prediction model.
The prediction model may be a single model, or may include a plurality of different sub-prediction models (e.g., sub-prediction models respectively corresponding to the plurality of operators). For ease of understanding, the following describes the process of constructing and training the prediction model using the example in which the prediction model includes a plurality of sub-prediction models. The construction and training of each sub-prediction model may be performed by the server 102.
Referring to fig. 5, fig. 5 shows a schematic flow chart of training a second sub-predictive model. As shown in fig. 5, the method specifically may include:
S501: the server 102 builds a second sub-prediction model for predicting the performance of the second operator running on at least one hardware platform.
In this embodiment, the sub-prediction models corresponding to the respective operators may be models constructed based on a neural network. For example, a sub-prediction model may be a recurrent neural network model, a convolutional neural network model, a deep learning network model, or the like. In particular, when a sub-prediction model is a convolutional neural network model, training and obtaining the sub-prediction model is relatively simple and the model converges more easily; moreover, when the sub-prediction model performs performance prediction, the required amount of computation is relatively small and the prediction latency is relatively short.
In practice, the server 102 may construct and train a plurality of sub-prediction models that are used to predict the performance of a plurality of different operators. For ease of understanding and description, this embodiment uses the construction and training of the second sub-prediction model as an example; the construction and training of the remaining sub-prediction models are similar and can be understood by reference. The process of building a model based on a neural network is widely used in practice and is not described in detail in this embodiment.
S502: the server 102 obtains training data for training the second sub-prediction model, where the training data includes attribute information of a second operator corresponding to the second sub-prediction model, configuration information of at least one hardware platform capable of running the second operator, and performance parameters of the second operator running on the at least one hardware platform.
Specifically, taking the predicted performance being the computation time of an operator as an example, the server 102 may collect the time the second operator takes to compute data on multiple hardware platforms. For example, a user (or technician) may randomly construct input data for the second operator and provide it to the server 102. The server 102 may then send the input data constructed by the user, together with the second operator, to different hardware platforms, and collect the time the second operator takes to compute the input data on each of the different hardware platforms. The server 102 may then obtain the configuration information of each hardware platform, take the configuration information of the hardware platform as the input of the sub-prediction model, and take the computation time of the second operator on each hardware platform as the output of the sub-prediction model (i.e. the label in the training data), so as to generate the training data.
In practice, different operators of the same type may have different attribute configurations, and thus, in a further possible embodiment, the user (or technician) may construct not only the input data, but also the different attribute configurations of the second operator. Accordingly, the second operator may be configured to run on different hardware platforms based on different attributes. When the server 102 collects data on different hardware platforms, configuration information of the hardware platforms and attribute configuration of the second operators when running on the hardware platforms can be used as inputs of the sub-prediction models, and calculation time of the second operators on each hardware platform is used as output of the sub-prediction models to generate training data.
It should be noted that the above description uses the example in which the process of generating training data is controlled by the server 102. In other embodiments, the user (or technician) may run the second operator with different attribute configurations on different hardware platforms and record the computation time of the second operator on each hardware platform for the constructed input data, so that the training data is generated on the user side or on the hardware platform side and provided to the server 102. This embodiment does not limit the specific way in which the server 102 obtains the training data used to train the sub-prediction model.
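The following sketch illustrates one possible way the training samples described above could be collected, under the assumption that each hardware platform object exposes a measurement interface; the method names are placeholders and not part of this application.

```python
def collect_training_samples(operator, attribute_configs, platforms, make_input):
    """Collect training samples for one operator's sub-prediction model.

    Each platform object is assumed to expose `config()` (its configuration
    vector) and `measure(operator, attrs, data)` (runs the operator and returns
    the measured computation time); both names are placeholders.
    """
    samples = []
    for attrs in attribute_configs:
        data = make_input(attrs)                        # randomly constructed input data
        for platform in platforms:
            elapsed = platform.measure(operator, attrs, data)
            # feature vector: platform configuration + operator attribute configuration
            # label: measured computation time of the operator on this platform
            samples.append((platform.config() + attrs, elapsed))
    return samples
```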
S503: the server 102 inputs training data into the second sub-predictive model.
S504: the server 102 trains the second sub-predictive model based on training data input to the second sub-predictive model.
After obtaining the training data, the server 102 may use the training data to train the sub-prediction model corresponding to the second operator, for example by iterative training. In a specific implementation, the server 102 may input the configuration information of a hardware platform (and the attribute configuration of the second operator) in the training data into the sub-prediction model to obtain the inference result output by the sub-prediction model (i.e. the predicted time consumption). The server 102 may then calculate, through a corresponding loss function, a loss value between the inference result and the actual computation time in the training data, and update the parameters of the sub-prediction model according to the calculated loss value. The server 102 may also generate new training data from the configuration information of the hardware platform (and the attribute configuration of the second operator) currently input into the sub-prediction model and the inference result output by the sub-prediction model, and train the sub-prediction model again using both the newly generated training data and the training data it has already acquired. That is, the configuration information of the hardware platform input into the second sub-prediction model and the performance parameters of the second operator running on the at least one hardware platform differ between the multiple training iterations. By repeating the above process, iterative training of the sub-prediction model is achieved. In this way, the prediction accuracy and generalization of the sub-prediction model with respect to the performance of the second operator can be effectively improved. In one implementation example, the server 102 may iteratively train a sub-prediction model constructed based on a convolutional neural network.
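A minimal training-loop sketch is given below using PyTorch. The application describes a sub-prediction model built on a convolutional neural network; a small fully connected regression network is used here only to keep the illustration short, and the feature and label shapes are assumptions.

```python
import torch
from torch import nn

class SubPredictionModel(nn.Module):
    """Small regression network mapping (platform config, operator attributes) to a predicted time."""
    def __init__(self, num_features):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_features, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

def train(model, features, labels, epochs=100, lr=1e-3):
    """One possible iterative-training loop: predict, compute loss, update parameters."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        predicted = model(features)        # predicted computation time
        loss = loss_fn(predicted, labels)  # loss against the measured computation time
        loss.backward()                    # back-propagate the loss value
        optimizer.step()                   # update the parameters of the sub-prediction model
    return model
```

In use, `features` would be a float tensor of shape (N, num_features) built from the collected samples, and `labels` a tensor of shape (N, 1) containing the measured computation times.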
It should be noted that the server 102 may train the sub-prediction model in other manners, such as training the sub-prediction model in a supervised learning manner. Moreover, for sub-prediction models corresponding to any type of operators, the server 102 may complete model training in the above manner (the server 102 may also add the trained sub-prediction models to a database created in advance). In this way, the client 101 may obtain sub-prediction models respectively corresponding to the plurality of operators from the server 102 (or database), so as to predict the operation performance of each operator in the neural network model according to the sub-prediction model respectively corresponding to each operator.
S505: server 102 provides the trained second sub-predictive model to client 101.
In practical applications, the server 102 may train a plurality of different sub-prediction models (respectively used for predicting the performance of different operators) based on the above manner of training the second sub-prediction model, and send the trained plurality of different sub-prediction models to the client 101. In addition, the implementation manner of the foregoing server 102 for training the second sub-prediction model is merely an implementation example, and in other embodiments, the training of the second sub-prediction model may be performed by other devices, for example, devices independent of the client and the server 102, and after the training of the model is completed, the sub-prediction models respectively corresponding to the plurality of operators may be sent to the server 102 or the client 101.
Further, the system 100 for predicting the performance of a neural network model shown in FIG. 1 also supports compatibility with new operators. In particular, when the client 101 uses the sub-prediction models corresponding to the plurality of operators provided by the server 102 to perform performance prediction for each operator in the neural network model to be processed, if the sub-prediction model corresponding to some operator (such as a newly developed operator) is missing, the client 101 may send that operator and its attribute configuration to the server 102 and request the server 102 to provide the corresponding sub-prediction model. The server 102 may construct input data for the received operator according to the received attribute configuration and send the operator, the attribute configuration, and the input data to a plurality of different hardware platforms for test runs, so as to collect, from the different hardware platforms, the computation time of the operator for the input data; training data is then generated based on the configuration information of the different hardware platforms, the attribute configuration of the operator, and the operator's computation time on the different hardware platforms, and the constructed sub-prediction model is trained using this training data. Finally, the server 102 sends the trained sub-prediction model to the client 101 (the server 102 may also add the sub-prediction model corresponding to the operator to the database). In this way, when such an operator appears in a neural network model provided by the user in a subsequent prediction task, the client 101 can use the sub-prediction model returned by the server 102 to predict the performance of that operator, and generate the prediction result corresponding to the neural network model to be processed based on the sub-prediction result of this operator and the sub-prediction results of the other operators.
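The new-operator flow described above could be sketched as follows; all object and method names (`sub_models`, `train_sub_model`, `add_to_database`) are hypothetical and serve only to illustrate the interaction between the client 101 and the server 102.

```python
def get_sub_model(client, server, operator):
    """Sketch of new-operator support: fetch or request a sub-prediction model for an operator type."""
    if operator.type not in client.sub_models:
        # The server constructs input data, performs test runs on several hardware
        # platforms, trains a new sub-prediction model, and returns it.
        sub_model = server.train_sub_model(operator.type, operator.attributes)
        server.add_to_database(operator.type, sub_model)   # optional: cache on the server side
        client.sub_models[operator.type] = sub_model       # cache locally for later prediction tasks
    return client.sub_models[operator.type]
```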
The method for predicting the performance of the neural network model and the method for training the prediction model provided by the application are described in detail above with reference to fig. 1 to 5, and the apparatus and the computing device provided by the application are described below with reference to fig. 6 to 7.
Based on the same inventive concept as the above methods, an embodiment of the present application further provides an apparatus for predicting the performance of a neural network model. The apparatus may implement the functions of the client 101 in the embodiments shown in fig. 2 and fig. 4, so as to predict the performance of the neural network model running on one or more different hardware platforms; or it may implement the functions of the server 102 in the embodiment shown in fig. 5, so as to train the prediction model. Referring to fig. 6, the apparatus 600 for predicting the performance of a neural network model may include:
an obtaining module 601, configured to obtain a neural network model and configuration information of a first hardware platform capable of running the neural network model;
and the prediction module 602 is configured to predict, according to the configuration information of the first hardware platform, a performance of the neural network model running on the first hardware platform by using the prediction model, and obtain a first prediction result according to the prediction of the prediction model.
In a possible implementation manner, the obtaining module 601 is further configured to obtain configuration information of a second hardware platform capable of running the neural network model;
the prediction module 602 is further configured to predict, according to the configuration information of the second hardware platform, a performance of the neural network model running on the second hardware platform by using the prediction model, so as to obtain a second prediction result.
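As a usage illustration of the first and second prediction results, the snippet below (with hypothetical platform configurations and toy stand-in sub-models) predicts the same set of operators on two candidate platforms and keeps the faster one; it reuses the predict_model_latency helper from the earlier sketch.

```python
# Toy comparison of two candidate hardware platforms; the configurations and
# lambda sub-models are stand-ins, not real trained sub-prediction models.
operators = [("conv2d", {"kernel": 3, "channels": 64}), ("relu", {})]
sub_models = {
    "conv2d": lambda hw, attrs: 100.0 / hw["tflops"],
    "relu":   lambda hw, attrs: 1.0,
}
platform_a = {"name": "npu-a", "tflops": 16, "memory_gb": 16}
platform_b = {"name": "gpu-b", "tflops": 32, "memory_gb": 24}

latency_a = predict_model_latency(operators, platform_a, sub_models)  # first prediction result
latency_b = predict_model_latency(operators, platform_b, sub_models)  # second prediction result
target = platform_a if latency_a <= latency_b else platform_b         # deployment choice
```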
In a possible implementation manner, the prediction module 602 is configured to:
determining a plurality of operators in the neural network model for predicting the performance using the prediction model;
predicting the performance of the operators running on the first hardware platform by using the prediction model to obtain the prediction results of the operators;
and obtaining the first prediction result according to the prediction results of the operators by using the prediction model.
In a possible implementation manner, the prediction module 602 is configured to:
determining a plurality of operators in the neural network model for predicting the performance of the neural network model running on the first hardware platform before predicting the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform by using a prediction model;
predicting the performance of the operators running on the first hardware platform by using the prediction model to obtain the prediction results of the operators;
and obtaining the first prediction result according to the prediction results of the operators.
In a possible implementation manner, the prediction model comprises a plurality of sub-prediction models, and each sub-prediction model corresponds to one operator;
the prediction module 602 is configured to:
determining a first sub-prediction model corresponding to a first operator in the plurality of operators and attribute information of the first operator, wherein the first operator is any operator in the plurality of operators;
inputting the configuration information and the attribute information into the first sub-prediction model;
the first sub-prediction model predicts the performance of the first operator to obtain a prediction result of the first operator;
and performing, for each of the other operators among the plurality of operators, the same operations as those performed for the first operator, so as to obtain the prediction results of the plurality of operators.
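One way to picture the per-operator prediction step above is to encode the hardware configuration information and the operator's attribute information into a single feature vector and pass it to that operator's sub-prediction model, as in the sketch below; the feature names and the generic regressor interface are illustrative assumptions, not something fixed by this application.

```python
# Illustrative encoding of (hardware configuration, operator attributes) for one operator.
import numpy as np

def encode_features(hardware_config: dict, op_attrs: dict) -> np.ndarray:
    hw = [hardware_config.get("tflops", 0.0), hardware_config.get("memory_gb", 0.0)]
    op = [op_attrs.get("kernel", 0.0), op_attrs.get("stride", 1.0),
          op_attrs.get("in_channels", 0.0), op_attrs.get("out_channels", 0.0)]
    return np.asarray(hw + op, dtype=np.float32)

def predict_operator(sub_model, hardware_config: dict, op_attrs: dict) -> float:
    features = encode_features(hardware_config, op_attrs)
    # sub_model is any regressor exposing a predict() method over a batch of feature rows.
    return float(sub_model.predict(features[None, :])[0])
```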
In a possible implementation manner, the prediction module 602 is configured to:
acquiring an operator configuration file corresponding to the first hardware platform, wherein the operator configuration file includes the relations between operators that the first hardware platform can run;
and determining the operators according to the operator configuration file.
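As a sketch of how an operator configuration file might drive the determination of the plurality of operators, the example below models the file as JSON listing operator sequences that the hardware platform runs as a single fused operator; the file schema and key names are assumptions for illustration.

```python
# Illustrative parsing of a hypothetical operator configuration file and a simple
# pass that folds fusable operator sequences into single operators for prediction.
import json
from typing import List

def load_fusion_rules(path: str) -> List[List[str]]:
    with open(path, encoding="utf-8") as f:
        # e.g. {"fusable_sequences": [["conv2d", "bias_add", "relu"]]}
        return json.load(f)["fusable_sequences"]

def determine_operators(model_ops: List[str], fusion_rules: List[List[str]]) -> List[str]:
    ops, i = [], 0
    while i < len(model_ops):
        for seq in fusion_rules:
            if model_ops[i:i + len(seq)] == seq:
                ops.append("+".join(seq))    # treat the fused sequence as one operator
                i += len(seq)
                break
        else:
            ops.append(model_ops[i])          # no fusion rule matched; keep the single operator
            i += 1
    return ops
```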
In one possible implementation, the apparatus 600 further includes:
the training module 603 is configured to perform multiple rounds of iterative training on each of the plurality of sub-prediction models included in the prediction model, where the second sub-prediction model is any one of the plurality of sub-prediction models, and one iteration of the training performed by the training module 603 on the second sub-prediction model includes:
inputting attribute information of a second operator corresponding to the second sub-prediction model, configuration information of at least one hardware platform capable of running the second operator and performance parameters of the second operator running on the at least one hardware platform to the second sub-prediction model;
training the second sub-prediction model according to the input;
and the configuration information of the hardware platform input to the second sub-prediction model, and the performance parameters of the second operator running on the at least one hardware platform, differ across the multiple rounds of iterative training.
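For concreteness, one iteration of training a sub-prediction model could look like the following sketch, which fits a small regressor to the measured performance parameters; the network shape, loss, and optimizer are illustrative choices and are not mandated by this application (which notes the prediction model may instead be built on a convolutional neural network).

```python
# Illustrative single training step for one sub-prediction model (PyTorch).
import torch
import torch.nn as nn

sub_model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(sub_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(features: torch.Tensor, measured_latency_ms: torch.Tensor) -> float:
    # features: hardware configuration + operator attribute information (one row per sample);
    # measured_latency_ms: the performance parameter observed on the hardware platform.
    optimizer.zero_grad()
    predicted = sub_model(features).squeeze(-1)
    loss = loss_fn(predicted, measured_latency_ms)
    loss.backward()
    optimizer.step()
    return loss.item()
```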
In one possible implementation, the predictive model is built based on a convolutional neural network.
In one possible implementation, the performance of the neural network model includes time consumption of the neural network model to output the inference result based on the input data, or time consumption of training the neural network model, or energy consumption of the neural network model to output the inference result based on the input data.
The apparatus 600 for predicting the performance of the neural network model in this embodiment corresponds to the method shown in fig. 2 to 5, and therefore, for the specific implementation of each functional module and the technical effects thereof in the apparatus for predicting the performance of the neural network model in this embodiment, reference may be made to the description of the relevant places in the foregoing embodiments, and details are not repeated herein.
In addition, an embodiment of the present application further provides a computing device. As shown in fig. 7, a computing device 700 may include a communication interface 710 and a processor 720. Optionally, the computing device 700 may further include a memory 730, which may be internal or external to the computing device 700. Illustratively, the actions described above in the embodiment of FIG. 3 may be implemented by the processor 720. The processor 720 may obtain, through the communication interface 710, the neural network model provided by the client 101, the configuration information of the hardware platform, and the like, and may be used to implement any of the methods performed in fig. 2, 4, and 5. In implementation, each step of the processing flow may be performed by an integrated logic circuit of hardware in the processor 720 or by instructions in the form of software, so as to complete the methods performed in fig. 2, 4, and 5. For brevity, details are not repeated here. Program code executed by the processor 720 to implement the above methods may be stored in the memory 730, and the memory 730 is coupled to the processor 720, for example, through a coupling connection.
Some features of the embodiments of the present application may be implemented or supported by the processor 720 executing program instructions or software code stored in the memory 730. The software components loaded in the memory 730 may be summarized functionally or logically, for example as the prediction module 602 and the training module 603 shown in fig. 6, while the functionality of the obtaining module 601 may be implemented by the communication interface 710.
Any communication interface referred to in the embodiments of the present application may be a circuit, a bus, a transceiver, or any other apparatus that can be used for information interaction, such as the communication interface 710 in the computing device 700, through which the computing device 700 may, for example, interact with other devices connected to it.
The processors referred to in the embodiments of the present application may be general purpose processors, digital signal processors, application specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices or modules, which may be in an electrical, mechanical, or other form, for information interaction between the devices or modules.
The processor may operate in conjunction with the memory. The memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The embodiments of the present application do not limit the specific connection medium among the communication interface, the processor, and the memory. For example, the memory, the processor, and the communication interface may be connected through a bus, and the bus may be classified into an address bus, a data bus, a control bus, and the like.
Based on the above embodiments, the present application further provides a computer storage medium storing a software program which, when read and executed by one or more processors, implements the method for predicting the performance of a neural network model provided in any one or more of the above embodiments. The computer storage medium may include: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or various other media capable of storing program code.
Based on the above embodiments, the present application further provides a chip. The chip includes a processor configured to implement the functions of the apparatus 600 for predicting the performance of a neural network model in the above embodiments. Optionally, the chip further includes a memory for storing the program instructions and data necessary for the processor. The chip may be formed by a chip alone, or may include a chip and other discrete devices.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The terms "first", "second", and the like in the description, the claims, and the above figures are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, and are merely a manner of distinguishing, in describing the embodiments of the application, between objects having the same attributes.
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is also intended to include such modifications and variations.

Claims (21)

1. A method of predicting neural network model performance, the method comprising:
acquiring configuration information of a neural network model and a first hardware platform capable of running the neural network model;
the prediction model predicts the running performance of the neural network model on the first hardware platform according to the configuration information of the first hardware platform;
and obtaining a first prediction result according to the prediction of the prediction model.
2. The method according to claim 1, wherein the method further comprises:
acquiring configuration information of a second hardware platform capable of running the neural network model;
and the prediction model predicts the running performance of the neural network model on the second hardware platform according to the configuration information of the second hardware platform to obtain a second prediction result.
3. The method according to claim 1 or 2, wherein the predicting the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform comprises:
the prediction model determines a plurality of operators in the neural network model for predicting the performance;
the prediction model predicts the running performance of the operators on the first hardware platform according to the configuration information of the first hardware platform to obtain the prediction results of the operators;
the obtaining a first prediction result according to the prediction of the prediction model comprises:
and the prediction model obtains the first prediction result according to the prediction results of the operators.
4. The method according to claim 1 or 2, wherein before the prediction model predicts the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform, the method further comprises:
determining a plurality of operators in the neural network model for predicting the performance;
the predicting, by the prediction model, the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform includes:
the prediction model predicts the running performance of the operators on the first hardware platform according to the configuration information of the first hardware platform to obtain the prediction results of the operators;
the obtaining a first prediction result according to the prediction of the prediction model comprises:
and obtaining the first prediction result according to the prediction results of the operators.
5. The method of claim 3 or 4, wherein the prediction model comprises a plurality of sub-prediction models, each sub-prediction model corresponding to an operator;
the prediction model predicts the running performance of the operators on the first hardware platform according to the configuration information of the first hardware platform, and the obtaining of the prediction results of the operators comprises the following steps:
determining a first sub-prediction model corresponding to a first operator in the plurality of operators and attribute information of the first operator, wherein the first operator is any operator in the plurality of operators;
inputting the configuration information and the attribute information into the first sub-prediction model;
the first sub-prediction model predicts the performance of the first operator to obtain a prediction result of the first operator;
and performing, for each of the other operators among the plurality of operators, the same operations as those performed for the first operator, so as to obtain the prediction results of the plurality of operators.
6. The method of any of claims 3 to 5, wherein said determining a plurality of operators in the neural network model for predicting the performance comprises:
acquiring an operator configuration file corresponding to the first hardware platform, wherein the operator configuration file comprises the relation between operators which can be operated by the first hardware platform;
and determining the operators according to the operator configuration file.
7. The method according to any one of claims 3 to 6, further comprising:
respectively performing multiple iterative training on a plurality of sub-prediction models included in the prediction model, wherein a second sub-prediction model is any one sub-prediction model in the plurality of sub-prediction models, and one iterative training process for the second sub-prediction model comprises:
inputting attribute information of a second operator corresponding to the second sub-prediction model, configuration information of at least one hardware platform capable of running the second operator and performance parameters of the second operator running on the at least one hardware platform to the second sub-prediction model;
training the second sub-prediction model according to the input;
and the configuration information of the hardware platform input to the second sub-prediction model, and the performance parameters of the second operator running on the at least one hardware platform, differ across the multiple rounds of iterative training.
8. The method according to any one of claims 1 to 7, wherein the predictive model is constructed based on a convolutional neural network.
9. The method according to any of claims 1 to 8, wherein the performance of the neural network model comprises the time consumption of the neural network model to output the inference result based on the input data, or the time consumption of training the neural network model, or the energy consumption of the neural network model to output the inference result based on the input data.
10. An apparatus for predicting performance of a neural network model, the apparatus comprising:
the obtaining module is configured to obtain the neural network model and configuration information of a first hardware platform capable of running the neural network model;
the prediction module is used for predicting the running performance of the neural network model on the first hardware platform according to the configuration information of the first hardware platform by using a prediction model; and obtaining a first prediction result according to the prediction of the prediction model.
11. The apparatus of claim 10, wherein the obtaining module is further configured to obtain configuration information of a second hardware platform capable of running the neural network model;
and the prediction module is further used for predicting the performance of the neural network model running on the second hardware platform according to the configuration information of the second hardware platform by using the prediction model to obtain a second prediction result.
12. The apparatus according to claim 10 or 11, wherein the prediction module is configured to:
determining a plurality of operators in the neural network model for predicting the performance using the prediction model;
predicting the performance of the operators running on the first hardware platform by using the prediction model to obtain the prediction results of the operators;
and obtaining the first prediction result according to the prediction results of the operators by using the prediction model.
13. The apparatus according to claim 10 or 11, wherein the prediction module is configured to:
determining a plurality of operators in the neural network model for predicting the performance of the neural network model running on the first hardware platform before predicting the performance of the neural network model running on the first hardware platform according to the configuration information of the first hardware platform by using a prediction model;
predicting the performance of the operators running on the first hardware platform by using the prediction model to obtain the prediction results of the operators;
and obtaining the first prediction result according to the prediction results of the operators.
14. The apparatus according to claim 12 or 13, wherein the prediction model comprises a plurality of sub-prediction models, each sub-prediction model corresponding to an operator;
the prediction module is used for:
determining a first sub-prediction model corresponding to a first operator in the plurality of operators and attribute information of the first operator, wherein the first operator is any operator in the plurality of operators;
inputting the configuration information and the attribute information into the first sub-prediction model;
the first sub-prediction model predicts the performance of the first operator to obtain a prediction result of the first operator;
and performing, for each of the other operators among the plurality of operators, the same operations as those performed for the first operator, so as to obtain the prediction results of the plurality of operators.
15. The apparatus according to any one of claims 12 to 14, wherein the prediction module is configured to:
acquiring an operator configuration file corresponding to the first hardware platform, wherein the operator configuration file comprises the relation between operators which can be operated by the first hardware platform;
and determining the operators according to the operator configuration file.
16. The apparatus according to any one of claims 12 to 15, further comprising:
the training module is configured to perform multiple iterative training on multiple sub-prediction models included in the prediction model, where the second sub-prediction model is any one of the multiple sub-prediction models, and a one-time iterative training process of the training module on the second sub-prediction model includes:
inputting attribute information of a second operator corresponding to the second sub-prediction model, configuration information of at least one hardware platform capable of running the second operator and performance parameters of the second operator running on the at least one hardware platform to the second sub-prediction model;
training the second sub-prediction model according to the input;
and the configuration information of the hardware platform input to the second sub-prediction model, and the performance parameters of the second operator running on the at least one hardware platform, differ across the multiple rounds of iterative training.
17. The apparatus of any one of claims 10 to 16, wherein the predictive model is constructed based on a convolutional neural network.
18. The apparatus according to any one of claims 10 to 17, wherein the performance of the neural network model comprises a time consumption of the neural network model to output the inference result based on the input data, or a time consumption of training the neural network model, or an energy consumption of the neural network model to output the inference result based on the input data.
19. A computing device, the computing device comprising a processor and a memory;
the processor is configured to execute instructions stored in the memory to cause the computing device to perform the method of any one of claims 1 to 9.
20. A computer readable storage medium comprising instructions for implementing the method of any one of claims 1 to 9.
21. A computer program product containing instructions which, when run on a computing device, cause the computing device to perform the method of any of claims 1 to 9.
CN202210278743.XA 2022-03-21 2022-03-21 Method and device for predicting neural network model performance and related equipment Pending CN116843010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210278743.XA CN116843010A (en) 2022-03-21 2022-03-21 Method and device for predicting neural network model performance and related equipment

Publications (1)

Publication Number Publication Date
CN116843010A true CN116843010A (en) 2023-10-03

Family

ID=88160346

Country Status (1)

Country Link
CN (1) CN116843010A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination