CN110555486B - Model structure delay prediction method and device and electronic equipment - Google Patents


Info

Publication number
CN110555486B
Authority
CN
China
Prior art keywords
delay, training, model, classification, prediction
Legal status: Active
Application number
CN201910860570.0A
Other languages
Chinese (zh)
Other versions
CN110555486A
Inventor
希滕
张刚
温圣召
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from application CN201910860570.0A
Publication of CN110555486A
Application granted
Publication of CN110555486B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/24323 Tree-organised classifiers
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a model structure delay prediction method and apparatus, and an electronic device, relating to the field of neural architecture search. The specific implementation scheme is as follows: a classification model and a plurality of regression models are trained from a hardware delay lookup table; an operation in the model structure is input into the classification model to obtain the delay class label corresponding to the operation; and the operation, with its delay class label, is input into the regression model corresponding to that delay class to obtain the predicted delay. Because the classification model and the regression models can be trained on the operations in the hardware delay lookup table, the trained models can predict the delay of any operation. The delay of any operation is predicted from limited sampling, so the delay of any model structure can be estimated very efficiently without directly connecting hardware devices, which solves the technical problem of low delay-testing efficiency caused by the time needed to build a traditional hardware lookup table.

Description

Model structure delay prediction method and device and electronic equipment
Technical Field
The present application relates to the field of computer vision, and in particular to the field of neural architecture search.
Background
Deep learning techniques have achieved tremendous success in many areas, and neural architecture search (NAS) has become a research hotspot in recent years. NAS replaces tedious manual design with an algorithm that automatically searches a massive search space for a neural network architecture. A practical search task must consider many constraints, such as the delay of the model structure on specific hardware. One scheme for obtaining the delay of a model structure on specific hardware is to test the model's delay over a direct connection to the hardware device. However, testing delay over a direct hardware connection is very inconvenient. In addition, distributed search requires connecting a large number of hardware devices to test delay; the devices heat up under prolonged operation, making the test results inaccurate, and the automated search for the model structure takes a long time.
At present, the delay of a model structure is typically tested by building a delay lookup table on the hardware device. However, when the search space is large, all operations contained in the search space must be collected and the delay lookup table built from this large number of operations. Building the table can take months or even years, so the scheme is severely limited in practice: it suits only scenarios with a very small search space and is unsuitable for larger ones.
Disclosure of Invention
Embodiments of the present application provide a delay prediction method and apparatus for a model structure, and an electronic device, so as to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for predicting delay of a model structure, where the method includes:
training according to a hardware delay lookup table to obtain a classification model and a plurality of regression models;
inputting the operation in the model structure into a classification model to obtain a delay class label corresponding to the operation;
and inputting the operation with the delay category label into a regression model corresponding to the delay category to obtain the predicted delay.
In this embodiment, because the classification model and the plurality of regression models can be trained on the operations in the hardware delay lookup table, the trained models can predict the delay of any operation. The delay of any operation is predicted from limited sampling, so the delay of any model structure can be estimated very efficiently without directly connecting hardware devices, which solves the technical problem of low delay-testing efficiency caused by the time needed to build a traditional hardware lookup table.
In one embodiment, the training according to the hardware delay look-up table obtains a classification model and a plurality of regression models, including:
establishing a training set and a test set according to a hardware delay lookup table, wherein the training set and the test set are not intersected;
training by utilizing a training set to obtain a classification model;
evaluating the classification accuracy of the classification model by using the test set;
and when the classification accuracy is less than a first threshold, expanding the training set and training the classification model with the expanded training set until the classification accuracy is greater than or equal to the first threshold, at which point the training ends.
In this embodiment, the classification accuracy of the classification model is evaluated on the test set and the model is retrained on the expanded training set, thereby improving its classification accuracy.
In one embodiment, evaluating classification accuracy of a classification model using a test set includes:
acquiring a plurality of real categories corresponding to the test set;
inputting the test set into a classification model to obtain a plurality of prediction categories;
and comparing each prediction category with each real category to obtain the classification accuracy of the classification model.
In this embodiment, each predicted category is compared with the corresponding real category to determine the classification accuracy of the classification model, so that its classification results come closer to the real classification.
In one embodiment, training the classification model and the plurality of regression models according to the hardware delay look-up table further comprises:
acquiring a plurality of subsets corresponding to the delay categories from the training set;
training by utilizing the subset to obtain a regression model corresponding to each delay category;
evaluating the prediction accuracy of each regression model by using the test set;
and when the prediction accuracy of a regression model is less than a second threshold, expanding its subset and training that regression model with the expanded subset until the prediction accuracy of each regression model is greater than or equal to the second threshold, at which point the training ends.
In this embodiment, the prediction accuracy of each regression model is evaluated on the test set and the models are retrained on the expanded subsets, thereby improving their prediction accuracy.
In one embodiment, evaluating the prediction accuracy of each regression model using a test set comprises:
acquiring real time delay corresponding to the test set;
inputting the test set into a regression model to obtain a prediction delay;
and comparing the predicted delay with the real delay to obtain the prediction accuracy of the regression model.
In this embodiment, the predicted delay is compared with the real delay to determine the regression accuracy of the regression model, so that the regression model's predicted delays come closer to the real delays.
In a second aspect, there is provided a delay prediction apparatus of a model structure, including:
the model training module is used for obtaining a classification model and a plurality of regression models according to the training of the hardware delay lookup table;
the delay category acquisition module is used for inputting an operation in the model structure into the classification model to obtain the delay category label corresponding to the operation;
and the delay prediction module is used for inputting the operation with the delay category label into the regression model corresponding to the delay category to obtain the predicted delay.
In one embodiment, the model training module comprises:
the first construction submodule is used for establishing a training set and a test set according to the hardware delay lookup table, and the training set and the test set are not intersected;
the first training submodule is used for training by using a training set to obtain a classification model;
the first testing submodule is used for evaluating the classification accuracy of the classification model by using the test set;
and the second training submodule is used for expanding the training set under the condition that the classification accuracy is smaller than the first threshold, training the classification model by using the expanded training set until the classification accuracy is larger than or equal to the first threshold, and finishing the training.
In one embodiment, the first test submodule includes:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for acquiring a plurality of real categories corresponding to a test set;
the second unit is used for inputting the test set into the classification model to obtain a plurality of prediction categories;
and the third unit is used for comparing each prediction category with each real category to obtain the classification accuracy of the classification model.
In one embodiment, the model training module further comprises:
the second construction submodule is used for acquiring a plurality of subsets corresponding to the delay categories from the training set;
the third training submodule is used for obtaining a regression model corresponding to each delay category by utilizing the subset training;
the second testing submodule is used for evaluating the prediction accuracy of each regression model by using the test set;
and the fourth training sub-module is used for expanding the subset under the condition that the prediction accuracy of each regression model is smaller than the second threshold value, training each regression model by using the expanded subset until the prediction accuracy of each regression model is larger than or equal to the second threshold value, and finishing the training.
In one embodiment, the second test submodule includes:
the fourth unit is used for acquiring the real time delay corresponding to the test set;
a fifth unit, configured to input the test set into the regression model to obtain a prediction delay;
and the sixth unit is used for comparing the predicted delay with the real delay to obtain the prediction accuracy of the regression model.
An embodiment in the above application has the following advantage or benefit: because a classification model and regression models are used to predict the delay of any operation on the hardware device, the technical problems of long table-building time and the cumbersome, inefficient delay-prediction steps caused by directly connecting hardware devices to evaluate the delay of a model structure are solved, achieving the technical effect of predicting the delay of any operation from limited sampling and evaluating the delay of any model structure very efficiently.
Other effects of the above alternatives will be described below in conjunction with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a delay prediction method for a model structure according to an embodiment of the present application;
FIG. 2 is a scene diagram of a delay prediction method for a model structure according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another method for predicting delay of a model structure according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram of a classification model training and verification method provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a classification accuracy calculation process provided according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of another method for predicting delay of a model structure according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram illustrating a regression model training and verification method according to an embodiment of the present disclosure;
FIG. 8 is a schematic view of a calculation process of prediction accuracy according to an embodiment of the present application;
fig. 9 is a block diagram of a delay prediction apparatus of a model structure according to an embodiment of the present application;
fig. 10 is a block diagram of a delay prediction apparatus of another model structure according to an embodiment of the present application;
fig. 11 is a block diagram of a delay prediction apparatus of another model structure according to an embodiment of the present application;
fig. 12 is a block diagram of an electronic device for implementing a model structure delay prediction method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Example one
In one embodiment, as shown in fig. 1, a method for predicting a delay of a model structure is provided, which includes:
step S10: training according to a hardware delay lookup table to obtain a classification model and a plurality of regression models;
step S20: inputting the operation in the model structure into a classification model to obtain a delay class label corresponding to the operation;
step S30: and inputting the operation with the delay category label into a regression model corresponding to the delay category to obtain the predicted delay.
In one example, as shown in FIG. 2, the hardware delay lookup table includes all operations in the search space and the delay each operation incurs on the hardware device. Each operation includes data such as the convolution kernel size, the number of channels, and the stride. First, the hardware delay lookup table is used to train a classification model over the delays, and the delays are classified according to their distribution. For example, the classes may be 1ms-5ms, 5ms-10ms, 10ms-40ms, and so on, or 10ms-100ms, 100ms-1000ms, 1000ms-5000ms, and so on. Any operation input into the classification model yields the delay class label corresponding to that operation. It should be noted that the classes are determined from the delay distribution obtained by testing each operation on different hardware devices, and adaptive adjustment according to the actual situation falls within the protection scope of this embodiment. The classification model may be a decision-tree classifier, a random-forest classifier, or the like, all of which fall within the protection scope of this embodiment. Then, one regression model corresponds to each delay class output by the classification model. The regression model performs delay prediction on the operation according to its delay class to obtain the predicted delay. For example, for the delay class 5ms-10ms, the predicted delay obtained by the corresponding regression model 1 is 6ms; for the delay class 10ms-100ms, the predicted delay obtained by the corresponding regression model n is 90ms.
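The bucketing step above can be sketched in a few lines. The following is a minimal, hypothetical Python sketch: the bucket boundaries follow the 1ms-5ms / 5ms-10ms / 10ms-40ms example in the text, but the lookup-table entries and feature tuple `(kernel_size, channels, stride)` values are invented for illustration. A production system would train a decision-tree or random-forest classifier on such labeled pairs, as the text suggests.

```python
# Hypothetical delay buckets; a real system would derive the boundaries
# from the delay distribution measured on the target hardware.
BUCKETS = [(1.0, 5.0), (5.0, 10.0), (10.0, 40.0)]  # (low, high) in ms

def delay_class(latency_ms):
    """Return the index of the bucket containing latency_ms, or None."""
    for idx, (low, high) in enumerate(BUCKETS):
        if low <= latency_ms < high:
            return idx
    return None

# A lookup-table entry maps (kernel_size, channels, stride) to the delay
# measured on the hardware device (values here are invented).
lookup_table = {
    (3, 256, 1): 4.2,
    (5, 28, 1): 7.5,
    (9, 64, 2): 16.0,
}

# Labeled (operation, delay-class) pairs, ready for classifier training.
training_pairs = [(op, delay_class(ms)) for op, ms in lookup_table.items()]
```

The labeled pairs would then be fed to whatever classifier family is chosen; the bucketing itself is independent of that choice.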
In the delay prediction method provided by this embodiment, because the classification model and the plurality of regression models can be trained on the operations in the hardware delay lookup table, the trained models can predict the delay of any operation. The delay of any operation is predicted from limited sampling, so the delay of any model structure can be estimated very efficiently without directly connecting hardware devices. This avoids building a traditional hardware lookup table, which can take months or even years, and thus solves the technical problem of low delay-testing efficiency caused by the time such table-building consumes.
In one embodiment, as shown in fig. 3, step S10 includes:
step S101: establishing a training set and a test set according to a hardware delay lookup table, wherein the training set and the test set are not intersected;
step S102: training by utilizing a training set to obtain a classification model;
step S103: evaluating the classification accuracy of the classification model by using the test set;
step S104: and under the condition that the classification accuracy is smaller than the first threshold, expanding the training set, training the classification model by using the expanded training set until the classification accuracy is larger than or equal to the first threshold, and finishing the training.
In one example, the training set and the test set each include a plurality of operations, and the operations in the training set do not intersect those in the test set. For example, the training set includes a first, a second, and a third operation: in the first operation the convolution kernel size is 3 × 3, the number of channels is 256, and the delay on the hardware device is 5ms; in the second, the kernel size is 5 × 5, the number of channels is 28, and the delay is 100ms; in the third, the kernel size is 9 × 9, the number of channels is 64, and the delay is 16ms. The test set includes a fourth operation, with kernel size 3 × 3, 16 channels, and a delay of 0.1ms, and a fifth operation, with kernel size 9 × 9, 256 channels, and a delay of 60ms. The training set is used to train the classification model; the test set is used to test classification accuracy, so as to adjust the classification model and the regression models.
As shown in fig. 4, the classification model is trained using the hardware delay lookup table, and the classification accuracy of the trained model is evaluated on the test set. If the classification accuracy meets the requirement, updating of the classification model stops. If not, more operations are added to the training set to expand it, and the classification model is trained on the expanded training set, updating it until the finally updated model meets the required classification accuracy. The first threshold against which the classification accuracy is compared is adaptively adjusted according to the actual situation, which falls within the protection scope of this embodiment. Evaluating the classification accuracy on the test set and retraining on the expanded training set improves the classification accuracy of the classification model.
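The train-evaluate-expand loop of fig. 4 can be expressed generically. The sketch below is an assumption-laden outline: `fit`, `evaluate`, and `sample_more` are hypothetical callables standing in for classifier training, test-set evaluation, and sampling additional operations from the hardware device; the threshold and round budget are illustrative.

```python
def train_until_accurate(train_ops, test_ops, fit, evaluate, sample_more,
                         threshold=0.9, max_rounds=10):
    """Train a model, evaluate it on the held-out test set, and expand the
    training set with freshly sampled operations until the accuracy reaches
    the threshold (or the round budget runs out)."""
    model = None
    for _ in range(max_rounds):
        model = fit(train_ops)
        if evaluate(model, test_ops) >= threshold:
            break                              # accuracy requirement met
        train_ops = train_ops + sample_more()  # expand the training set
    return model
```

The same loop shape applies to the regression models later in the text, with the subset of a delay class in place of the full training set.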
In one embodiment, as shown in fig. 5, step S103 includes:
step S1031: acquiring a plurality of real categories corresponding to the test set;
step S1032: inputting the test set into a classification model to obtain a plurality of prediction categories;
step S1033: and comparing each prediction category with each real category to obtain the classification accuracy of the classification model.
In one example, the real delay obtained by testing each operation in the test set on the hardware device is recorded, and the operations are classified according to the real delay distribution to obtain a plurality of real categories. The classification model then classifies the operation delays in the test set to obtain a plurality of predicted categories. For example, the real categories may be 1ms-30ms, 30ms-100ms, and 100ms-500ms, while the predicted categories are 1ms-5ms, 5ms-10ms, and 10ms-40ms; after comparison, a classification accuracy of 20% is obtained. The classification accuracy can be calculated in various ways, chosen adaptively according to actual needs, all of which fall within the protection scope of this embodiment. Comparing the predicted categories with the real categories determines the classification accuracy of the classification model, so that its classification results come closer to the real classification.
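Counting matches between predicted and real class labels is one simple way to compute the classification accuracy (the text notes that many calculation methods are possible, so this exact-match rule is just one assumed choice); a minimal sketch:

```python
def classification_accuracy(predicted, actual):
    """Fraction of test-set operations whose predicted delay class matches
    the class derived from the measured (real) delay."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

For instance, with five test operations and three matching labels, the accuracy is 0.6; other schemes (e.g. tolerating adjacent buckets) would also fit the text's description.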
In one embodiment, as shown in fig. 6, step S10 includes:
step S105: acquiring a plurality of subsets corresponding to the delay categories from the training set;
step S106: training by utilizing the subset to obtain a regression model corresponding to each delay category;
step S107: evaluating the prediction accuracy of each regression model by using the test set;
step S108: and expanding the subset under the condition that the prediction accuracy of each regression model is smaller than a second threshold, training each regression model by using the expanded subset until the prediction accuracy of each regression model is larger than or equal to the second threshold, and finishing the training.
In one example, as shown in fig. 7, to make the predicted delay more accurate, a corresponding regression model must be trained for each delay category. When training the regression model for a delay category, a subset corresponding to that category is selected from the training set. For example, the training set includes the first through tenth operations with the delay class label 1ms-5ms, the fiftieth through sixtieth operations with the label 5ms-10ms, and the thirtieth, thirty-third, and thirty-fifth operations with the label 10ms-40ms. A first regression model is trained on the first subset (the first through tenth operations), a second regression model on the second subset (the fiftieth through sixtieth operations), and a third regression model on the third subset (the thirtieth, thirty-third, and thirty-fifth operations). The prediction accuracy of the first, second, and third regression models is evaluated on the test set. If the prediction accuracy of the first regression model is less than the second threshold, the first subset is expanded.
For example, the sixty-sixth through seventieth operations matching the delay class label 1ms-5ms may be added to the first subset. The first regression model is then trained further on the expanded first subset, updating its parameters until the prediction accuracy of the updated model is greater than or equal to the second threshold, at which point updating stops. It should be noted that the second threshold against which the prediction accuracy is compared is adaptively adjusted according to actual conditions and falls within the protection scope of this embodiment. Evaluating the prediction accuracy of each regression model on the test set and retraining on the expanded subsets improves the prediction accuracy of the regression models.
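Training one regressor per delay class can be sketched with a closed-form least-squares fit. Everything here is illustrative: the patent leaves the regressor family open, real features would be the kernel size, channel count, and stride, and the single scalar feature, linear model, and subset values below are assumptions made only to keep the sketch self-contained.

```python
def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y ~ a*x + b (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# One regressor per delay class, each trained only on that class's subset
# of the training set (feature and delay values below are invented).
subsets = {
    0: ([1.0, 2.0, 3.0], [1.5, 3.0, 4.5]),   # class 0 subset: y = 1.5x
    1: ([1.0, 2.0], [12.0, 14.0]),           # class 1 subset: y = 2x + 10
}
regressors = {cls: fit_linear(xs, ys) for cls, (xs, ys) in subsets.items()}
```

Expanding a subset, as in the example above, simply means appending more `(x, y)` pairs for that class before refitting.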
In one embodiment, as shown in fig. 8, step S107 includes:
step S1071: acquiring real time delay corresponding to the test set;
step S1072: inputting the test set into a regression model to obtain a predicted delay;
step S1073: and comparing the predicted delay with the real delay to obtain the prediction accuracy of the regression model.
In one example, the real delay of each operation in the test set, as tested on the hardware device, is recorded. An operation in the test set may be input into the classification model to obtain its delay class label, for example the class 1ms-5ms. The operation with delay class 1ms-5ms is then input into the first regression model corresponding to that class for prediction, yielding the predicted delay. Comparing the predicted delay with the operation's real delay gives the prediction accuracy of the first regression model. Comparing predicted and real delays determines the regression accuracy of the regression model, so that its predicted delays come closer to the real delays.
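Putting the two stages together, the classify-then-regress dispatch can be sketched as follows. `classify` and the per-class regressors are hypothetical callables, and summing the per-operation predictions to estimate a whole model structure's delay is an assumption of this sketch (the patent predicts per-operation delays and estimates the model structure's delay from them, without fixing the aggregation).

```python
def predict_model_latency(operations, classify, regressors):
    """Estimate the delay of a model structure by routing each operation
    to the regressor for its predicted delay class and summing."""
    total = 0.0
    for op in operations:
        cls = classify(op)             # delay-class label from the classifier
        total += regressors[cls](op)   # class-specific regressor predicts delay
    return total
```

Because both stages are trained once from the lookup table, this prediction needs no connection to the hardware device at search time.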
Example two
In another embodiment, as shown in fig. 9, there is provided a delay prediction apparatus 100 of a model structure, including:
the model training module 110 is configured to obtain a classification model and multiple regression models according to the training of the hardware delay lookup table;
a delay category obtaining module 120, configured to input an operation in the model structure into the classification model, so as to obtain the delay category tag corresponding to the operation;
and the delay prediction module 130 is configured to input the operation with the delay category label into the regression model corresponding to the delay category to obtain the predicted delay.
In one embodiment, as shown in fig. 10, a delay prediction apparatus 200 of a model structure is provided, wherein the model training module 110 includes:
the first building submodule 1101 is configured to build a training set and a test set according to the hardware delay lookup table, where the training set and the test set are disjoint;
a first training submodule 1102, configured to obtain a classification model by training with a training set;
a first test sub-module 1103, configured to evaluate a classification accuracy of the classification model by using the test set;
and the second training submodule 1104 is configured to expand the training set when the classification accuracy is smaller than the first threshold, train the classification model using the expanded training set, and terminate the training until the classification accuracy is greater than or equal to the first threshold.
In one embodiment, as shown in fig. 10, the first testing sub-module 1103 includes:
a first unit 11031, configured to obtain a plurality of real categories corresponding to the test set;
a second unit 11032, configured to input the test set into the classification model, so as to obtain a plurality of prediction categories;
a third unit 11033, configured to compare each prediction category with each real category, to obtain a classification accuracy of the classification model.
In one embodiment, as shown in fig. 11, a delay prediction apparatus 300 of a model structure is provided, wherein the model training module 110 further includes:
a second building submodule 1105, configured to obtain a plurality of subsets corresponding to the delay categories from the training set;
a third training submodule 1106, configured to train a regression model corresponding to each delay category using the corresponding subset;
a second testing submodule 1107, configured to evaluate the prediction accuracy of each regression model by using the test set;
the fourth training submodule 1108 is configured to expand the subset when the prediction accuracy of a regression model is smaller than the second threshold and to train each regression model with the expanded subset, training ending once the prediction accuracy of each regression model is greater than or equal to the second threshold.
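Submodules 1105 and 1106 can be pictured as follows; since the disclosure does not fix a regressor family, a trivial mean predictor stands in here for a real per-category regression model:

```python
from collections import defaultdict

def partition_by_category(train_set, categorize):
    """Split the training set into one subset per delay category, using the
    trained classifier's label function (as in the second building submodule)."""
    subsets = defaultdict(list)
    for op, latency in train_set:
        subsets[categorize(op)].append((op, latency))
    return subsets

def fit_mean_regressor(subset):
    """Stand-in per-category regressor that predicts the subset's mean latency;
    a real implementation might fit a linear model or tree on the op features."""
    mean = sum(latency for _, latency in subset) / len(subset)
    return lambda op: mean
```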
In one embodiment, second test submodule 1107 comprises:
a fourth unit 11071, configured to obtain a real delay corresponding to the test set;
a fifth unit 11072, configured to input the test set into the regression model, so as to obtain a predicted delay;
a sixth unit 11073, configured to compare the predicted delay with the real delay, to obtain a prediction accuracy of the regression model.
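Units 11071 to 11073 compare predicted against measured delay; the disclosure does not state the correctness criterion, so this sketch assumes a relative-error tolerance:

```python
def regression_accuracy(regressor, test_ops, true_latencies, tolerance=0.1):
    """A prediction counts as correct when its relative error is within
    `tolerance` of the measured (real) latency; accuracy is the correct fraction."""
    correct = sum(
        abs(regressor(op) - true_lat) <= tolerance * true_lat
        for op, true_lat in zip(test_ops, true_latencies)
    )
    return correct / len(test_ops)
```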
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 12 is a block diagram of an electronic device for a model-structured delay prediction method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 12, the electronic device includes: one or more processors 1201, a memory 1202, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 12 takes one processor 1201 as an example.
Memory 1202 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform a model structure delay prediction method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform a method for model structure latency prediction as provided herein.
The memory 1202, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the model structure delay prediction method in the embodiments of the present application (e.g., the model training module 110, the delay category acquisition module 120, and the delay prediction module 130 shown in fig. 9). By running the non-transitory software programs, instructions, and modules stored in the memory 1202, the processor 1201 executes the various functional applications and data processing of the server, that is, implements the model structure delay prediction method in the above method embodiments.
The memory 1202 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the use of the electronic device for model structure delay prediction, and the like. Further, the memory 1202 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1202 may optionally include memory located remotely from the processor 1201, and such remote memory may be connected over a network to the electronic device for model structure delay prediction. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the model structure delay prediction method may further include: an input device 1203 and an output device 1204. The processor 1201, the memory 1202, the input device 1203, and the output device 1204 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 12.
The input device 1203 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for model structure delay prediction; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 1204 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a classification model and multiple regression models can be trained on the operations in the hardware delay lookup table, and the trained classification and regression models are then used to predict the delay of any operation. By sampling only a limited number of operations, the delay of any model structure can be estimated very efficiently, without connecting directly to the hardware device to measure the delay of the model structure, which solves the technical problem that delay testing is inefficient because building a traditional hardware lookup table is time-consuming. …
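Putting the stages together, the prediction path (classify each operation, then route it to that category's regression model) might be sketched as below; summing per-operation delays into a whole-model estimate is an assumption of this sketch, not something stated in the excerpt:

```python
def predict_model_latency(ops, classifier, regressors):
    """Classify each operation in the model structure into a delay category,
    obtain its predicted delay from that category's regression model, and sum."""
    total = 0.0
    for op in ops:
        category = classifier(op)           # delay category label from the classification model
        total += regressors[category](op)   # predicted delay from that category's regressor
    return total
```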
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited herein, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for predicting delay of a model structure is characterized by comprising the following steps:
training according to a hardware delay lookup table to obtain a classification model and a plurality of regression models;
inputting the operation in the model structure into the classification model to obtain a delay class label corresponding to the operation;
inputting the operation with the delay category label into a regression model corresponding to the delay category to obtain the predicted delay;
wherein training input data of the regression model is determined from training output data of the classification model; the hardware delay lookup table comprises all operations in a search space and delay obtained by each operation in hardware equipment; the operation includes data of the size of the convolution kernel, the number of channels, and the step size.
2. The method of claim 1, wherein the training from the hardware delay look-up table yields a classification model and a plurality of regression models, including:
establishing a training set and a test set according to the hardware delay lookup table, wherein the training set and the test set are not intersected;
training by using the training set to obtain the classification model;
evaluating the classification accuracy of the classification model by using the test set;
and, in a case that the classification accuracy is smaller than a first threshold, expanding the training set and training the classification model with the expanded training set until the classification accuracy is greater than or equal to the first threshold, at which point training ends.
3. The method of claim 2, wherein evaluating the classification accuracy of the classification model using the test set comprises:
acquiring a plurality of real categories corresponding to the test set;
inputting the test set into the classification model to obtain a plurality of prediction categories;
and comparing each prediction category with each real category to obtain the classification accuracy of the classification model.
4. The method of claim 2, wherein training the classification model and the plurality of regression models according to a hardware delay look-up table comprises:
acquiring a plurality of subsets corresponding to the delay categories from the training set;
training by utilizing the subset to obtain a regression model corresponding to each delay category;
evaluating the prediction accuracy of each regression model by using the test set;
and, in a case that the prediction accuracy of each regression model is smaller than a second threshold, expanding the subset and training each regression model with the expanded subset until the prediction accuracy of each regression model is greater than or equal to the second threshold, at which point training ends.
5. The method of claim 4, wherein evaluating the prediction accuracy of each of the regression models using the test set comprises:
acquiring the real time delay corresponding to the test set;
inputting the test set into the regression model to obtain a prediction delay;
and comparing the prediction delay with the real delay to obtain the prediction accuracy of the regression model.
6. A model-structured delay prediction apparatus, comprising:
the model training module is used for obtaining a classification model and a plurality of regression models according to the training of the hardware delay lookup table;
the time delay type acquisition module is used for inputting the operation in the model structure into the classification model to obtain a time delay type label corresponding to the operation;
the delay prediction module is used for inputting the operation with the delay category label into a regression model corresponding to the delay category to obtain the predicted delay;
wherein training input data of the regression model is determined from training output data of the classification model; the hardware delay lookup table comprises all operations in a search space and delay obtained by each operation in hardware equipment; the operation includes data of the size of the convolution kernel, the number of channels, and the step size.
7. The apparatus of claim 6, wherein the model training module comprises:
the first construction submodule is used for establishing a training set and a test set according to the hardware delay lookup table, and the training set and the test set are not intersected;
the first training submodule is used for training by using the training set to obtain the classification model;
a first test sub-module for evaluating a classification accuracy of the classification model using the test set;
and the second training submodule is configured to, in a case that the classification accuracy is smaller than a first threshold, expand the training set and train the classification model with the expanded training set until the classification accuracy is greater than or equal to the first threshold, at which point training ends.
8. The apparatus of claim 7, wherein the first test submodule comprises:
a first unit, configured to obtain a plurality of real categories corresponding to the test set;
a second unit, configured to input the test set into the classification model to obtain a plurality of prediction categories;
and the third unit is used for comparing each prediction category with each real category to obtain the classification accuracy of the classification model.
9. The apparatus of claim 7, wherein the model training module further comprises:
the second construction submodule is used for acquiring a plurality of subsets corresponding to the delay categories from the training set;
a third training submodule, configured to train by using the subset to obtain a regression model corresponding to each delay category;
the second testing submodule is used for evaluating the prediction accuracy of each regression model by using the test set;
and the fourth training submodule is configured to, in a case that the prediction accuracy of each regression model is smaller than a second threshold, expand the subset and train each regression model with the expanded subset until the prediction accuracy of each regression model is greater than or equal to the second threshold, at which point training ends.
10. The apparatus of claim 9, wherein the second test submodule comprises:
a fourth unit, configured to obtain a real delay corresponding to the test set;
a fifth unit, configured to input the test set into the regression model to obtain a predicted delay;
and the sixth unit is used for comparing the predicted delay with the real delay to obtain the prediction accuracy of the regression model.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN201910860570.0A 2019-09-11 2019-09-11 Model structure delay prediction method and device and electronic equipment Active CN110555486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860570.0A CN110555486B (en) 2019-09-11 2019-09-11 Model structure delay prediction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910860570.0A CN110555486B (en) 2019-09-11 2019-09-11 Model structure delay prediction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110555486A CN110555486A (en) 2019-12-10
CN110555486B true CN110555486B (en) 2022-04-19

Family

ID=68740085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860570.0A Active CN110555486B (en) 2019-09-11 2019-09-11 Model structure delay prediction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110555486B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353601A (en) * 2020-02-25 2020-06-30 北京百度网讯科技有限公司 Method and apparatus for predicting delay of model structure

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1956412A (en) * 2005-10-28 2007-05-02 上海交通大学 Method for admitting controlling integral service model
CN101529347A (en) * 2006-10-18 2009-09-09 西门子公司 Method and device for the identification of a delay-susceptible control path, control device, and computer program product
CN104734768A (en) * 2013-12-19 2015-06-24 罗森伯格(上海)通信技术有限公司 Method for improving convergence rate of repeater self-excitation suppression system
CN106777608A (en) * 2016-12-02 2017-05-31 天津大学 The FPGA time-delay estimation methods of accurate quick low input
CN108763384A (en) * 2018-05-18 2018-11-06 北京慧闻科技发展有限公司 For the data processing method of text classification, data processing equipment and electronic equipment

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2012066380A1 (en) * 2010-11-16 2012-05-24 Telefonaktiebolaget Lm Ericsson (Publ) Joint process estimator with variable tap delay line for use in power amplifier digital predistortion


Non-Patent Citations (2)

Title
Cost-Efficient Topology Design Problem in Time-Evolving Delay Tolerant Networks; Minsu Huang et al.; 2010 IEEE Global Telecommunications Conference GLOBECOM 2010; 2011-01-10; pp. 1-5 *
Network Delay Prediction Model Based on Chaotic Features; Li Chao et al.; Acta Electronica Sinica; 2009-12-31; Vol. 37, No. 12; pp. 2657-2661 *

Also Published As

Publication number Publication date
CN110555486A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
KR102534721B1 (en) Method, apparatus, device and storage medium for training model
JP7074964B2 (en) Recommended session methods, equipment and devices
CN111143686B (en) Resource recommendation method and device
CN111104514A (en) Method and device for training document label model
CN111460384B (en) Policy evaluation method, device and equipment
CN111506803B (en) Content recommendation method and device, electronic equipment and storage medium
CN110704509A (en) Data classification method, device, equipment and storage medium
CN111275190A (en) Neural network model compression method and device, image processing method and processor
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN112559870A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN112269867A (en) Method, device, equipment and storage medium for pushing information
CN110555486B (en) Model structure delay prediction method and device and electronic equipment
CN111680599B (en) Face recognition model processing method, device, equipment and storage medium
JP2021128779A (en) Method, device, apparatus, and storage medium for expanding data
CN112580723A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN111539222A (en) Training method and device for semantic similarity task model, electronic equipment and storage medium
CN111640103A (en) Image detection method, device, equipment and storage medium
CN111241225A (en) Resident area change judgment method, resident area change judgment device, resident area change judgment equipment and storage medium
CN112270412B (en) Network operator processing method and device, electronic equipment and storage medium
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
CN111510376B (en) Image processing method and device and electronic equipment
CN111753955A (en) Model parameter adjusting method and device, electronic equipment and storage medium
CN112099647A (en) Application operation method and device, electronic equipment and readable storage medium
CN111967492A (en) Method and device for training classification model, electronic equipment and storage medium
CN111523664A (en) Super-network parameter updating method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant