CN114897126A - Time delay prediction method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114897126A
Authority
CN
China
Prior art keywords
subnet network
prediction
time delay
network structures
target model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210535322.0A
Other languages
Chinese (zh)
Inventor
胡东方
许鸿民
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202210535322.0A priority Critical patent/CN114897126A/en
Publication of CN114897126A publication Critical patent/CN114897126A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The disclosed embodiments relate to a time delay prediction method and apparatus, an electronic device, and a storage medium in the field of computer technology. The time delay prediction method includes: performing Gaussian mixture sampling on a target model corresponding to a target operation to obtain a plurality of subnet network structures of the target model; and performing a convolution operation on the plurality of subnet network structures to predict time delay, determining a prediction result of the time delay information of the target model, and performing the target operation on an object to be processed based on the prediction result. This technical scheme improves the accuracy of the predicted time delay information of the target model and achieves broad applicability.

Description

Time delay prediction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a time delay prediction method, a time delay prediction apparatus, an electronic device, and a computer-readable storage medium.
Background
In order to accurately evaluate the performance of a machine learning model, its time delay parameters may be predicted.
In the related art, the overall running time delay of a model is generally fitted by hierarchical (layer-wise) modeling. However, hierarchical modeling cannot accurately fit the running time delay of multi-scale parallel networks. It also requires layer-by-layer on-device measurement, so it applies only to simple time delay modeling scenarios and is therefore limited. In addition, uniform subnet sampling cannot accurately capture the subnet structure, its sampling efficiency is low, and it wastes considerable computing resources.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a method and an apparatus for predicting latency, an electronic device, and a storage medium, which overcome, at least to some extent, the problem of inaccurate latency prediction due to limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a time delay prediction method, including: performing mixed Gaussian sampling on a target model corresponding to target operation to obtain a plurality of subnet network structures of the target model; and performing convolution operation on the plurality of subnet network structures to perform time delay prediction, determining a prediction result of the time delay information of the target model, and performing the target operation on the object to be processed based on the prediction result.
According to a second aspect of the present disclosure, there is provided a delay prediction apparatus, including: the device comprises a subnet acquisition module, a data processing module and a data processing module, wherein the subnet acquisition module is used for carrying out Gaussian mixture sampling on a target model corresponding to target operation to acquire a plurality of subnet network structures of the target model; and the time delay information prediction module is used for performing convolution operation on the plurality of subnet network structures to perform time delay prediction, determining a prediction result of the time delay information of the target model, and performing the target operation on the object to be processed based on the prediction result.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the latency prediction method of the first aspect above and possible implementations thereof via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the latency prediction method of the first aspect described above and possible implementations thereof.
In the time delay prediction method and apparatus, electronic device, and computer-readable storage medium provided in the embodiments of the present disclosure, on the one hand, performing Gaussian mixture sampling on the target model corresponding to the target operation enlarges the distribution range of the subnet network structures, avoiding the limited distribution range caused by the sampling manner of the related art, improving the comprehensiveness and accuracy of the subnet network structures, and saving computing resources. On the other hand, performing a convolution operation on the plurality of subnet network structures determines the time delay prediction result of each subnet network structure and thus the time delay prediction result of the whole target model; this widens the distribution range of the time delay information, improves the accuracy and effectiveness of the predicted time delay of the target model, enables end-side time delay prediction, avoids the earlier limitations, and improves convenience and range of application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 shows a schematic diagram of a system architecture to which the latency prediction method of the embodiment of the present disclosure may be applied.
Fig. 2 schematically illustrates a schematic diagram of a delay prediction method in an embodiment of the present disclosure.
Fig. 3 schematically illustrates a structural diagram of an object model in an embodiment of the present disclosure.
Fig. 4 schematically shows a flowchart for acquiring a subnet network structure in the embodiment of the present disclosure.
Fig. 5 schematically shows probability density function diagrams of different sampling modes in the embodiment of the present disclosure.
Fig. 6 schematically illustrates a flow chart of sampling in an embodiment of the present disclosure.
Fig. 7 schematically illustrates a flow chart for determining a prediction result in an embodiment of the present disclosure.
Fig. 8 schematically illustrates a comparison graph of time delays in an embodiment of the present disclosure.
Fig. 9 schematically shows a distribution diagram of the predicted delay in the embodiment of the present disclosure.
Fig. 10 schematically illustrates a block diagram of a latency prediction apparatus in an embodiment of the present disclosure.
Fig. 11 schematically illustrates a block diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, the time delay of overall model operation can be fitted by hierarchical modeling. However, while this works for the serial architectures common in classification tasks, hierarchical modeling cannot accurately fit the running time delay of multi-scale parallel networks. Because it must measure layer by layer on the device, it can only be applied to simple time delay modeling scenarios. Moreover, uniform subnet sampling cannot, within a limited number of samples, cover subnet structures at both the high and low ends of resource consumption, which may result in an invalid fit.
In order to solve these technical problems in the related art, the embodiments of the present disclosure provide a time delay prediction method, which may be applied to application scenarios that implement a target task. The target task may be any of various types of tasks, such as an image detection task or a voice recognition task.
Fig. 1 is a schematic diagram illustrating a system architecture to which the latency prediction method and apparatus according to the embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include a client 101 and a server 102. The client 101 may be an intelligent device such as a smartphone, computer, tablet computer, or smart speaker. The client 101 obtains an object to be processed and sends it to the server 102, so that the server 102 processes it with the target model corresponding to the target operation. The object to be processed may include, for example, an image, a voice segment, or a text to be processed, and may be determined according to the type of target task represented by the target operation. The server 102 may be a background system providing the time delay prediction service of the embodiments of the present disclosure; it may comprise one electronic device with computing capability, or a cluster of such devices (e.g., portable computers, desktop computers, smartphones), for processing the object to be processed sent by the client together with the target model corresponding to the target task. Alternatively, the client need not send the object to the server: it may perform time delay prediction locally and, based on the prediction result of the time delay information, perform target operations such as classification, segmentation, and detection on the object through the target model.
The time delay prediction method may be applied to predicting the time delay of a target model corresponding to a target operation. Referring to fig. 1, the client samples the target model represented by the search space associated with the target operation, to obtain a plurality of subnet network structures of that model. The plurality of subnet network structures may then be input to a multilayer perceptron for a convolution operation; the multilayer perceptron performs time delay prediction on the subnet network structures and determines a prediction result for the time delay information of the target model that comprises them.
The server 102 may be the same as the client 101, i.e. both the client 101 and the server 102 are smart devices, which may be, for example, smartphones.
It should be noted that the time delay prediction method provided by the embodiments of the present disclosure may be executed by the server 102; accordingly, the method may be deployed in the server 102 as a program or the like. It may likewise be executed by the client 101, and correspondingly deployed in the client 101. In the embodiments of the present disclosure, the method is described with the client as the executing entity by way of example.
Next, a delay prediction method in the embodiment of the present disclosure is described in detail with reference to fig. 2, taking a client as an execution subject.
In step S210, a target model corresponding to a target operation is subjected to gaussian mixture sampling, and a plurality of subnet network structures of the target model are obtained.
In the embodiments of the present disclosure, the target operation may be a target task, which may be any of various types, such as a classification task, a detection task, or a segmentation task, determined according to the actual application scenario and requirements. The target model is the model used by the target operation and may be any type of machine learning or deep learning model, for example a convolutional neural network, a recurrent neural network, or a generative adversarial network, without particular limitation here.
In some embodiments, a target model, i.e., a search space, may be determined first; for example, an E2NAS model (E2NAS architecture) may be adopted as both the target model and the search space. E2NAS is a hardware-constraint-based scheme, Mixer-Hard-aware NAS (MHANAS for short), which combines ideas from GDAS, FairDarts, and FBNet to be more effective and faster. This search space can improve not only hardware-based NAS but also NAS search without hardware constraints, and alleviates the collapse phenomenon.
Fig. 3 schematically shows the network structure of the E2NAS model. Referring to fig. 3, the network structure mainly comprises four stages, the first through the fourth. All four stages contain several basic models; the first stage contains only basic models, while the second through fourth stages contain both basic models and efficient fusion models. The number of basic models in each stage is twice the stage number: the first stage includes 2 basic models, the second 4, the third 6, the fourth 8, and so on. Note that from the first to the fourth stage, the number of network layers in each stage's basic model increases in turn: 1 layer in the first stage, 2 in the second, 3 in the third, and 4 in the fourth. In the second through fourth stages, the basic models are connected to the efficient fusion models, and the two are connected in sequence. Referring to fig. 3, the basic model may include an input layer, a plurality of downsampling modules, and an output layer.
As shown in fig. 3, the input of the current stage is the output of the previous stage, and two of the current stage's inputs carry the same information; that is, the output of the previous stage serves simultaneously as two inputs of the current stage, while the remaining inputs correspond one-to-one with the remaining outputs of the previous stage. The current stage may be any stage in the network structure, for example any one of the second, third, and fourth stages.
With continued reference to fig. 3, the E2NAS network architecture mainly includes the following parts: an input layer 301, a feature extraction layer 302, the first stage stage1 through the fourth stage stage4 (303-306), a task header 307, and an output layer 308. The feature extraction layer 302 is used to extract feature vectors. The task header indicates the prediction head of the target task, such as a detection task head, a segmentation task head, or any other task head. The first stage includes only the basic model 309, while the remaining stages include both a basic model and an efficient fusion model 310. The basic model may include a plurality of downsampling layers, and the efficient fusion model fuses the outputs of the basic models.
A subnet network structure is a network structure formed from samples obtained by sampling the target model, and may be a part of the target model. The plurality of subnet network structures may be the same size or different sizes, and depending on the sampling mode may be identical or distinct; together they can be combined into the complete target model. Multiple subnet network structures can be trained on a training data set and optimized for accuracy on a validation data set; in general, a search algorithm can be used to obtain the subnet network structure that is optimal for the optimization target. In the embodiments of the present disclosure, the plurality of subnet network structures refers to networks before optimization, i.e., not yet the optimal subnet network structure. For example, if target model A has 3 subnets, then subnet network structure 1, subnet network structure 2, and subnet network structure 3 may together constitute target model A.
In some embodiments, the target model may be sampled by Gaussian mixture sampling based on a plurality of operators of the target model, to obtain a plurality of subnet network structures. Fig. 4 schematically shows a flowchart for acquiring the subnet network structures; referring to fig. 4, the method mainly includes the following steps:
in step S410, selecting a plurality of candidate operators from a plurality of operators corresponding to the target model;
in step S420, performing gaussian mixture sampling on the target model based on the candidate operators, and obtaining the plurality of subnet network structures.
In the embodiments of the present disclosure, the target model may include a plurality of operators representing all operator types of the target model, such as convolution operators, pooling operators, or other types. To improve accuracy, the operators of the target model may be screened to determine a plurality of candidate operators among them. The candidate operators may be a subset of the full operator set and may share the same type, e.g., all convolution operators, while being the same or different in size. The sizes of the candidate operators are determined by the model parameters of the target model, which may include, for example but not limited to, the convolution kernel size, the block depth, and the network width of the target model. Convolution kernels of different sizes provide different views, so both global and local information can be taken into account; the larger the convolution kernel, the more candidate operator variants there are. A candidate operator may be a convolution operator, for example any one or combination of 3 × 3, 5 × 5, and 7 × 7 convolution operators.
Each candidate operator corresponds to one of several possible numbers of convolution kernels (the channel width, midchannel). When the kernel is 3 × 3, the number of convolution kernels may take three values, e.g., 1×, 2×, or 4×; when the kernel is 5 × 5, likewise 1×, 2×, or 4×. That is, both the size and the number of convolution kernels are selectable. When a plurality of candidate operators is obtained by the search module, whether to attach a pooling module after the search module can also be chosen, as determined by the search strategy. The search strategy determines how to accurately find the optimal network configuration parameters; it may be an iterative process such as random search, Bayesian optimization, an evolutionary algorithm, reinforcement learning, or a gradient-based algorithm. If the search strategy requires a pooling operation, a pooling module is connected after the search module; otherwise it is not. When the pooling module is connected, the number of convolution kernels can again be selected from several values, in the same manner as above, which is not repeated here.
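As a minimal sketch of the candidate-operator space just described, the combinations of kernel size, channel multiplier, and optional pooling can be enumerated (the field names, and applying the same multipliers to 7 × 7 kernels, are assumptions for illustration, not from the patent):

```python
from itertools import product

# Assumed option sets, mirroring the examples in the text:
# kernel sizes 3x3 / 5x5 / 7x7 and midchannel multipliers 1x / 2x / 4x.
KERNEL_SIZES = [3, 5, 7]
CHANNEL_MULTIPLIERS = [1, 2, 4]
USE_POOLING = [False, True]  # whether a pooling module follows the search module

def enumerate_candidate_operators():
    """Enumerate every (kernel, multiplier, pooling) combination."""
    return [
        {"kernel": k, "midchannel_mult": m, "pooling": p}
        for k, m, p in product(KERNEL_SIZES, CHANNEL_MULTIPLIERS, USE_POOLING)
    ]

ops = enumerate_candidate_operators()
print(len(ops))  # 3 * 3 * 2 = 18 combinations
```

In practice the search strategy would then score these combinations rather than enumerate them exhaustively.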
After the plurality of candidate operators is obtained, Gaussian mixture sampling can be performed on the target model through them to obtain a plurality of subnet network structures. Consider first randomly sampling operators in the search space: the operator obtained at each draw is random. Assuming the target model has n layers, each layer randomly extracts one of k operators, and the empty-set case is not considered, a subnet network structure consists of n layers, i.e., n random variables, and its time delay is the sum of those n random variables. By the central limit theorem, the limiting distribution as n approaches positive infinity is Gaussian; that is, the data pairs (network structure, time delay) follow a single Gaussian distribution, as shown in the probability density functions for the different sampling modes in fig. 5. If a wider distribution is desired for the end-side task, one would like to fit subnet network structures with small time delay; but since the variance of the Gaussian distribution cannot be controlled in the related art, such structures cannot be fitted, which is a significant limitation.
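The central-limit argument above can be checked numerically: if each of n layers draws one of k operators, the total subnet time delay, being a sum of n independent draws, concentrates around n times the mean per-layer delay. The per-operator latencies below are synthetic values invented for illustration:

```python
import random
import statistics

random.seed(0)

N_LAYERS = 32
OP_LATENCIES_MS = [1.0, 2.0, 3.0, 4.0, 5.0]  # synthetic latencies for k = 5 operators

def sample_subnet_latency():
    # One random operator per layer; total delay is the sum over layers.
    return sum(random.choice(OP_LATENCIES_MS) for _ in range(N_LAYERS))

samples = [sample_subnet_latency() for _ in range(10_000)]
# The sample mean concentrates near 32 layers * 3.0 ms mean = about 96 ms,
# and a histogram of `samples` would look approximately Gaussian.
print(round(statistics.fmean(samples), 1))
```

The single-Gaussian shape is exactly why uniform sampling struggles to reach the low-delay tail: draws far from 96 ms are exponentially rare.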
In order to solve the above problem, in the embodiments of the present disclosure, the target model may be sampled by Gaussian mixture sampling to obtain a plurality of subnet network structures. Fig. 6 schematically shows a flowchart of the sampling; referring to fig. 6, the method mainly includes the following steps S610 and S620:
in step S610, for each layer of the target model, randomly extracting a candidate operator from the multiple candidate operators and performing filtering processing to obtain multiple initial subnet network structures; the plurality of initial subnet network structures satisfy a Gaussian mixture distribution;
in step S620, the initial subnet network structures are randomly sampled, respectively, to obtain the subnet network structures.
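Steps S610 and S620 can be sketched as follows. This is a hedged illustration under assumed semantics (operator names, the "empty slot" encoding, and re-sampling as random sub-selection are all assumptions; the patent does not specify these details):

```python
import random

random.seed(1)

# k = 5 candidate operators; the names are assumptions for illustration.
CANDIDATE_OPS = ["conv3x3", "conv5x5", "conv7x7", "conv3x3_w2", "conv5x5_w2"]

def draw_initial_structure(n_layers):
    """S610: randomly extract one candidate per layer, allowing an empty slot,
    then filter out the empty slots so no empty set remains."""
    structure = []
    while not structure:  # re-draw in the (rare) case every slot was empty
        raw = [random.choice(CANDIDATE_OPS + [None]) for _ in range(n_layers)]
        structure = [op for op in raw if op is not None]
    return structure

def sample_subnets(n_layers, n_initial, samples_per_initial):
    """S620: randomly re-sample each initial structure to produce subnets."""
    subnets = []
    for _ in range(n_initial):
        initial = draw_initial_structure(n_layers)
        for _ in range(samples_per_initial):
            # Each subnet is a random sub-selection of its initial structure.
            size = random.randint(1, len(initial))
            subnets.append(random.sample(initial, size))
    return subnets

subnets = sample_subnets(n_layers=32, n_initial=31, samples_per_initial=32)
print(len(subnets))  # 31 * 32 = 992
```

Because each initial structure contributes its own cluster of subnets, the pooled delay samples form a mixture rather than a single Gaussian.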
In the embodiments of the present disclosure, the network structure of the target model has n layers; assume each layer randomly extracts one operator from the candidate operators, yielding n random extraction results. The random extraction results can then be filtered; the filtering removes empty sets, to avoid their impact on accuracy. By random sampling and empty-set removal, a plurality of initial subnet network structures of the target model is obtained.
Further, each of the plurality of initial subnet network structures may be randomly sampled again until subnet network structures satisfying the sampling condition are obtained. The sampling condition may be that the distribution of the time delay information of the sampled subnet network structures matches a preset distribution state, or that the number of sampled subnet network structures reaches a preset number. The preset distribution state may concentrate on a desired region, for example a region with small time delay or one with large time delay. For instance, if the time delay information of the sampled subnet network structures is required to lie in a region with small time delay and indeed falls mostly in a lightweight, small-delay region, the distribution is deemed to satisfy the preset distribution state. The preset number may be set according to actual requirements, for example 1000 or 2000; here 1000 is used for illustration. With a preset number of 1000, assuming the target model has 32 layers and each layer has k = 5 candidate operators from which one operator is randomly extracted, there are 31 initial subnet network structures in total. Each initial subnet network structure may then be randomly sampled again to obtain its subnets, thereby obtaining the plurality of subnet network structures.
When each initial subnet network structure is randomly sampled again, 32 to 33 samples may be collected per structure; the count per structure may differ and may be adjusted according to actual requirements (e.g., 32 or 33), as long as sampling over all initial subnet network structures yields the preset number, e.g., until 1000 subnet network structures have been sampled.
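The "32 to 33 samples per structure" bookkeeping above works out as follows (the even-allocation scheme is an assumption for illustration; the patent only states the range): 1000 samples over 31 initial structures gives 32 each, with 8 structures taking one extra.

```python
def allocate_samples(total_samples, n_structures):
    """Spread `total_samples` as evenly as possible over `n_structures`."""
    base = total_samples // n_structures   # 1000 // 31 = 32
    extra = total_samples % n_structures   # 1000 %  31 = 8
    # `extra` structures take base + 1 samples (33); the rest take `base` (32).
    return [base + 1] * extra + [base] * (n_structures - extra)

counts = allocate_samples(1000, 31)
print(min(counts), max(counts), sum(counts))  # 32 33 1000
```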
Illustratively, the number of initial subnet network structures can be expressed as formula (1) (the original formula was rendered as an image; it is reconstructed here from the surrounding text, where each layer has k candidate operators and the empty set is excluded):

N = 2^k − 1        (1)

With k = 5, this gives N = 31 initial subnet network structures, matching the example above.
The time delay information of the plurality of subnet network structures follows a Gaussian mixture distribution, while the time delay information of each subnet network structure follows a single Gaussian distribution; the mixture is therefore obtained by fusing a plurality of single Gaussians. Referring to fig. 5, since the time delay information of each subnet network structure may be a single Gaussian, fusing these single Gaussians while sampling the multiple subnet network structures yields a Gaussian mixture, making the model richer and the generated samples more diverse. The Gaussian mixture function used to represent the Gaussian mixture distribution (the original was rendered as an image; it is reconstructed here in standard form) is:

p(x) = Σ_{k=1}^{K} π_k · N(x | μ_k, σ_k²)        (2)

where π_k are the mixture weights with Σ_k π_k = 1.
Here μ_k denotes the mean of the time delay information and σ_k² its variance. When sampling the plurality of subnet network structures under the mixture distribution, the mean and variance of each structure's time delay information can be sensed, making it easier to control the distribution of the sampled data so that it matches actual requirements and avoids the earlier limitation. Sampling with the Gaussian mixture function thus yields a wider distribution over the search space than uniform sampling: it enlarges the sample distribution range, allows the distribution of the sampled data to be adjusted promptly and conveniently, improves the accuracy and comprehensiveness of the sample distribution, and avoids the related-art limitation of covering time delay data in only part of the region.
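The controllable-mean-and-variance property can be illustrated by drawing delay samples from a two-component Gaussian mixture; the component weights, means, and standard deviations below are invented for illustration (roughly matching the low- and high-delay regions of fig. 5):

```python
import random
import statistics

random.seed(2)

# (weight pi_k, mean mu_k, std sigma_k): a low-delay and a high-delay component.
COMPONENTS = [(0.5, 30.0, 4.0), (0.5, 90.0, 4.0)]

def sample_gmm(n):
    """Draw n samples: pick a component by weight, then draw from its Gaussian."""
    out = []
    for _ in range(n):
        r, acc = random.random(), 0.0
        for weight, mu, sigma in COMPONENTS:
            acc += weight
            if r <= acc:
                out.append(random.gauss(mu, sigma))
                break
    return out

samples = sample_gmm(10_000)
# The overall mean sits between the component means (about 60 here), and the
# low-delay region around 30 is covered -- unlike a single Gaussian at 60.
print(round(statistics.fmean(samples)))
```

Shifting μ_k or widening σ_k directly moves or spreads the covered delay range, which is exactly the control that single-Gaussian sampling lacks.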
As shown in fig. 5, the time delay information of each subnet network structure may follow a single Gaussian distribution 501, but the distribution range of a single Gaussian is small and cannot fit regions of low time delay. To obtain the distribution state of a target region, the target region may be sampled directly. The target region may include a first region, for example the region covering time delays of 20 to 40, and a second region, for example the region covering time delays of 80 to 100. For the first region, random draws may be made near its peak to obtain the result 502 of the first region; for the second region, random draws may be made near its peak to obtain the result 503 of the second region. Further, the result of the first region and the result of the second region may be merged to obtain a merged result 504, and the merged result satisfies a Gaussian mixture distribution.
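A minimal sketch of the two-region sampling and merging just described, assuming (purely for illustration) component means near the peaks of the first region (20 to 40) and second region (80 to 100), equal mixing weights, and an arbitrary standard deviation:

```python
import numpy as np

def gmm_sample(means, stds, weights, n, seed=0):
    """Sample n points from a Gaussian mixture whose component means,
    standard deviations, and mixing weights are known."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(means), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(stds)[comp])

# first region peaks near 30, second near 90; the pooled draws from
# both components follow the Gaussian mixture distribution
merged = gmm_sample(means=[30.0, 90.0], stds=[5.0, 5.0],
                    weights=[0.5, 0.5], n=1000)
```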
For example, suppose the target model is a 32-layer model and the number k of candidate operators in each layer is 5, so that there are a total of 31 initial subnet network structures. Each initial subnet network structure is then randomly sampled, with each following a single Gaussian distribution. Illustratively, 32 to 33 samples are collected per initial subnet network structure until a preset number of subnet network structures are collected, for example, 1000 subnet network structures in total.
In the embodiment of the disclosure, by performing Gaussian mixture sampling on the target model and perceiving the variance and mean of the time delay information of each subnet network structure, the distribution range of the time delay information of the subnet network structures can be increased, the application range broadened, and the accuracy of the obtained subnet network structures improved.
Continuing to refer to fig. 2, in step S220, a convolution operation is performed on the plurality of subnet network structures to carry out time delay prediction, a prediction result of the time delay information of the target model is determined, and the target operation is performed on the object to be processed based on the prediction result.
In the embodiment of the present disclosure, in order to avoid the problems in the related art, time delay prediction may be performed on the plurality of subnet network structures through an MLP (Multi-Layer Perceptron) to obtain a prediction result of the time delay information of each subnet network structure, and the per-structure prediction results may be combined to obtain the prediction result of the time delay information of the entire target model.
The multi-layer perceptron MLP is used for feature fusion. Illustratively, the multi-layer perceptron may include two fully connected layers and an activation layer. The core operation of a fully connected layer is still a convolution operation, namely a matrix-vector product. A fully connected layer is equivalent to a feature-space transformation that integrates all features to obtain global feature information; it can also be regarded as an extreme convolutional layer whose kernel size equals the input matrix size, so that the output matrix has a height and width of 1. The activation layer may use the tanh function: the activation function increases the nonlinearity of the model, and tanh can increase the convergence speed. The activation layer may also use other functions, which are not limited here.
After the plurality of subnet network structures are obtained, they can be input to the multi-layer perceptron MLP, which performs a convolution operation on them through the fully connected layers and the activation layer to obtain the prediction result of the time delay information. For example, the prediction result of the time delay information may be determined by combining the network parameters of the multi-layer perceptron with the attribute parameters of each subnet network structure. The network parameters may be the weight parameters of the multi-layer perceptron, and the attribute parameters of a subnet network structure may be the variance and mean of its time delay information. The variance and mean indicate the degree of distribution, or dispersion, of the time delay information.
Fig. 7 schematically shows a flow chart for determining the prediction result, which, with reference to fig. 7, mainly comprises the following steps:
in step S710, a multiplication operation is performed on the network parameters of the multi-layer perceptron and the variance of the time delay information of each subnet network structure to obtain a processing result;
in step S720, an addition operation is performed on the processing result and the mean of the time delay information of each subnet network structure to determine the prediction result.
In the embodiment of the disclosure, the variance and the mean of the delay information of each subnet network structure can be obtained, and then the prediction result is obtained by combining the network parameters of the multilayer perceptron, the variance and the mean of the delay information of the subnet network structure. Further, the network parameters of the multi-layer perceptron and the variance of the delay information of each subnet network structure can be multiplied to obtain the product as the processing result. And the processing result and the average value of the time delay information of each subnet network structure can be added to determine the prediction result of the time delay information of each subnet network structure.
Illustratively, the prediction result of the time delay information may be calculated by a multi-layer perceptron; a corrected form of the snippet (where x is the tensor encoding of a subnet network structure) is:

mlp_model = nn.Sequential(
    nn.Linear(self.in_features, self.mid_features),   # first fully connected layer
    nn.Tanh(),                                        # activation layer
    nn.Linear(self.mid_features, self.out_features),  # second fully connected layer
)
latency = mlp_model(x) * std + mean  # multiply by the variance, add the mean
The prediction result of the time delay information of each subnet network structure is calculated based on the multi-layer perceptron together with the mean and variance of that structure. Because the subnet network structures differ, and the variance and mean of the time delay information of each structure also differ, the obtained prediction results of the time delay information may be the same or different, depending specifically on the variance and mean of each subnet network structure.
Further, since a plurality of subnet network structures can be combined into a complete target model, the time delay information of the plurality of subnet network structures can likewise be combined into the time delay information of the target model. On this basis, the prediction results of the time delay information of each subnet network structure contained in the target model can be merged and fused to obtain the prediction result of the time delay information of the target model. For example, the prediction results of the time delay information of the individual subnet network structures may be added together to obtain the prediction result for the entire target model. If the target model includes 1000 subnet network structures, the prediction results of the time delay information of the 1000 subnet network structures may be added; illustratively, prediction results 1, 2, and so on up to prediction result 1000 are summed to obtain the prediction result of the time delay information of the whole target model.
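The fusion-by-addition step can be sketched minimally; the per-subnet prediction values here are illustrative:

```python
def fuse_latency_predictions(per_subnet_predictions):
    """Fuse per-subnet time delay predictions into the prediction for the
    whole target model by adding them together."""
    return sum(per_subnet_predictions)

# e.g. three per-subnet predictions instead of the 1000 in the text
model_latency = fuse_latency_predictions([1, 2, 3])  # 6
```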
It should be noted that a subnet network structure may be converted into a preset data format before being input to the multi-layer perceptron. The preset data format may be a tensor data format, so as to improve data-processing efficiency.
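A minimal sketch of the format conversion, assuming (purely for illustration) that a subnet network structure is encoded as a list of per-layer candidate-operator indices:

```python
import torch

# hypothetical encoding: one candidate-operator index per layer
subnet_encoding = [3, 0, 4, 1, 2]
# convert to the tensor data format expected by the multi-layer perceptron
subnet_tensor = torch.tensor(subnet_encoding, dtype=torch.float32)
```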
After the prediction result of the time delay information of the target model is obtained, the target operation can be performed on the object to be processed based on the prediction result. The object to be processed may be an image, speech, or text to be processed, determined according to the type of target operation and the actual application scenario. On this basis, the target model whose time delay prediction result has been determined can perform the target operation on the object to be processed to realize the corresponding function, for example classifying, detecting, or segmenting the object to be processed.
In the embodiment of the disclosure, the prediction result of the time delay information of the target model is calculated through the multi-layer perceptron MLP: a small number of subnet network structures are sampled from the target model, and the time delay information of the entire target model is then fitted by the MLP, which reduces the consumption of computing resources and time and improves the prediction efficiency. In addition, because the prediction is computed by a multi-layer perceptron, prediction can be performed for all types of target models on the device side, improving universality and convenience. Moreover, the distribution state of the time delay information of the subnet network structures can be controlled through their variance and mean, improving the accuracy and rationality of the time delay prediction and hence the prediction effect.
In the embodiments of the present disclosure, different modeling approaches may be compared. Illustratively, 2500 subnet network structures may be sampled and their running time delay tested on the terminal. Of these, 1000 subnet network structures are used for MLP fitting, and 1500 are used to verify the fitting effect of MLP modeling and of hierarchical modeling. Referring to the comparison of actual time delay and predicted time delay shown in fig. 8, hierarchical modeling shows significant bias in high-delay scenarios. The uniform subnet sampling mode therefore cannot, within a limited number of samples, sample subnet network structures with higher and lower time delay, which causes the MLP fit to fail and leads to lower accuracy.
Illustratively, with the Gaussian mixture sampling mode, subnet network structures with both higher and lower time delay can be sampled within the same number of samples, obtaining a wider time delay distribution and increasing the distribution range of the time delay information. Specifically, E2NAS is adopted as the architecture for the target model and the search space. Each search module can independently select one of two operator types, 3×3 Conv and 5×5 Conv, as a candidate operator, and each candidate operator can select among three convolution-kernel counts. All search modules can choose whether to connect to a Pooling Transition module, whose number of convolution kernels can likewise be chosen among three options. 1000 subnet network structures are sampled with each of the two modes, Gaussian mixture sampling and uniform sampling, and the running time delay of each network is tested on the same model of mobile phone.
Fig. 9 schematically shows the distribution of predicted time delays of the subnet network structures obtained using the different sampling modes. Referring to diagram A in fig. 9, the predicted time delay information is counted for the 1000 subnet network structures obtained with each mode. The region covered by the prediction results of the time delay information obtained by Gaussian mixture sampling lies within [0, 150], while the region covered by uniform subnet sampling lies within [30, 80]. By comparison, the prediction results obtained by Gaussian mixture sampling cover a larger region, including more of the low-delay region, which is the region that is actually needed. As can be seen from diagram B in fig. 9, the time delay distribution obtained by Gaussian mixture sampling does not follow a uniform distribution.
In summary, according to the technical solution provided in the embodiment of the present disclosure, the sampling method can be applied to any task that simulates large-data-volume behavior with partial sampled data, including but not limited to all structure search (NAS) tasks. Performing time delay prediction on the target model through the multi-layer perceptron improves the efficiency and convenience of model time delay prediction as well as its universality. Sampling the target model through Gaussian mixture sampling to obtain a plurality of subnet network structures increases the sampling range and improves comprehensiveness and accuracy. Because sampling need not be performed layer by layer, the method suits a variety of application scenarios and enlarges the application range. Furthermore, the problem that uniform subnet sampling cannot, within a limited number of samples, capture samples with a wider distribution range is avoided, increasing the range of subnet network structures, improving accuracy, and reducing the computing resources required.
In an embodiment of the present disclosure, a delay prediction apparatus is provided, and referring to fig. 10, the delay prediction apparatus 1000 may include:
a subnet obtaining module 1001, configured to perform gaussian mixture sampling on a target model corresponding to a target operation to obtain multiple subnet network structures of the target model;
the delay information prediction module 1002 is configured to perform convolution operation on the plurality of subnet network structures to perform delay prediction, determine a prediction result of the delay information of the target model, and perform the target operation on the object to be processed based on the prediction result.
In an exemplary embodiment of the present disclosure, the subnet acquisition module includes: the candidate operator determining module is used for selecting a plurality of candidate operators from a plurality of operators corresponding to the target model; and the subnet structure determining module is used for performing Gaussian mixture sampling on the target model based on the candidate operators to acquire the subnet network structures.
In an exemplary embodiment of the disclosure, the candidate operator determination module includes: and the operator selection module is used for determining the candidate operators according to the model parameters of the target model.
In an exemplary embodiment of the present disclosure, the subnet structure determining module includes: an initial acquisition module, configured to randomly extract a candidate operator from the multiple candidate operators for each layer of the target model, and perform filtering processing to acquire multiple initial subnet network structures; the plurality of initial subnet network structures form a Gaussian mixture distribution; and the random sampling module is used for respectively carrying out random sampling on the plurality of initial subnet network structures to obtain the plurality of subnet network structures.
In an exemplary embodiment of the present disclosure, the delay information prediction module includes: the prediction control module is used for carrying out convolution operation on the plurality of subnet network structures based on the multilayer perceptron so as to obtain the prediction results of the time delay information of the plurality of subnet network structures; and the fusion module is used for fusing the prediction results of the time delay information of the plurality of subnet network structures to obtain the prediction result of the time delay information of the target model.
In an exemplary embodiment of the present disclosure, the predictive control module includes: and the parameter prediction module is used for determining the prediction result of the time delay information by combining the network parameters of the multilayer perceptron and the attribute parameters of the network structures of the sub-networks.
In an exemplary embodiment of the present disclosure, the parameter prediction module includes: the first processing module is used for multiplying the network parameters of the multilayer perceptron and the variance of each subnet network structure to obtain a processing result; and the second processing module is used for adding the processing result and the average value of each subnet network structure to determine a prediction result.
It should be noted that, the specific details of each module in the delay prediction apparatus have been described in detail in the corresponding delay prediction method, and therefore are not described herein again.
FIG. 11 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. The terminal of the present disclosure may be configured in the form of an electronic device as shown in fig. 11, however, it should be noted that the electronic device shown in fig. 11 is only one example, and should not bring any limitation to the functions and the use range of the embodiment of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, cause the processor to implement the method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 11, the electronic device 1100 may include: the mobile terminal includes a processor 1110, an internal memory 1121, an external memory interface 1122, a Universal Serial Bus (USB) interface 1130, a charging management Module 1140, a power management Module 1141, a battery 1142, an antenna 1, an antenna 2, a mobile communication Module 1150, a wireless communication Module 1160, an audio Module 1170, a speaker 1171, a receiver 1172, a microphone 1173, an earphone interface 1174, a sensor Module 1180, a display 1190, a camera Module 1191, an indicator 1192, a motor 1193, a button 1194, a Subscriber Identity Module (SIM) card interface 1195, and the like. The sensor module 1180 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the electronic device 1100. In other embodiments of the present application, electronic device 1100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 1110 may include one or more processing units, such as: the processor 1110 may include an application processor, a modem processor, a graphics processor, an image signal processor, a controller, a video codec, a digital signal processor, a baseband processor, and/or a Neural-Network Processing Unit (NPU), among others. The different processing units may be separate devices or may be integrated into one or more processors. Additionally, a memory may be provided within processor 1110 for storing instructions and data. The latency prediction method in the present exemplary embodiment may be performed by an application processor, a graphics processor, or an image signal processor, and may be performed by the NPU when the method involves neural network related processing.
The internal memory 1121 may be used to store computer-executable program code, including instructions. The internal memory 1121 may include a program storage area and a data storage area. The external memory interface 1122 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 1100.
The communication function of the mobile terminal 1100 may be implemented by a mobile communication module, an antenna 1, a wireless communication module, an antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module may provide mobile communication solutions of 2G, 3G, 4G, 5G, etc. applied to the mobile terminal 1100. The wireless communication module may provide wireless communication solutions such as wireless LAN, Bluetooth, and near field communication applied to the mobile terminal 1100.
The display screen is used for realizing display functions, such as displaying user interfaces, images, videos and the like. The camera module is used for realizing shooting functions, such as shooting images, videos and the like. The audio module is used for realizing audio functions, such as playing audio, collecting voice and the like. The power module is used for realizing power management functions, such as charging a battery, supplying power to equipment, monitoring the state of the battery and the like.
The present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims. It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A method for predicting delay, comprising:
performing mixed Gaussian sampling on a target model corresponding to target operation to obtain a plurality of subnet network structures of the target model;
and performing convolution operation on the plurality of subnet network structures to perform time delay prediction, determining a prediction result of the time delay information of the target model, and performing the target operation on the object to be processed based on the prediction result.
2. The delay prediction method according to claim 1, wherein the performing gaussian mixture sampling on the target model corresponding to the target operation to obtain a plurality of subnet network structures of the target model comprises:
selecting a plurality of candidate operators from a plurality of operators corresponding to the target model;
and performing Gaussian mixture sampling on the target model based on the candidate operators to obtain the plurality of subnet network structures.
3. The method according to claim 2, wherein the selecting a plurality of candidate operators from the plurality of operators corresponding to the target model comprises:
and determining the candidate operators according to the model parameters of the target model.
4. The delay prediction method of claim 2, wherein the obtaining the plurality of subnet network structures based on the gaussian mixture sampling with the plurality of candidate operators comprises:
for each layer of the target model, randomly extracting a candidate operator from the candidate operators and filtering the candidate operator to obtain a plurality of initial subnet network structures; the plurality of initial subnet network structures form a Gaussian mixture distribution;
and respectively carrying out random sampling on the plurality of initial subnet network structures to obtain the plurality of subnet network structures.
5. The latency prediction method of claim 1, wherein the convolving the plurality of subnet network structures to perform latency prediction and determining the prediction result of the latency information of the target model comprises:
performing convolution operation on the plurality of subnet network structures based on a multilayer perceptron to obtain the prediction results of the time delay information of the plurality of subnet network structures;
and fusing the prediction results of the time delay information of the plurality of subnet network structures to obtain the prediction result of the time delay information of the target model.
6. The latency prediction method of claim 5, wherein the performing convolution operation on the plurality of subnet network structures based on the multi-layer perceptron to obtain the prediction result of the latency information of the plurality of subnet network structures comprises:
and determining the prediction result of the time delay information of the plurality of subnet network structures by combining the network parameters of the multilayer perceptron and the attribute parameters of each subnet network structure.
7. The latency prediction method of claim 6, wherein the determining the prediction result of the latency information of the plurality of subnet network structures in combination with the network parameters of the multi-layer perceptron and the attribute parameters of each of the subnet network structures comprises:
multiplying the network parameters of the multilayer perceptron and the variance of each subnet network structure to obtain a processing result;
and adding the processing result and the average value of each subnet network structure to determine the prediction result of the time delay information.
8. A time delay prediction device applied to a server is characterized by comprising:
the device comprises a subnet acquisition module, a data processing module and a data processing module, wherein the subnet acquisition module is used for carrying out Gaussian mixture sampling on a target model corresponding to target operation to acquire a plurality of subnet network structures of the target model;
and the time delay information prediction module is used for performing convolution operation on the plurality of subnet network structures to perform time delay prediction, determining a prediction result of the time delay information of the target model, and performing the target operation on the object to be processed based on the prediction result.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the latency prediction method of any one of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the latency prediction method according to any one of claims 1 to 7.
CN202210535322.0A 2022-05-17 2022-05-17 Time delay prediction method and device, electronic equipment and storage medium Pending CN114897126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535322.0A CN114897126A (en) 2022-05-17 2022-05-17 Time delay prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114897126A true CN114897126A (en) 2022-08-12

Family

ID=82723487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535322.0A Pending CN114897126A (en) 2022-05-17 2022-05-17 Time delay prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114897126A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522999A (en) * 2023-06-26 2023-08-01 深圳思谋信息科技有限公司 Model searching and time delay predictor training method, device, equipment and storage medium
CN116522999B (en) * 2023-06-26 2023-12-15 深圳思谋信息科技有限公司 Model searching and time delay predictor training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Chen et al. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system
CN106919918B (en) Face tracking method and device
CN111797983A (en) Neural network construction method and device
CN112990211A (en) Neural network training method, image processing method and device
CN112200041B (en) Video motion recognition method and device, storage medium and electronic equipment
CN109598250B (en) Feature extraction method, device, electronic equipment and computer readable medium
CN113723378B (en) Model training method and device, computer equipment and storage medium
CN110516113B (en) Video classification method, video classification model training method and device
CN111104954A (en) Object classification method and device
CN111209423A (en) Image management method and device based on electronic album and storage medium
CN111797288A (en) Data screening method and device, storage medium and electronic equipment
CN112084959B (en) Crowd image processing method and device
CN113688814B (en) Image recognition method and device
CN114897126A (en) Time delay prediction method and device, electronic equipment and storage medium
CN114092920B (en) Model training method, image classification method, device and storage medium
CN114925651A (en) Circuit routing determination method and related equipment
CN114724021A (en) Data identification method and device, storage medium and electronic device
CN110276404A (en) Model training method, device and storage medium
CN111626035A (en) Layout analysis method and electronic equipment
CN116662876A (en) Multi-modal cognitive decision method, system, device, equipment and storage medium
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
CN115906986A (en) Network searching method and device, electronic equipment and storage medium
CN115648204A (en) Training method, device, equipment and storage medium of intelligent decision model
CN114254563A (en) Data processing method and device, electronic equipment and storage medium
CN114612531A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination