CN111860758A - Operation method and device of deep learning model, electronic equipment and medium - Google Patents

Operation method and device of deep learning model, electronic equipment and medium

Info

Publication number
CN111860758A
CN111860758A
Authority
CN
China
Prior art keywords
operator
operators
deep learning
learning model
delay time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010265726.3A
Other languages
Chinese (zh)
Other versions
CN111860758B (en)
Inventor
靖远 (Jing Yuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010265726.3A priority Critical patent/CN111860758B/en
Publication of CN111860758A publication Critical patent/CN111860758A/en
Application granted granted Critical
Publication of CN111860758B publication Critical patent/CN111860758B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12 Simultaneous equations, e.g. systems of linear equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Feedback Control In General (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides an operation method and apparatus for a deep learning model, an electronic device, and a medium. The method comprises the following steps: acquiring the logical relations between operators in the deep learning model; adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged; and operating the deep learning model according to the adjusted operation time of each operator in the deep learning model. With the scheme provided by the application, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators.

Description

Operation method and device of deep learning model, electronic equipment and medium
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to an operation method and apparatus for a deep learning model, an electronic device, and a medium.
Background
At present, with the rapid development of artificial intelligence, more and more deep learning models are deployed on intelligent devices, such as face recognition models, target detection models, target tracking models, and OCR (Optical Character Recognition) models. Running a deep learning model on an intelligent device occupies considerable CPU resources, which imposes a heavy CPU load, causes the deep learning model to stall or run inaccurately, and may even adversely affect the system of the intelligent device.
In order to solve the above problems, the prior art proposes the following two technical solutions:
The first technical scheme is as follows: the CPU load when running a deep learning model is reduced by reducing the model's complexity, commonly through model compression and model pruning. A less complex model generally occupies fewer CPU resources at runtime, thereby reducing the CPU load, but the low complexity of a deep learning model often comes at the expense of its accuracy. Real AI application scenarios place high requirements on the accuracy of a deep learning model, so this scheme often fails to meet application requirements.
The second technical scheme is as follows: the CPU load at runtime is reduced by optimizing the implementation of the underlying code of the deep learning model's operators. This scheme places high technical demands on developers and requires considerable time for optimizing the underlying code, yet the effect after optimization is difficult to guarantee. Moreover, intelligent devices are diverse and their CPU architectures differ; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical application.
Disclosure of Invention
In view of this, an object of the present application is to provide an operation method and apparatus for a deep learning model, an electronic device, and a medium, which can reduce the CPU load without reducing the model's complexity or optimizing the underlying code of the model's operators.
According to a first aspect of the present application, there is provided a method for operating a deep learning model, including:
acquiring a logical relation between operators in the deep learning model;
adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
and operating the deep learning model according to the adjusted running time of each operator in the deep learning model.
In a possible implementation manner, adjusting the operation time of each operator in the deep learning model according to the logical relationship between each operator, so that the total operation time of at least two operators with consecutive logical relationship in the adjusted deep learning model is prolonged, includes:
setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In one possible embodiment, the delay time of both operators of the target operator pair is determined by:
and aiming at the target operator pair, determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair.
In one possible embodiment, the calculation amount of each operator is determined by the following steps:
for each operator, acquiring a calculation parameter required by the operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and determining the calculation amount of each operator based on the calculation parameters required by the operator at the running time.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In one possible embodiment, the delay time of both operators of the target operator pair is determined by:
solving for the value of the delay time variable of each target operator pair under the constraints that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not more than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized;
Determining delay times of two operators in the target operator pair based on the solved values of the delay time variables of the target operator pair.
In one possible implementation, the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
and for the case that the deep learning model runs in parallel, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the maximum among the values of the delay time variables of the plurality of target operator pairs.
In one possible implementation, the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
and for the case that the deep learning model runs in series, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the sum of the values of the delay time variables of the plurality of target operator pairs.
In a possible implementation manner, obtaining a logical relationship between operators in the deep learning model includes:
Performing model analysis on the deep learning model by using a model analysis tool to obtain a model data flow graph;
and acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
According to a second aspect of the present application, there is provided an operating apparatus of a deep learning model, comprising:
the acquisition module is used for acquiring the logical relationship among operators in the deep learning model;
the adjusting module is used for adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
and the operation module is used for operating the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In one possible embodiment, the adjusting module comprises:
the setting unit is used for setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In a possible implementation, the adjusting module further includes:
and the first determining unit is used for determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair aiming at the target operator pair.
In a possible implementation, the adjusting module further includes:
the acquisition unit is used for acquiring a calculation parameter required by each operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and the second determination unit is used for determining the calculation amount of each operator based on the calculation parameters required by the operator in operation.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In a possible implementation, the adjusting module further includes:
the solving unit is used for solving for the value of the delay time variable of each target operator pair under the constraints that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not more than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized;
A third determining unit, configured to determine delay times of two operators in the target operator pair based on the value of the delay time variable of the target operator pair that is solved.
In a possible implementation, the adjusting module further includes:
and a fourth determining unit, configured to determine, for a case where the deep learning model operates in parallel, a maximum value among values of delay time variables of the plurality of target operator pairs as an accumulated value of the delay time variables of the plurality of target operator pairs if all latter operators of the plurality of target operator pairs are the same operator.
In a possible implementation, the adjusting module further includes:
and a fifth determining unit, configured to determine, for a case where the deep learning model operates in series, an accumulated value of values of delay time variables of the plurality of target operator pairs as an accumulated value of delay time variables of the plurality of target operator pairs if a latter operator of the plurality of target operator pairs is the same operator.
In one possible implementation, the obtaining module includes:
the model analysis unit is used for carrying out model analysis on the deep learning model by using a model analysis tool to obtain a model data flow graph;
And the relation acquisition unit is used for acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
According to a third aspect of the present application, there is provided an electronic device comprising a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the method in the first aspect or any one of its possible implementations.
According to a fourth aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect or any one of its possible implementations.
The embodiments of the application provide an operation method and apparatus for a deep learning model, an electronic device, and a medium. The method comprises the following steps: acquiring the logical relations between operators in the deep learning model; adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged; and operating the deep learning model according to the adjusted operation time of each operator in the deep learning model. The method first acquires the logical relations between the operators in the deep learning model, then adjusts the operation time of each operator so that the total operation time of at least two operators with continuous logical relationship is prolonged, thereby extending the total running time of the operators in the deep learning model, and finally runs the deep learning model according to the adjusted operation time of each operator. With the scheme provided by the application, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart illustrating a method for operating a deep learning model according to an embodiment of the present disclosure;
FIG. 2 illustrates a specific example of a model data flow graph of a deep learning model (VGG-19 model);
FIG. 3 shows a schematic diagram of a target operator pair;
FIG. 4 shows a schematic diagram of the allocation of delay times for two operators in a target operator pair;
FIG. 5 is a diagram illustrating the optimal allocation of delay times for two operators in a target operator pair in the case where deep learning models are run in parallel;
FIG. 6 is a diagram illustrating the optimal allocation of delay times for two operators in a target operator pair in the case of serial operation of a deep learning model;
FIG. 7 is a schematic structural diagram illustrating an operating apparatus of a deep learning model according to an embodiment of the present disclosure;
Fig. 8 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
In the first conventional scheme, the CPU load when running a deep learning model is reduced by reducing the model's complexity, commonly through model compression and model pruning. A less complex model generally occupies fewer CPU resources at runtime, thereby reducing the CPU load, but the low complexity of a deep learning model often comes at the expense of its accuracy. Real AI application scenarios place high requirements on the accuracy of a deep learning model, so this scheme often fails to meet application requirements.
In the second conventional scheme, the CPU load at runtime is reduced by optimizing the implementation of the underlying code of the deep learning model's operators. This scheme places high technical demands on developers and requires considerable time for optimizing the underlying code, yet the effect after optimization is difficult to guarantee. Moreover, intelligent devices are diverse and their CPU architectures differ; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical application.
Based on the above technical problem, an embodiment of the present application provides an operation method of a deep learning model, which is described in detail below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an operating method of a deep learning model according to an embodiment of the present disclosure. As shown in fig. 1, the operation method mainly includes the following steps:
Step S101: acquiring a logical relation between operators in a deep learning model;
Step S102: adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
Step S103: operating the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In step S101, the deep learning model refers to a neural network with "multiple hidden layers", where "multiple" means more than three. Deep learning models typically have eight, nine, or even more hidden layers. As the number of hidden layers grows, so do the parameters of the corresponding neuron connections, such as weights and thresholds, which means the deep learning model can automatically extract many complex features. With the advent of the cloud computing and big data era, massive training data combined with layer-by-layer pre-training and error back-propagation fine-tuning greatly improve training efficiency while reducing the risk of overfitting. Common deep learning models include the AlexNet, VGG Net, GoogleNet, ResNet, ResNeXt, R-CNN, SqueezeNet, and GAN models. The deep learning model in this embodiment is a delay-insensitive deep learning model, that is, one whose performance indicators are not concerned with the adverse effects of delay.
An operator refers to a mapping O: X → X from a function space to itself; in the broad sense, an operator can be generalized to any space. In this embodiment, an operator of the deep learning model refers to the operation of one layer of the model; that is, the operation of each layer in the deep learning model is encapsulated as an operator.
A logical relationship, or "dependency", refers to a relationship in the deep learning model in which a change in one of two operators affects the other. Logical relations are relative: operator A can be a predecessor of operator B, a successor of operator C, and parallel to operator D. Operator A and operator B are then in a predecessor relation, operator A and operator C in a successor relation, and operator A and operator D in a parallel relation.
In a possible implementation, a model analysis tool is used to perform model analysis on the deep learning model to obtain a model data flow graph, and the logical relations between the operators in the deep learning model are then acquired from the model data flow graph.
As for the model analysis tool, general deep learning inference engines and frameworks (such as PyTorch, TensorFlow, and vendors' in-house products) come with model analysis tools, and dedicated model analysis tools for analyzing deep learning models are also available on the market.
The model data flow graph is obtained from the parsed deep learning model file, and the logical relations are obtained from the model data flow graph. The model data flow graph clearly shows the operators of each layer of the deep learning model and the logical relations between them. Referring to fig. 2, fig. 2 shows a specific example of the model data flow graph of a deep learning model (the VGG-19 model). As shown in fig. 2, the model data flow graph of the VGG-19 model includes two-dimensional convolution operators (conv), pooling operators (pool), and fully connected operators (fc). It should be noted that the model data flow graph in fig. 2 is only used to teach a person skilled in the art how to implement the present invention; the invention is not limited thereto, and other types of model data flow graphs, such as graphs with branches, are also possible.
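As an illustrative sketch (the `DataFlowGraph` class and the operator names below are hypothetical, not the output of any particular analysis tool), a model data flow graph and the logical relations read from it could be represented in Python as follows; the names echo the conv/pool/fc operators of the VGG-19 example:

```python
# Hypothetical sketch of a model data flow graph. A real graph would come
# from a model analysis tool bundled with an inference framework.

class DataFlowGraph:
    def __init__(self):
        # operator name -> list of successor operator names
        self.edges = {}

    def add_edge(self, src, dst):
        # src's output feeds dst's input, i.e. their logical relation is continuous
        self.edges.setdefault(src, []).append(dst)
        self.edges.setdefault(dst, [])

    def successors(self, op):
        return list(self.edges.get(op, []))

# A tiny linear fragment of a VGG-like model.
g = DataFlowGraph()
g.add_edge("conv1", "conv2")
g.add_edge("conv2", "pool1")
g.add_edge("pool1", "fc1")

print(g.successors("conv2"))  # ['pool1']
```

An edge here encodes exactly the "logical relation" the description refers to: the source operator's output is the destination operator's input condition.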
In step S102, a continuous logical relation means that the output result of one operator is the input condition of another operator; the logical relations of those two operators are then continuous. As shown in fig. 2, the output result of the first two-dimensional convolution operator in the model data flow graph is the input condition of the second two-dimensional convolution operator, so the logical relations of the first two two-dimensional convolution operators are continuous.
The operation time of each operator is the length of time from the start of that operator's operation to the start of the next operator's operation. The total operation duration of two operators with continuous logical relations is the length of time from the start of the former operator's operation to the end of the latter operator's operation.
In the traditional scheme, when the deep learning model runs, the operators are executed in sequence without delay according to the logical relations of the model data flow graph. In this embodiment, the total operation time of at least two operators with continuous logical relations is prolonged by adjusting the operation time of each operator in the model data flow graph. For example, the total operation time of a two-dimensional convolution operator and a pooling operator is prolonged by adjusting the operation times of the second operator (a two-dimensional convolution operator) and the third operator (a pooling operator) in fig. 2.
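The contrast between back-to-back execution and delayed execution can be sketched as follows. The operator callables and delay values are hypothetical stand-ins, since the patent does not specify an implementation; the point is only that sleeping between operators spreads the work out in time without touching operator internals:

```python
import time

def conv(x):  # hypothetical stand-in for a two-dimensional convolution operator
    return x * 2

def pool(x):  # hypothetical stand-in for a pooling operator
    return x + 1

def run_with_delays(operators, delays, x):
    """Run a chain of operator callables in logical order, sleeping for the
    configured delay after each one; the sleeps lower average CPU utilization."""
    for op in operators:
        x = op(x)
        pause = delays.get(op.__name__, 0.0)
        if pause > 0:
            time.sleep(pause)  # yield the CPU between consecutive operators
    return x

# Traditional scheme: no delays. Adjusted scheme: a small delay after conv.
print(run_with_delays([conv, pool], {}, 3))              # 7
print(run_with_delays([conv, pool], {"conv": 0.01}, 3))  # 7 (same result, longer wall time)
```

Note that the computed result is unchanged; only the timing of execution differs, which is why this suits the delay-insensitive models the description mentions.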
In step S103, operation refers to performing inference with the operators in the deep learning model, typically using a deep learning inference engine or framework (such as PyTorch, TensorFlow, or a vendor's in-house product).
For the case that the load of the mobile terminal is to be reduced, step S103 is implemented on the mobile terminal while steps S101 and S102 are implemented in the cloud, reducing the computation on the mobile terminal. Step S103 improves the inference framework of the deep learning model on the mobile terminal: the operation of the model is changed from running the operators without interval to running each operator according to its adjusted operation time, thereby reducing the CPU load of the mobile terminal. A mobile terminal (mobile internet terminal) is a terminal device that accesses the internet through wireless network technology, such as a mobile phone, a tablet computer, a vehicle event data recorder, or a smart navigator.
For the case that the load of the cloud is to be reduced, steps S101 to S103 are all implemented in the cloud, thereby reducing the cloud CPU load when cloud CPU resources are extremely tight.
In conclusion, running the deep learning model according to the adjusted operation time of each operator reduces the utilization of CPU resources and thus the CPU load. With the scheme provided by this embodiment, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators. On the one hand, reducing model complexity to lower the CPU load reduces model accuracy and cannot meet the requirements of real AI application scenarios; on the other hand, optimizing the operators' underlying code to lower the CPU load is very complex and of low usability in practice. The scheme provided by this application reduces the complexity of practical application, can be applied to various types of CPUs, and is universal.
The above step S102 will be described in detail.
The two operators in a target operator pair are two operators whose logical relations are continuous. The operators in the deep learning model may form a plurality of target operator pairs according to their logical relations. Referring to fig. 3, fig. 3 is a schematic diagram of target operator pairs. As shown in fig. 3, the logical relation of the operators in the deep learning model is operator A - operator B - operator C - operator D, so operator A and operator B form a target operator pair, operator B and operator C form a target operator pair, operator C and operator D form a target operator pair, and operator D and the next operator to be run form a target operator pair. The combination of target operator pairs in fig. 3 is only used to teach a person skilled in the art how to implement the present invention; the invention is not limited thereto: one operator may point to multiple operators and form multiple target operator pairs with them respectively.
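For a linear operator chain like the A-B-C-D example in fig. 3, forming target operator pairs amounts to pairing each operator with its successor. A minimal sketch (the trailing pair with the unspecified "next operator" of the figure is deliberately omitted here):

```python
def target_operator_pairs(chain):
    # Pair each operator with its logically consecutive successor.
    return list(zip(chain, chain[1:]))

pairs = target_operator_pairs(["A", "B", "C", "D"])
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

For branched graphs, the same idea applies per edge of the data flow graph rather than per position in a list.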
The delay time is used to characterize the execution interval between the two operators in a target operator pair. For example, suppose operator A and operator B in an existing target operator pair originally execute back to back with no delay between them. After the delay time is set, once operator A in the pair finishes executing, execution pauses for the delay time before operator B in the pair continues.
The total operation time of a target operator pair is the time from the start of the former operator's run to the end of the latter operator's run. Suppose the target operator pair consists of operator A and operator B, the time from the start to the end of operator A's run is denoted T_A, the time from the start to the end of operator B's run is denoted T_B, and the execution interval between operator A and operator B, i.e. the time from the end of operator A's run to the start of operator B's run, is denoted T_C. Then the total operation time of the target operator pair is T_total = T_A + T_B + T_C.
In one possible embodiment, for at least one target operator pair, a predetermined delay time is set between the two operators in the pair, so that the total operation time of the pair is extended. This embodiment covers two cases. In the first case, a predetermined delay time is set between the two operators in only some of the target operator pairs; in the second case, a predetermined delay time is set between the two operators in all target operator pairs.
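As a concrete illustration of inserting a predetermined delay time between two logically consecutive operators, the sketch below simply sleeps between operator executions. It is a minimal hypothetical sketch, not the patent's implementation; the operator callables and the `run_with_delays` helper are invented names for illustration.

```python
import time

def run_with_delays(operators, pair_delays):
    """Run a chain of operators in order; pair_delays[i] is the
    predetermined delay time (in seconds) of the target operator pair
    (operators[i], operators[i+1])."""
    x = None
    for i, op in enumerate(operators):
        x = op(x)
        # Sleep before the next operator: this extends the pair's total
        # operation time without changing any operator's own work.
        if i < len(pair_delays) and pair_delays[i] > 0:
            time.sleep(pair_delays[i])
    return x

# Toy chain A -> B -> C with a 10 ms delay only between A and B.
result = run_with_delays(
    [lambda _: 1, lambda x: x + 1, lambda x: x * 2],
    [0.01, 0.0],
)
```

Because the delay is pure waiting, the operators' outputs are unchanged; only the wall-clock spacing of their CPU bursts is stretched.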
How to determine the delay time of the two operators in the target operator pair is described in detail below.
(1) And aiming at the target operator pair, determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair.
Regarding the calculation amount of the operator, it should be noted that, for each operator, the calculation parameters required by the operator during operation are obtained; the calculation parameters include any one or more of: the number of the called calculation units and the calling times of the calculation units. Then, the calculation amount of each operator is determined based on the calculation parameters required by the operator at the running time. For example, for a two-dimensional convolution operator, which includes a plurality of simple operations (e.g., multiplication, summation, etc.), each simple operation requires a corresponding computing unit to be called once at runtime, and thus the amount of computation of the two-dimensional convolution can be determined by the number of computing units called and/or the number of times the computing units are called.
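To make the computation amount concrete for the two-dimensional convolution mentioned above, a rough count of calls to the multiply and add units can be sketched as follows. This is a hedged illustration (one multiply-accumulate unit call per kernel element per output element), not a definition taken from the patent; the function name and parameters are assumptions.

```python
def conv2d_compute_amount(in_channels, out_channels, out_h, out_w, k_h, k_w):
    """Approximate computation amount of a 2D convolution operator,
    counted as the number of multiply-accumulate unit calls."""
    macs_per_output_element = in_channels * k_h * k_w
    return out_channels * out_h * out_w * macs_per_output_element

# e.g. a 3x3 convolution from 3 to 16 channels producing a 32x32 output
amount = conv2d_compute_amount(3, 16, 32, 32, 3, 3)
```

Such a count (number of computing units called and/or number of calls) is what the weight assignment in the following sections consumes.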
It should be noted that the total delay time of all target operator pairs in the deep learning model needs to satisfy a condition lower than the tolerable total delay time of the deep learning model. That is, the present embodiment is suitable for a deep learning model that is not sensitive to delay, and is not suitable for a deep learning model that has a high requirement on real-time performance. And subtracting the actual running total time of all the operators of the deep learning model from the tolerable running total time of all the operators of the deep learning model to obtain the tolerable total delay time of the deep learning model. For example, the tolerable total operating time of all operators of the face recognition model is about 10 seconds, the actual total operating time of all operators of the face recognition model is about 2 seconds, and then the tolerable total delay time of the face recognition model is about 8 seconds.
For the first case, a predetermined delay time is set between the two operators in a partial target operator pair.
The tolerable total delay time of the deep learning model is distributed among the partial target operator pairs according to the computation amount of the preceding operator in each of those pairs. In a specific implementation, the operators are divided into several levels according to their computation amounts, from small to large. If an operator's computation amount is small, its weight may be set to 0; if an operator's computation amount is medium, its weight may be set to 2; if an operator's computation amount is large, its weight may be set to 3. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. For an operator with a small computation amount, the delay time of its target operator pair is T_i = 0, which is equivalent to setting a non-zero delay time between the two operators in only some of the target operator pairs.
Referring to fig. 4, fig. 4 is a schematic diagram of the allocation of delay times to the two operators in each target operator pair. As shown in fig. 4, assume operator A has a small computation amount, ω_1 = 0; operator B has a medium computation amount, ω_2 = 2; operator C has a medium computation amount, ω_3 = 2; and operator D has a large computation amount, ω_4 = 3. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 2 + 2 + 3) = 0;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (0 + 2 + 2 + 3) ≈ 0.286;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (0 + 2 + 2 + 3) ≈ 0.286;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 3 / (0 + 2 + 2 + 3) ≈ 0.428.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.286 · 10 s = 2.86 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.286 · 10 s = 2.86 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0.428 · 10 s = 4.28 s.
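The weight-proportional allocation above can be sketched in a few lines. This is a hypothetical helper, not code from the patent; note that the 0.286 and 0.428 ratios in the worked example are rounded values of 2/7 and 3/7.

```python
def allocate_delays(weights, total_delay):
    """Split the tolerable total delay time T among target operator
    pairs in proportion to the weight of each pair's preceding operator:
    T_i = (w_i / sum(w)) * T."""
    total_weight = sum(weights)
    return [w / total_weight * total_delay for w in weights]

# Worked example from the text: weights (0, 2, 2, 3), T = 10 s.
delays = allocate_delays([0, 2, 2, 3], 10.0)
```

A weight of 0 naturally yields a 0 s delay, so only the pairs with non-zero weight receive a delay, matching the first case.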
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
For the second case, a predetermined delay time is set between the two operators in all pairs of target operators.
The tolerable total delay time of the deep learning model is distributed among all target operator pairs according to the computation amount of the preceding operator in each pair. In a specific implementation, the computation amount Q_i of each operator is determined, and the weight of the operator is determined from Q_i. For example, the span from the minimum to the maximum operator computation amount is divided in advance into a plurality of numerical ranges: in the first range the operator's weight is 1, in the second range the weight is 2, and so on, thereby constructing a weight table for the operators. In the weight table, each weight corresponds to a numerical range of operator computation amount, so once an operator's computation amount Q_i is known, the corresponding weight can be looked up in the table. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair.
As shown in fig. 4, assume that the computation amount of operator A falls in the first numerical range, that of operator B in the second, that of operator C in the fifth, and that of operator D in the second. Then ω_1 = 1, ω_2 = 2, ω_3 = 5, ω_4 = 2. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (1 + 2 + 5 + 2) = 0.1;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (1 + 2 + 5 + 2) = 0.2;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 5 / (1 + 2 + 5 + 2) = 0.5;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (1 + 2 + 5 + 2) = 0.2.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0.1 · 10 s = 1 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.2 · 10 s = 2 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0.2 · 10 s = 2 s.
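The weight-table lookup described above can be realized with a sorted list of range boundaries. This is a hedged sketch: the concrete range bounds are invented for illustration, and the table follows the text's convention that the k-th numerical range carries weight k.

```python
import bisect

def weight_from_table(compute_amount, range_upper_bounds):
    """Look up an operator's weight in a weight table where the k-th
    numerical range of computation amount carries weight k (1-based).
    range_upper_bounds must be ascending; each entry is the exclusive
    upper bound of the range below it."""
    return bisect.bisect_right(range_upper_bounds, compute_amount) + 1

# Hypothetical ranges: [0,10), [10,100), [100,1000), [1000,10000), [10000,inf)
bounds = [10, 100, 1000, 10000]
```

Given the looked-up weights, the delays follow from the same proportional split T_i = ρ_i · T as before.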
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
In summary, for each target operator pair, the delay time of the two operators in the pair is determined according to the computation amount of the preceding operator in the pair. In this scheme, operators with large computation amounts are assigned large weights, so the delay time of the two operators in the corresponding target operator pair is long; operators with small computation amounts are assigned small weights, even 0, so the delay time of the two operators in the corresponding target operator pair is short, even 0. In this way, during model operation, operators with large computation amounts are given enough delay time to reduce their CPU load per unit time, so the per-unit-time CPU load of every operator can be reduced.
(2) The delay times of both operators in any of the target operator pairs are the same.
For the first case, a predetermined delay time is set between the two operators in a partial target operator pair.
The tolerable total delay time of the deep learning model is distributed evenly among some of the target operator pairs. In a specific implementation, the partial set of target operator pairs that need a delay time is determined first; these are generally pairs whose preceding operators occupy more CPU resources, such as various convolution operators. Next, the weight of the preceding operator in each selected target operator pair is set to 1, and the weights of the preceding operators in the other target operator pairs are set to 0. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. In this way, the delay times of the two operators in each selected target operator pair are the same, and the delay times of the two operators in the other target operator pairs are 0.
As shown in fig. 4, assume operators B and C are convolution operators, so ω_1 = 0, ω_2 = 1, ω_3 = 1, ω_4 = 0. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 1 + 1 + 0) = 0;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (0 + 1 + 1 + 0) = 0.5;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (0 + 1 + 1 + 0) = 0.5;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 1 + 1 + 0) = 0.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0 · 10 s = 0 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0 · 10 s = 0 s.
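Selecting the CPU-heavy pairs (e.g. convolution operators) and splitting the budget evenly among them can be sketched as follows. The helper name and operator labels are hypothetical.

```python
def allocate_to_selected(preceding_ops, selected, total_delay):
    """Assign weight 1 to pairs whose preceding operator is in
    `selected` (e.g. convolution operators) and weight 0 to all others,
    then split the tolerable total delay time evenly among the
    weight-1 pairs."""
    weights = [1 if name in selected else 0 for name in preceding_ops]
    chosen = sum(weights)
    return [w * total_delay / chosen for w in weights]

# Worked example from the text: B and C are convolution operators, T = 10 s.
delays = allocate_to_selected(["A", "B", "C", "D"], {"B", "C"}, 10.0)
```

The result matches the worked numbers: the two selected pairs each get 5 s, the others 0 s.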
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
For the second case, a predetermined delay time is set between the two operators in all pairs of target operators.
The tolerable total delay time of the deep learning model is distributed evenly among all target operator pairs. In a specific implementation, the weight ω_i of every operator is identical, e.g. ω_i = 1 for every operator. The total weight Σω_i is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. In this way, the delay times of the two operators in all target operator pairs are the same.
As shown in fig. 4, assume that operator A, operator B, operator C, and operator D have the same weight, i.e. ω_1 = ω_2 = ω_3 = ω_4 = 1. Let T = 10 s.

For each of operators A, B, C, and D: ρ_i = ω_i / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (1 + 1 + 1 + 1) = 0.25.

Delay time of the target operator pair to which each of operators A, B, C, and D belongs: T_1 = T_2 = T_3 = T_4 = ρ_i · T = 0.25 · 10 s = 2.5 s.
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
In summary, the tolerable total delay time of the deep learning model is distributed evenly among some or all of the target operator pairs by assigning the same weight to the preceding operator in each of those pairs, so that the delay time of the two operators is the same in every such target operator pair. The implementation process is simple and convenient.
(3) And determining the delay time of two operators in the target operator pair by an optimal solution method.
The delay time variable is the unknown delay time of a target operator pair. The accumulated value of the delay time variables of all target operator pairs in the deep learning model refers to the total delay time variable over all target operator pairs in the model. The preset total delay time is the tolerable total delay time of the deep learning model. The average value of the delay time variables of the target operator pairs refers to the average, taken over all target operator pairs, of the solved values of their delay time variables.
Parallel operation of the deep learning model means that two parallel operators run simultaneously, which saves running time. Serial operation of the deep learning model means that the operators are executed in sequence: even if two operators could run in parallel, the next operator is executed only after the previous one finishes.
When an operator in the model dataflow graph has branches, an optimal allocation method can be used to determine the delay times. Specifically, the values of the delay time variables of the target operator pairs are solved under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed the preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs. The delay times of the two operators in each target operator pair are then determined from the solved value of that pair's delay time variable.
1) For the case where the deep learning model runs in parallel: if the latter operator of multiple target operator pairs is the same operator, the accumulated value of the delay time variables at that operator is the maximum among the accumulated values of those target operator pairs.
Referring to fig. 5, fig. 5 is a schematic diagram of the optimal allocation of delay times to the two operators in each target operator pair when the deep learning model runs in parallel. As shown in fig. 5, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variables of the two operators in the first, second, and third target operator pairs are T_1, T_2, and T_3, respectively, and the cumulative-delay intermediate variables of the three pairs are N_1, N_2, and N_3, respectively. The preset total delay time is T_max.

For the first target operator pair, with delay time T_1: N_1 = T_1;
for the second target operator pair, with delay time T_2: N_2 = T_2;
for the third target operator pair, with delay time T_3: N_3 = max(N_1, N_2) + T_3.
For the above variables, the following set of equations may be listed:
N_1 = T_1
N_2 = T_2
N_3 = max(N_1, N_2) + T_3
N_3 ≤ T_max
maximize (T_1 + T_2 + T_3) / 3
the above equation set is solved by a commonly used solver (e.g., MATLAB solver, parsio solver, hypermesh solver, cplex solver, etc.), so as to solve the value of the delay time variable of each target operator pair, i.e., the delay time of two operators in each target operator pair.
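In place of a commercial solver, the parallel-branch system above can be explored with a small brute-force grid search. This is a hedged stand-in rather than the patent's actual solver step, practical only for tiny graphs, but it makes the optimum of the fig. 5 system concrete.

```python
from itertools import product

def solve_parallel(t_max, step=1.0):
    """Maximize the average of (T1, T2, T3) subject to the parallel
    cumulative-delay constraint N3 = max(T1, T2) + T3 <= t_max
    (operators A and B run in parallel, both feeding operator C)."""
    grid = [i * step for i in range(int(t_max / step) + 1)]
    best, best_avg = None, -1.0
    for t1, t2, t3 in product(grid, repeat=3):
        if max(t1, t2) + t3 <= t_max:      # feasibility: N3 <= T_max
            avg = (t1 + t2 + t3) / 3
            if avg > best_avg:
                best_avg, best = avg, (t1, t2, t3)
    return best
```

With T_max = 10 s the budget is best spent inside the parallel branches: since the two branch delays overlap in time, the optimum sets T_1 = T_2 = 10 and T_3 = 0.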
2) For the case where the deep learning model runs in series: if the latter operator of multiple target operator pairs is the same operator, the accumulated value of the delay time variables at that operator is the sum of the accumulated values of those target operator pairs.
Referring to fig. 6, fig. 6 is a schematic diagram of the optimal allocation of delay times to the two operators in each target operator pair when the deep learning model runs in series. As shown in fig. 6, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variables of the two operators in the first, second, and third target operator pairs are T_1, T_2, and T_3, respectively, and the cumulative-delay intermediate variables of the three pairs are N_1, N_2, and N_3, respectively. The preset total delay time is T_max.

For the first target operator pair, with delay time T_1: N_1 = T_1;
for the second target operator pair, with delay time T_2: N_2 = T_2;
for the third target operator pair, with delay time T_3: N_3 = N_1 + N_2 + T_3.
For the above variables, the following set of equations may be listed:
N_1 = T_1
N_2 = T_2
N_3 = N_1 + N_2 + T_3
N_3 ≤ T_max
maximize (T_1 + T_2 + T_3) / 3
the above equation set is solved by a commonly used solver (e.g., MATLAB solver, parsio solver, hypermesh solver, cplex solver, etc.), so as to solve the value of the delay time variable of each target operator pair, i.e., the delay time of two operators in each target operator pair.
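For the serial case the cumulative delay is a plain sum, so the optimization degenerates: any allocation that consumes the whole budget maximizes the average. The sketch below is a hedged illustration of the fig. 6 system, not solver code from the patent.

```python
def cumulative_delay_serial(t1, t2, t3):
    """Cumulative delay for the serial graph of fig. 6: N1 = T1,
    N2 = T2, and N3 = N1 + N2 + T3, i.e. the delays simply add up."""
    n1, n2 = t1, t2
    return n1 + n2 + t3

def uses_full_budget(t1, t2, t3, t_max):
    """With a plain-sum constraint, the average (T1 + T2 + T3) / 3 is
    maximized exactly when the whole tolerable budget is consumed,
    i.e. when N3 == t_max."""
    return cumulative_delay_serial(t1, t2, t3) == t_max
```

This is why the serial case has many optimal solutions (any split of T_max among the three pairs), whereas the parallel case concentrates the delay in the overlapping branches.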
In summary, determining the delay time of the two operators in each target operator pair by the optimal-solution method is more complex to implement, since a solver is required to solve the equation set, but the solved delay times form an optimal solution, so running the deep learning model with them yields the best load-reduction effect.
In summary, the embodiments of the present application provide a method for operating a deep learning model, the method comprising: acquiring the logical relationships between operators in the deep learning model; adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators with a continuous logical relationship in the adjusted deep learning model is prolonged; and running the deep learning model according to the adjusted running time of each operator. That is, the logical relationships between the operators are obtained first; the running time of each operator is then adjusted so that the total running time of at least two logically consecutive operators, and thus of the operators in the deep learning model as a whole, is prolonged; and finally the deep learning model is run according to the adjusted running times. With this scheme, the CPU load can be reduced without reducing the model complexity or optimizing the underlying code of the model operators. On the one hand, reducing model complexity to lower the CPU load degrades model accuracy and cannot meet the requirements of real AI application scenarios. On the other hand, optimizing the underlying code of model operators to lower the CPU load is very complicated in practice and has poor usability; the scheme provided by the present application reduces the complexity of practical application, can be applied to various types of CPUs, and has universality.
Based on the same technical concept, embodiments of the present application further provide an operating apparatus of a deep learning model, an electronic device, a computer storage medium, and the like, which can be specifically referred to in the following embodiments.
Fig. 7 is a schematic structural diagram of an operating apparatus of a deep learning model according to an embodiment of the present disclosure. As shown in fig. 7, may include:
an obtaining module 701, configured to obtain a logical relationship between operators in the deep learning model;
an adjusting module 702, configured to adjust an operation time of each operator in the deep learning model according to a logical relationship between the operators, so that a total operation time of at least two operators having consecutive logical relationships in the adjusted deep learning model is prolonged;
an operation module 703, configured to operate the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In one possible implementation, the adjustment module 702 includes: the setting unit is used for setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In one possible implementation, the adjusting module 702 further includes: and the first determining unit is used for determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair aiming at the target operator pair.
In one possible implementation, the adjusting module 702 further includes:
the acquisition unit is used for acquiring a calculation parameter required by each operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and the second determination unit is used for determining the calculation amount of each operator based on the calculation parameters required by the operator in operation.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In one possible implementation, the adjusting module 702 further includes:
the solving unit is used for solving the values of the delay time variables of the target operator pairs under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed the preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs;
A third determining unit, configured to determine delay times of two operators in the target operator pair based on the value of the delay time variable of the target operator pair that is solved.
In one possible implementation, the adjusting module 702 further includes:
and a fourth determining unit, configured to determine, for a case where the deep learning model operates in parallel, a maximum value among values of delay time variables of the plurality of target operator pairs as an accumulated value of the delay time variables of the plurality of target operator pairs if all latter operators of the plurality of target operator pairs are the same operator.
In one possible implementation, the adjusting module 702 further includes:
and a fifth determining unit, configured to determine, for a case where the deep learning model operates in series, an accumulated value of values of delay time variables of the plurality of target operator pairs as an accumulated value of delay time variables of the plurality of target operator pairs if a latter operator of the plurality of target operator pairs is the same operator.
In one possible implementation, the obtaining module 701 includes:
the model analysis unit is used for carrying out model analysis on the deep learning model by a model analysis tool to obtain a model data flow diagram;
And the relation acquisition unit is used for acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
An embodiment of the present application discloses an electronic device, as shown in fig. 8, including: a processor 801, a memory 802, and a bus 803, the memory 802 storing machine readable instructions executable by the processor 801, the processor 801 communicating with the memory 802 via the bus 803 when the electronic device is in operation. The machine readable instructions are executed by the processor 801 to perform the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
The computer program product of the deep learning model operation method provided in this embodiment of the present application includes a computer-readable storage medium storing a nonvolatile program code executable by the processor 801, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for operating a deep learning model, comprising:
acquiring logical relationships between operators in the deep learning model;
adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged; and
running the deep learning model according to the adjusted running time of each operator in the deep learning model.
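Purely as an illustration of claim 1 (not part of the claim), the method can be sketched in a few lines of Python; the list-of-callables operator representation and the sleep-based delay mechanism are assumptions made for illustration only:

```python
import time

def run_model(operators, delays):
    """Run operators in their logical order, waiting delays[i] seconds
    between operator i and operator i+1, so that the total running time
    of each consecutive operator pair is prolonged (hypothetical mechanism)."""
    x = 1.0
    for i, op in enumerate(operators):
        if i > 0:
            time.sleep(delays.get(i - 1, 0.0))  # adjusted execution interval
        x = op(x)
    return x

# three operators with a continuous logical relationship
ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result = run_model(ops, {0: 0.001, 1: 0.001})  # ((1 + 1) * 2) - 3 = 1.0
```

Here the delays only stretch wall-clock time; the numerical result of the model is unchanged.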
2. The method of claim 1, wherein adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged comprises:
setting a preset delay time between the two operators in at least one target operator pair, so as to prolong the total running time of the target operator pair, wherein the two operators in a target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
3. The method of claim 2, wherein the delay time of the two operators in the target operator pair is determined by:
for the target operator pair, determining the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
4. The method of claim 3, wherein the calculation amount of each operator is determined by:
for each operator, acquiring calculation parameters required by the operator at run time, the calculation parameters including any one or more of: the number of computing units called and the number of times the computing units are called; and
determining the calculation amount of each operator based on the calculation parameters required by the operator at run time.
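As a hypothetical illustration of claims 3-4 (not part of the claims): taking the product of the two calculation parameters as the calculation amount, and scaling the pair delay linearly with that amount, are both assumptions made for the sketch, since the claims do not prescribe a formula:

```python
def calculation_amount(num_units_called, num_calls):
    # claim 4: the calculation amount follows from the number of computing
    # units called and the number of calls; the product is an assumption.
    return num_units_called * num_calls

def pair_delay(former_op_params, base_delay=0.001):
    # claim 3: the delay of a target operator pair is derived from the
    # former operator's calculation amount; linear scaling is assumed.
    return base_delay * calculation_amount(*former_op_params)

# a former operator that calls 4 computing units, 10 times each
amount = calculation_amount(4, 10)  # 40
delay = pair_delay((4, 10))         # about 0.04 seconds
```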
5. The method of claim 2, wherein the delay times of the two operators in any of the target operator pairs are the same.
6. The method of claim 2, wherein the delay time of the two operators in the target operator pair is determined by:
solving a value of a delay time variable of each target operator pair under the constraint conditions that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not greater than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized; and
determining the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
7. The method of claim 6, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs in parallel, if the latter operators of a plurality of target operator pairs are the same operator, taking the maximum value among the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
8. The method of claim 6, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs serially, if the latter operators of a plurality of target operator pairs are the same operator, taking the sum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
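The allocation and accumulation rules of claims 5-8 can be illustrated with a minimal sketch (not part of the claims; the equal-split allocation is just one admissible solution of the constrained problem, chosen here because it matches claim 5):

```python
def allocate_delays(num_pairs, total_delay):
    # claims 5-6: equal delays whose sum exactly meets the preset
    # total-delay budget maximize the average delay under the
    # "accumulated value not greater than total_delay" constraint.
    return [total_delay / num_pairs] * num_pairs

def accumulated_delay(pair_delays, parallel):
    # claims 7-8: when several target operator pairs share the same latter
    # operator, their delay variables accumulate as a maximum if the model
    # runs in parallel, and as a sum if it runs serially.
    return max(pair_delays) if parallel else sum(pair_delays)

delays = allocate_delays(4, 8.0)  # [2.0, 2.0, 2.0, 2.0]
```

With these four equal delays, the parallel accumulation is 2.0 while the serial accumulation is the full 8.0 budget.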
9. The method of claim 1, wherein acquiring the logical relationships between operators in the deep learning model comprises:
performing model analysis on the deep learning model with a model analysis tool to obtain a model dataflow graph; and
acquiring the logical relationships between operators in the deep learning model from the model dataflow graph.
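A minimal sketch of the graph step in claim 9 (not part of the claim): a real model analysis tool would emit the dataflow graph; here it is stood in for by a plain edge list with hypothetical operator names:

```python
from collections import defaultdict

def logical_relationships(dataflow_edges):
    # claim 9: each edge (former, latter) of the model dataflow graph is a
    # pair of operators with a continuous logical relationship.
    return list(dataflow_edges)

def pairs_by_latter_operator(dataflow_edges):
    # group target operator pairs that share the same latter operator,
    # as claims 7-8 need when choosing between max and sum accumulation.
    groups = defaultdict(list)
    for former, latter in dataflow_edges:
        groups[latter].append((former, latter))
    return dict(groups)

edges = [("conv1", "add"), ("conv2", "add"), ("add", "fc")]
groups = pairs_by_latter_operator(edges)
# "add" is the latter operator of two target operator pairs
```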
10. An apparatus for operating a deep learning model, comprising:
an acquisition module configured to acquire logical relationships between operators in the deep learning model;
an adjustment module configured to adjust the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged; and
a running module configured to run the deep learning model according to the adjusted running time of each operator in the deep learning model.
11. The apparatus of claim 10, wherein the adjustment module comprises:
a setting unit configured to set a preset delay time between the two operators in at least one target operator pair, so as to prolong the total running time of the target operator pair, wherein the two operators in a target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
12. The apparatus of claim 11, wherein the adjustment module further comprises:
a first determining unit configured to determine, for the target operator pair, the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
13. The apparatus of claim 12, wherein the adjustment module further comprises:
an acquisition unit configured to acquire, for each operator, calculation parameters required by the operator at run time, the calculation parameters including any one or more of: the number of computing units called and the number of times the computing units are called; and
a second determining unit configured to determine the calculation amount of each operator based on the calculation parameters required by the operator at run time.
14. The apparatus of claim 11, wherein the delay times of the two operators in any of the target operator pairs are the same.
15. The apparatus of claim 11, wherein the adjustment module further comprises:
a solving unit configured to solve a value of a delay time variable of each target operator pair under the constraint conditions that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not greater than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized; and
a third determining unit configured to determine the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
16. The apparatus of claim 15, wherein the adjustment module further comprises:
a fourth determining unit configured to, for the case where the deep learning model runs in parallel, take the maximum value among the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators of the plurality of target operator pairs are the same operator.
17. The apparatus of claim 15, wherein the adjustment module further comprises:
a fifth determining unit configured to, for the case where the deep learning model runs serially, take the sum of the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators of the plurality of target operator pairs are the same operator.
18. The apparatus of claim 10, wherein the acquisition module comprises:
a model analysis unit configured to perform model analysis on the deep learning model with a model analysis tool to obtain a model dataflow graph; and
a relationship acquisition unit configured to acquire the logical relationships between operators in the deep learning model from the model dataflow graph.
19. An electronic device, comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate via the bus, and the processor executes the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 9.
20. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any one of claims 1 to 9.
CN202010265726.3A 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium Active CN111860758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010265726.3A CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010265726.3A CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111860758A true CN111860758A (en) 2020-10-30
CN111860758B CN111860758B (en) 2024-05-03

Family

ID=72986013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010265726.3A Active CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111860758B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862109A (en) * 2021-02-09 2021-05-28 上海商汤智能科技有限公司 Deep learning model execution method and device, electronic equipment and storage medium
CN112862109B (en) * 2021-02-09 2024-05-24 上海商汤智能科技有限公司 Deep learning model execution method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515739A (en) * 2019-10-23 2019-11-29 上海燧原智能科技有限公司 Deep learning neural network model load calculating method, device, equipment and medium
CN110750342A (en) * 2019-05-23 2020-02-04 北京嘀嘀无限科技发展有限公司 Scheduling method, scheduling device, electronic equipment and readable storage medium
KR20200023660A (en) * 2018-08-13 2020-03-06 인천대학교 산학협력단 Electronic device for controlling performance of at least one processor when providing inference service through deep learning model and operating method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA Yanjun; YU Dianhai; WU Tian; WANG Haifeng: "PaddlePaddle: An Open-Source Deep Learning Platform Originating from Industrial Practice", Frontiers of Data and Computing, no. 05, 15 October 2019 (2019-10-15) *


Also Published As

Publication number Publication date
CN111860758B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
TWI825596B (en) Circuit, method and non-transitory machine-readable storage devices for performing neural network computations
US11216721B2 (en) Method for calculating a neuron layer of a multi-layer perceptron model with simplified activation function
CN107146015A (en) Multivariate Time Series Forecasting Methodology and system
US20150019464A1 (en) method and apparatus for supplying interpolation point data for a data-based function model calculation unit
US11580194B2 (en) Information processing apparatus, information processing method, and program
CN111459993B (en) Configuration updating method, device, equipment and storage medium based on behavior analysis
CN110491124B (en) Vehicle flow prediction method, device, equipment and storage medium
CN110347724A (en) Abnormal behaviour recognition methods, device, electronic equipment and medium
US20240005164A1 (en) Neural Network Training Method and Related Device
US11636712B2 (en) Dynamic gesture recognition method, device and computer-readable storage medium
US20230092453A1 (en) Parameter updating method and apparatus and storage medium
CN111340240A (en) Method and device for realizing automatic machine learning
CN114612791B (en) Target detection method and device based on improved attention mechanism
CN108667877B (en) Method and device for determining recommendation information, computer equipment and storage medium
CN116126346A (en) Code compiling method and device of AI model, computer equipment and storage medium
CN104749953B (en) Method and device for providing a sparse gaussian process model for calculation in a motor control system
CN111160049A (en) Text translation method, device, machine translation system and storage medium
CN114004335A (en) Data processing method and device, electronic equipment and storage medium
CN114492742A (en) Neural network structure searching method, model issuing method, electronic device, and storage medium
CN114154622A (en) Algorithm model for traffic operation system flow data acquisition missing completion
CN115699044A (en) Software project risk assessment method and device, computer equipment and storage medium
CN111860758A (en) Operation method and device of deep learning model, electronic equipment and medium
CN113554164A (en) Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
CN115687764A (en) Training method of vehicle track evaluation model, and vehicle track evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant