CN111860758B - Deep learning model operation method and device, electronic equipment and medium - Google Patents

Deep learning model operation method and device, electronic equipment and medium

Info

Publication number
CN111860758B
CN111860758B (application CN202010265726.3A)
Authority
CN
China
Prior art keywords
operator
operators
deep learning
learning model
delay time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010265726.3A
Other languages
Chinese (zh)
Other versions
CN111860758A (en)
Inventor
Jing Yuan (靖远)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010265726.3A priority Critical patent/CN111860758B/en
Publication of CN111860758A publication Critical patent/CN111860758A/en
Application granted granted Critical
Publication of CN111860758B publication Critical patent/CN111860758B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12Simultaneous equations, e.g. systems of linear equations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Feedback Control In General (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method, an apparatus, an electronic device, and a medium for running a deep learning model. The method comprises: obtaining the logical relationships among the operators in the deep learning model; adjusting, according to the logical relationships among the operators, the running time of each operator in the deep learning model, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged; and running the deep learning model according to the running time of each operator in the adjusted deep learning model. The scheme provided by the application can reduce the CPU load without reducing model complexity or optimizing the underlying code of the model operators.

Description

Deep learning model operation method and device, electronic equipment and medium
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a method and an apparatus for running a deep learning model, an electronic device, and a medium.
Background
Currently, with the explosive development of artificial intelligence, more and more deep learning models are deployed on intelligent devices, such as face recognition models, target detection models, target tracking models, and OCR (Optical Character Recognition) models. Running a deep learning model on an intelligent device occupies considerable CPU resources and thus brings a heavy CPU load, causing the whole deep learning model to stutter or run inaccurately during operation and, in severe cases, even affecting the system of the intelligent device.
To address these problems, the prior art provides the following two technical schemes:
The first technical scheme: reduce the CPU load when running a deep learning model by reducing the complexity of the model; common means include model compression and model clipping. A model of lower complexity usually consumes fewer CPU resources at run time, reducing the CPU load, but the low complexity of a deep learning model tends to come at the expense of model accuracy. Real AI application scenarios place high requirements on the accuracy of deep learning models, so this scheme often fails to meet the application requirements.
The second technical scheme: reduce the CPU load at run time by optimizing the implementation of the underlying code of the deep learning model operators. This scheme places high technical requirements on developers and requires considerable time to optimize the underlying code, yet the optimization effect is hard to guarantee. Moreover, intelligent devices are various and involve different CPU architectures; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical applications.
Disclosure of Invention
In view of the above, the present application aims to provide a method, an apparatus, an electronic device, and a medium for running a deep learning model, which can reduce the CPU load without reducing model complexity or optimizing the underlying code of the model operators.
According to a first aspect of the present application, there is provided a method of operating a deep learning model, comprising:
Obtaining a logic relation among operators in the deep learning model;
adjusting, according to the logical relationships among the operators, the running time of each operator in the deep learning model, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged;
And operating the deep learning model according to the operation time of each operator in the adjusted deep learning model.
In a possible implementation manner, adjusting the running time of each operator in the deep learning model according to the logical relationships among the operators, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged, includes:
Setting, for at least one target operator pair, a predetermined delay time between the two operators in the target operator pair, so that the total running duration of the target operator pair is prolonged; the two operators in the target operator pair are two operators with consecutive logical relationships, and the delay time characterizes the execution interval time between the two operators in the target operator pair.
In one possible implementation, the delay time of both operators in the target operator pair is determined by:
For each target operator pair, determining the delay time of the two operators in the target operator pair according to the computation amount of the former operator in the target operator pair.
In one possible implementation, the computation amount of each operator is determined by the following steps:
For each operator, acquiring the computation parameters required when the operator runs; the computation parameters include any one or more of the following: the number of computing units called and the number of times the computing units are called;
determining the computation amount of each operator based on the computation parameters required when the operator runs.
In one possible embodiment, the delay times of both operators in any of the target operator pairs are the same.
In one possible implementation, the delay time of both operators in the target operator pair is determined by:
solving for the value of the delay time variable of each target operator pair, under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed a preset total delay time, with the objective that the average of the delay time variables of the target operator pairs is maximized;
and determining the delay time of the two operators in the target operator pair based on the solved delay time variable value of the target operator pair.
In one possible implementation, the accumulated value of the delay time variable for all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs in parallel, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the maximum of the values of their delay time variables.
In one possible implementation, the accumulated value of the delay time variable for all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs serially, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the sum of the values of their delay time variables.
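The two accumulation rules above (maximum for parallel operation, sum for serial operation) can be sketched as follows. This is an illustrative sketch only; the tuple representation of a target operator pair and the function name are assumptions for demonstration, not part of the patent.

```python
# Each target operator pair is represented as (former, latter, delay_value).
# The latter operators of all pairs passed in are assumed to be the same operator.

def accumulated_delay(pairs, mode):
    """Accumulate the delay time variables of target operator pairs
    that share the same latter operator.

    mode="parallel": the delays elapse concurrently, so the effective
                     accumulated value is the maximum delay.
    mode="serial":   the delays elapse one after another, so the effective
                     accumulated value is the sum of the delays.
    """
    delays = [delay for (_former, _latter, delay) in pairs]
    if mode == "parallel":
        return max(delays)
    if mode == "serial":
        return sum(delays)
    raise ValueError("mode must be 'parallel' or 'serial'")

# Three target operator pairs whose latter operator is the same operator E:
pairs = [("B", "E", 2.0), ("C", "E", 3.0), ("D", "E", 1.0)]
print(accumulated_delay(pairs, "parallel"))  # 3.0 (the maximum)
print(accumulated_delay(pairs, "serial"))    # 6.0 (the sum)
```

The distinction matters when checking the constraint against the preset total delay time: under parallel operation the three waits overlap, so only the longest one counts toward the budget.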
In one possible implementation, obtaining a logical relationship between operators in the deep learning model includes:
Carrying out model analysis on the deep learning model by using a model analysis tool to obtain a model data flow diagram;
And obtaining the logic relation among all operators in the deep learning model from the model data flow diagram.
According to a second aspect of the present application, there is provided an apparatus for running a deep learning model, comprising:
The acquisition module is used for acquiring the logic relation among all operators in the deep learning model;
The adjusting module is used for adjusting the running time of each operator in the deep learning model according to the logical relationships among the operators, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged;
and the operation module is used for operating the deep learning model according to the operation time of each operator in the adjusted deep learning model.
In one possible embodiment, the adjustment module includes:
A setting unit configured to set, for at least one target operator pair, a predetermined delay time between the two operators in the target operator pair so that the total running duration of the target operator pair is prolonged; the two operators in the target operator pair are two operators with consecutive logical relationships, and the delay time characterizes the execution interval time between the two operators in the target operator pair.
In one possible embodiment, the adjustment module further comprises:
A first determining unit, configured to determine, for the target operator pair, a delay time of two operators in the target operator pair according to a calculation amount of a previous operator in the target operator pair.
In one possible embodiment, the adjustment module further comprises:
The obtaining unit is used for obtaining, for each operator, the computation parameters required when the operator runs; the computation parameters include any one or more of the following: the number of computing units called and the number of times the computing units are called;
And the second determining unit is used for determining the calculated amount of each operator based on the calculation parameters required by the operator in the running process.
In one possible embodiment, the delay times of both operators in any of the target operator pairs are the same.
In one possible embodiment, the adjustment module further comprises:
The solving unit is used for solving for the value of the delay time variable of each target operator pair, under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed a preset total delay time, with the objective that the average of the delay time variables of the target operator pairs is maximized;
And a third determining unit, configured to determine delay times of two operators in the target operator pair based on the values of delay time variables of the target operator pair that are solved.
In one possible embodiment, the adjustment module further comprises:
And the fourth determining unit is used for determining the maximum value of the delay time variable of the plurality of target operator pairs as the accumulated value of the delay time variable of the plurality of target operator pairs if the latter operators in the plurality of target operator pairs are all the same operator according to the parallel running condition of the deep learning model.
In one possible embodiment, the adjustment module further comprises:
And a fifth determining unit, configured to determine, for the case where the deep learning model runs serially, the sum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators in the plurality of target operator pairs are all the same operator.
In one possible implementation, the acquiring module includes:
The model analysis unit is used for carrying out model analysis on the deep learning model by using a model analysis tool to obtain a model data flow diagram;
and the relation acquisition unit is used for acquiring the logic relation among all operators in the deep learning model from the model data flow diagram.
According to a third aspect of the present application, there is provided an electronic device comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method of any of the possible implementations of the first aspect.
According to a fourth aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the possible implementations of the first aspect described above.
The embodiment of the application provides a method, an apparatus, an electronic device, and a medium for running a deep learning model. The method comprises: obtaining the logical relationships among the operators in the deep learning model; adjusting, according to the logical relationships among the operators, the running time of each operator in the deep learning model, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged; and running the deep learning model according to the running time of each operator in the adjusted deep learning model. The scheme provided by the application addresses two problems of the traditional schemes: reducing model complexity sacrifices model accuracy, while optimizing the underlying code of model operators is highly complex and poorly universal. Specifically, the logical relationships among the operators in the deep learning model are first obtained; the running time of each operator is then adjusted so that the total running duration of at least two operators with consecutive logical relationships is prolonged, thereby prolonging the total running duration of the deep learning model; finally, the deep learning model is run according to the running time of each operator in the adjusted deep learning model. The scheme provided by the application can reduce the CPU load without reducing model complexity or optimizing the underlying code of the model operators.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a flowchart of a method for operating a deep learning model according to an embodiment of the present application;
FIG. 2 shows a specific example of a model data flow graph of a deep learning model (VGG-19 model);
FIG. 3 shows a schematic diagram of a target operator pair;
FIG. 4 shows a schematic of the allocation of delay times for two operators in a target operator pair;
FIG. 5 shows a schematic diagram of optimal allocation of delay times for two operators in a target operator pair in the case where the deep learning model is running in parallel;
FIG. 6 shows a schematic diagram of optimal allocation of delay times for two operators in a target operator pair in the case where the deep learning model is run serially;
fig. 7 is a schematic structural diagram of an operation device of a deep learning model according to an embodiment of the present application;
fig. 8 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
In the first conventional scheme, the CPU load when running a deep learning model is reduced by reducing the complexity of the model; common means include model compression and model clipping. A model of lower complexity usually consumes fewer CPU resources at run time, reducing the CPU load, but the low complexity of a deep learning model tends to come at the expense of model accuracy. Real AI application scenarios place high requirements on the accuracy of deep learning models, so this scheme often fails to meet the application requirements.
In the second conventional scheme, the CPU load at run time is reduced by optimizing the implementation of the underlying code of the deep learning model operators. This scheme places high technical requirements on developers and requires considerable time to optimize the underlying code, yet the optimization effect is hard to guarantee. Moreover, intelligent devices are various and involve different CPU architectures; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical applications.
Based on the above technical problems, the embodiment of the application provides an operation method of a deep learning model, which is described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an operation method of a deep learning model according to an embodiment of the application. As shown in fig. 1, the operation method mainly includes the following steps:
Step S101, obtaining a logic relation among operators in a deep learning model;
Step S102, adjusting, according to the logical relationships among the operators, the running time of each operator in the deep learning model, so that the total running duration of at least two operators with consecutive logical relationships in the adjusted deep learning model is prolonged;
And step S103, operating the deep learning model according to the operation time of each operator in the adjusted deep learning model.
In step S101, the deep learning model refers to a neural network with "multiple hidden layers", where "multiple" means more than three. A deep learning model typically has eight or nine, or even more, hidden layers. More hidden layers mean more corresponding parameters of the neuron connections, such as weights and thresholds, which means the deep learning model can automatically extract many complex features. With the advent of the cloud computing and big data era, massive training data, combined with layer-by-layer pre-training and error back-propagation fine-tuning, has greatly improved model training efficiency while reducing the risk of over-fitting. Common deep learning models include the AlexNet, VGGNet, GoogLeNet, ResNet, ResNeXt, R-CNN, SqueezeNet, and GAN models. The deep learning model in this embodiment is a delay-insensitive deep learning model, that is, a deep learning model whose performance indexes do not take into account adverse effects caused by delay.
An operator refers to a mapping O: X → X from a function space to a function space. Operators in the broad sense can be generalized to any space. In this embodiment, an operator of the deep learning model refers to the operation of one layer in the deep learning model; that is, the operations of each layer in the deep learning model are packaged into one operator.
A logical relationship, i.e. a "dependency", refers to a relationship in the deep learning model in which a change in one of two operators affects the other operator. Moreover, the logical relationship is relative: operator A may be a pre-operator of operator B, a post-operator of operator C, or a parallel operator of operator D. Accordingly, operator A and operator B are in a pre-positional relationship, operator A and operator C are in a post-positional relationship, and operator A and operator D are in a parallel relationship.
In one possible implementation, the deep learning model is parsed using a model parsing tool to obtain a model dataflow graph, and the logical relationships among the operators in the deep learning model are then obtained from the model dataflow graph.
Regarding model parsing tools, general deep learning inference engines and frameworks (such as PyTorch, TensorFlow, or vendors' self-developed products) are provided with model parsing tools, and some dedicated model parsing tools for deep learning models are also available on the market.
The model dataflow graph is obtained from the parsed deep learning model file, and the logical relationships are obtained from the model dataflow graph. A model dataflow graph clearly shows the operators of each layer of the deep learning model and the logical relationships among the individual operators. Referring to Fig. 2, Fig. 2 shows a specific example of a model dataflow graph of a deep learning model (the VGG-19 model). As shown in Fig. 2, the model dataflow graph of the VGG-19 model includes two-dimensional convolution operators (conv), pooling operators (pool), and fully connected operators (fc). It should be noted that the model dataflow graph in Fig. 2 is only used to teach a person skilled in the art how to implement the present invention; the invention is not limited thereto, and other types of model dataflow graphs, such as a model dataflow graph with branches, are also possible.
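As a minimal, hypothetical sketch of reading logical relationships off a parsed model dataflow graph: a real implementation would obtain the graph from a model parsing tool such as those bundled with PyTorch or TensorFlow; the adjacency-list structure and operator names below are illustrative assumptions.

```python
# A parsed dataflow graph, represented as an adjacency list:
# each operator maps to the operators fed by its output.
dataflow_graph = {
    "conv1": ["conv2"],
    "conv2": ["pool1"],
    "pool1": ["fc1"],
    "fc1": [],
}

def logical_relationships(graph):
    """Return (former, latter) edges: the former operator's output
    is the input of the latter operator."""
    return [(op, succ) for op, succs in graph.items() for succ in succs]

print(logical_relationships(dataflow_graph))
# [('conv1', 'conv2'), ('conv2', 'pool1'), ('pool1', 'fc1')]
```

A dataflow graph with branches would simply have more than one successor in some adjacency list entry, and the same extraction applies.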
In step S102, two operators are said to have consecutive logical relationships if the output of one operator is the input of the other. As shown in Fig. 2, the output of the first two-dimensional convolution operator in the model dataflow graph is the input of the second two-dimensional convolution operator, so the first two two-dimensional convolution operators in the model dataflow graph have consecutive logical relationships.
The running time of each operator is the duration between the start of that operator's run and the start of the next operator's run. The total running duration of two operators with consecutive logical relationships is the duration from the start of the former operator's run to the end of the latter operator's run.
In the traditional scheme, when the deep learning model runs, each operator is executed in sequence without delay according to the logical relationships of the model dataflow graph. This embodiment prolongs the total running duration of at least two operators with consecutive logical relationships by adjusting the running time of each operator in the model dataflow graph. For example, the total running duration of the two-dimensional convolution operator and the pooling operator is extended by adjusting the running times of the second operator (the two-dimensional convolution operator) and the third operator (the pooling operator) in Fig. 2.
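The adjusted execution described above can be modeled as inserting an execution interval between consecutive operators, so the CPU idles between them instead of running flat out. The following is an illustrative sketch only; the operator bodies and delay values are assumptions for demonstration, not the patent's scheduler.

```python
import time

def run_model(operators, delays, x):
    """Run a chain of operators with an adjusted schedule.

    operators: callables in logical (consecutive) order.
    delays[i]: execution interval inserted after operator i, before
               operator i+1 runs, prolonging the total running duration.
    """
    for i, op in enumerate(operators):
        x = op(x)
        if i < len(delays):
            time.sleep(delays[i])  # CPU idles here, lowering average load
    return x

# Three toy "operators" standing in for model layers:
ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result = run_model(ops, delays=[0.01, 0.02], x=5)
print(result)  # ((5 + 1) * 2) - 3 = 9
```

With all delays set to zero this degenerates to the traditional back-to-back execution; nonzero delays trade total latency for a lower average CPU load, which is acceptable for the delay-insensitive models this embodiment targets.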
In step S103, running refers to performing inference with the operators in the deep learning model, typically using a deep learning inference engine or framework (e.g. PyTorch, TensorFlow, or vendors' self-developed products).
For the case of reducing the load on a mobile terminal, step S103 is implemented on the mobile terminal while steps S101 and S102 are implemented in the cloud, thereby reducing the computation amount on the mobile terminal. Step S103 improves the inference framework of the deep learning model on the mobile terminal, that is, it improves the running order of the deep learning model by changing operator execution from back-to-back execution to execution according to the adjusted running times, thereby reducing the CPU load of the mobile terminal. A mobile terminal (mobile internet terminal) refers to a terminal device that accesses the internet through wireless network technology, such as a mobile phone, a tablet computer, a driving recorder, or an intelligent navigator.
For the case of reducing the load in the cloud, steps S101 to S103 are all implemented in the cloud, so that the cloud CPU load is reduced when cloud CPU resources are unusually strained.
In conclusion, running the deep learning model according to the running time of each operator in the adjusted deep learning model can reduce the utilization of CPU resources and thus reduce the CPU load. The scheme provided by the embodiment of the application can reduce the CPU load without reducing model complexity or optimizing the underlying code of the model operators. On the one hand, unlike reducing the CPU load by reducing model complexity, which sacrifices model accuracy and thus cannot meet the requirements of real AI application scenarios, the scheme preserves model accuracy. On the other hand, unlike reducing the CPU load by optimizing the underlying code of model operators, the scheme reduces the complexity of practical application; it can be applied to various types of CPUs and thus has universality.
The following describes the above step S102 in detail.
The two operators in a target operator pair are two operators with a consecutive logical relation. The operators in the deep learning model may form a plurality of target operator pairs according to their logical relations; please refer to fig. 3, which is a schematic diagram of target operator pairs. As shown in fig. 3, the logical relation of the operators in the deep learning model is operator A - operator B - operator C - operator D, so operator A and operator B form a target operator pair, operator B and operator C form a target operator pair, operator C and operator D form a target operator pair, and operator D and the next operator to be run form a target operator pair. The combination of target operator pairs in fig. 3 is only used to teach a person skilled in the art how to implement the present invention; the present invention is not limited thereto. One operator may point to a plurality of operators, in which case that operator forms a plurality of target operator pairs with those operators respectively.
The delay time characterizes the execution interval between the two operators in a target operator pair. For example, the two operators A and B of a target operator pair were originally executed with no delay between them; after adjustment, once operator A of the pair has finished executing, operator B of the pair is executed only after a certain delay.
The total running duration of a target operator pair is the duration between the start of the former operator's run and the end of the latter operator's run. Assuming the target operator pair includes operator A and operator B, the duration from the start to the end of operator A's run is denoted T_A, and the duration from the start to the end of operator B's run is denoted T_B. The execution interval between operator A and operator B, that is, the duration between the end of operator A's run and the start of operator B's run, is denoted T_C. Then the total running duration of the target operator pair is T_total = T_A + T_B + T_C.
In one possible embodiment, for at least one target operator pair, a predetermined delay time is set between the two operators in the pair, so that the total running duration of the pair is extended. This embodiment includes two cases. In the first case, a predetermined delay time is set between the two operators in some of the target operator pairs; in the second case, a predetermined delay time is set between the two operators in all of the target operator pairs.
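To make the mechanism concrete, the following minimal sketch shows one way an inference loop could insert the predetermined delay between the two operators of each pair; the function and variable names are illustrative and not part of the patent text.

```python
import time

def run_with_delays(operators, delays):
    """Run a chain of operators in order, sleeping for the configured
    delay between consecutive operators (one delay per target operator
    pair). `operators` is a list of zero-argument callables and
    `delays[i]` is the delay in seconds inserted after operators[i]
    finishes and before operators[i+1] starts.
    """
    result = None
    for i, op in enumerate(operators):
        result = op()
        if i < len(delays) and delays[i] > 0:
            time.sleep(delays[i])  # yield the CPU between the pair
    return result

# A chain A -> B with a short delay between the pair:
out = run_with_delays([lambda: 1, lambda: 2], [0.01])
```

Sleeping between operators extends the total running duration of the pair exactly as described above: the interval T_C grows by the configured delay while T_A and T_B are unchanged.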
How to determine the delay times of the two operators in the target operator pair is described in detail below.
(1) For a target operator pair, the delay time of the two operators in the pair is determined according to the calculation amount of the former operator in the pair.
Regarding the calculation amount of an operator: for each operator, the calculation parameters required by that operator at runtime are obtained; the calculation parameters include any one or more of the following: the number of calculation units called and the number of times the calculation units are called. The calculation amount of each operator is then determined based on the calculation parameters that operator requires at runtime. For example, a two-dimensional convolution operator includes a plurality of simple operations (e.g., product, sum, etc.), each of which requires a corresponding calculation unit to be called at runtime; thus the calculation amount of the two-dimensional convolution can be determined from the number of calculation units called and/or the number of times the calculation units are called.
It should be noted that the total delay time of all target operator pairs in the deep learning model must remain below the total delay time tolerable by the deep learning model. That is, this embodiment is applicable to deep learning models that are not delay-sensitive, and is not applicable to deep learning models with high real-time requirements. Subtracting the actual total running duration of all operators of the deep learning model from the tolerable total running duration of all its operators gives the total delay time tolerable by the deep learning model. For example, if the tolerable total running duration of all operators of a face recognition model is about 10 seconds and the actual total running duration of all its operators is about 2 seconds, then the total delay time tolerable by the face recognition model is about 8 seconds.
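The delay budget computation above can be sketched in a few lines; the function name is illustrative, and the guard for delay-sensitive models reflects the applicability condition stated in the text.

```python
def tolerable_total_delay(tolerable_runtime_s, actual_runtime_s):
    """Total delay budget T = tolerable total running duration
    minus actual total running duration of all operators."""
    budget = tolerable_runtime_s - actual_runtime_s
    if budget <= 0:
        # The model is delay-sensitive; the scheme is not applicable.
        raise ValueError("no tolerable delay budget available")
    return budget

# Face recognition example from the text: 10 s tolerable, 2 s actual.
budget = tolerable_total_delay(10, 2)  # 8 s
```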
In the first case, a predetermined delay time is set between the two operators in some of the target operator pairs.
The total delay time tolerable by the deep learning model is allocated to these target operator pairs according to the calculation amount of the former operator in each pair. In a specific implementation, the operators are divided into several levels according to their calculation amounts, e.g., small, medium, and large. If the calculation amount of an operator is small, the operator may be assigned a weight of 0; if it is medium, a weight of 2; if it is large, a weight of 3. The total weight Ω = Σω_i of all operators is computed, the ratio ρ_i = ω_i/Ω of each operator's weight ω_i to the total weight is calculated, and ρ_i is multiplied by the total delay time T tolerable by the deep learning model to obtain the delay time T_i = ρ_i × T of the target operator pair to which that operator belongs, where the operator is the former operator of that pair. For an operator with a small calculation amount, the delay time T_i of its target operator pair is 0; this corresponds to only some of the target operator pairs having a non-zero delay time between their two operators.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating the allocation of the delay time of the two operators in each target operator pair. As shown in fig. 4, assume the calculation amount of operator A is small, so ω_1 = 0; the calculation amount of operator B is medium, so ω_2 = 2; the calculation amount of operator C is medium, so ω_3 = 2; the calculation amount of operator D is large, so ω_4 = 3. Also assume T = 10 s.
Operator A corresponds to ρ_1 = ω_1/(ω_1+ω_2+ω_3+ω_4) = 0/(0+2+2+3) = 0;
operator B corresponds to ρ_2 = ω_2/(ω_1+ω_2+ω_3+ω_4) = 2/(0+2+2+3) ≈ 0.286;
operator C corresponds to ρ_3 = ω_3/(ω_1+ω_2+ω_3+ω_4) = 2/(0+2+2+3) ≈ 0.286;
operator D corresponds to ρ_4 = ω_4/(ω_1+ω_2+ω_3+ω_4) = 3/(0+2+2+3) ≈ 0.428.
The delay time of the target operator pair to which operator A belongs is T_1 = ρ_1 × T = 0 s;
the delay time of the target operator pair to which operator B belongs is T_2 = ρ_2 × T = 0.286 × 10 s = 2.86 s;
the delay time of the target operator pair to which operator C belongs is T_3 = ρ_3 × T = 0.286 × 10 s = 2.86 s;
the delay time of the target operator pair to which operator D belongs is T_4 = ρ_4 × T = 0.428 × 10 s = 4.28 s.
The values assigned to the operator weights above are only intended to teach a person skilled in the art how to implement the invention; the invention is not limited thereto, and other values may be assigned to the operator weights in a specific implementation. Likewise, the tolerable total delay time of 10 s is only illustrative; in practice, a different tolerable total delay time is set for each deep learning model.
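The level-based allocation above can be sketched as follows; the level names and the mapping to weights follow the worked example, while the function name is an illustrative assumption.

```python
# Illustrative mapping from calculation-amount level to weight,
# matching the example: small -> 0, medium -> 2, large -> 3.
LEVEL_WEIGHTS = {"small": 0, "medium": 2, "large": 3}

def allocate_delays(levels, total_delay_s):
    """Allocate the tolerable total delay T across target operator
    pairs in proportion to the weight of each pair's former operator:
    T_i = (w_i / sum(w)) * T.
    """
    weights = [LEVEL_WEIGHTS[lvl] for lvl in levels]
    omega = sum(weights)  # total weight
    return [w / omega * total_delay_s for w in weights]

# The fig. 4 example: A small, B medium, C medium, D large, T = 10 s.
delays = allocate_delays(["small", "medium", "medium", "large"], 10)
```

A pair whose former operator has weight 0 receives a zero delay, exactly as in the first case above.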
In the second case, a predetermined delay time is set between the two operators in all of the target operator pairs.
The total delay time tolerable by the deep learning model is allocated to all target operator pairs according to the calculation amount of the former operator in each pair. In a specific implementation, the calculation amount Q_i of each operator is determined separately, and the weight of each operator is determined from its calculation amount Q_i. For example, the range from the minimum to the maximum operator calculation amount is divided in advance into a plurality of numerical ranges: the weight of an operator in the first numerical range is 1, the weight of an operator in the second numerical range is 2, and so on, thereby constructing a weight table of the operators. In the weight table, each weight corresponds to a numerical range of operator calculation amounts. Once the calculation amount Q_i of an operator is known, the corresponding weight can be looked up in the weight table. The total weight Ω = Σω_i of all operators is computed, the ratio ρ_i = ω_i/Ω of each operator's weight ω_i to the total weight is calculated, and ρ_i is multiplied by the total delay time T tolerable by the deep learning model to obtain the delay time T_i = ρ_i × T of the target operator pair to which that operator belongs, where the operator is the former operator of that pair.
As shown in fig. 4, assume the calculation amount of operator A is in the first numerical range, that of operator B in the second, that of operator C in the fifth, and that of operator D in the second. Then ω_1 = 1, ω_2 = 2, ω_3 = 5, ω_4 = 2. Also assume T = 10 s.
Operator A corresponds to ρ_1 = ω_1/(ω_1+ω_2+ω_3+ω_4) = 1/(1+2+5+2) = 0.1;
operator B corresponds to ρ_2 = ω_2/(ω_1+ω_2+ω_3+ω_4) = 2/(1+2+5+2) = 0.2;
operator C corresponds to ρ_3 = ω_3/(ω_1+ω_2+ω_3+ω_4) = 5/(1+2+5+2) = 0.5;
operator D corresponds to ρ_4 = ω_4/(ω_1+ω_2+ω_3+ω_4) = 2/(1+2+5+2) = 0.2.
The delay time of the target operator pair to which operator A belongs is T_1 = ρ_1 × T = 0.1 × 10 s = 1 s;
the delay time of the target operator pair to which operator B belongs is T_2 = ρ_2 × T = 0.2 × 10 s = 2 s;
the delay time of the target operator pair to which operator C belongs is T_3 = ρ_3 × T = 0.5 × 10 s = 5 s;
the delay time of the target operator pair to which operator D belongs is T_4 = ρ_4 × T = 0.2 × 10 s = 2 s.
The values assigned to the operator weights above are only intended to teach a person skilled in the art how to implement the invention; the invention is not limited thereto, and other values may be assigned to the operator weights in a specific implementation. Likewise, the tolerable total delay time of 10 s is only illustrative; in practice, a different tolerable total delay time is set for each deep learning model.
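The weight-table lookup can be sketched with a sorted list of range bounds; the concrete bounds and calculation amounts below are illustrative assumptions, not values from the patent.

```python
import bisect

def weight_from_table(q, upper_bounds):
    """Look up an operator's weight from its calculation amount Q.
    upper_bounds[i] is the upper bound of the (i+1)-th numerical
    range; the weight is the 1-based index of the range Q falls in,
    mirroring the weight table described in the text."""
    return bisect.bisect_left(upper_bounds, q) + 1

def allocate_by_table(amounts, upper_bounds, total_delay_s):
    """Allocate the tolerable total delay T in proportion to the
    table-derived weight of each pair's former operator."""
    weights = [weight_from_table(q, upper_bounds) for q in amounts]
    omega = sum(weights)
    return [w / omega * total_delay_s for w in weights]

# Hypothetical ranges with upper bounds 10, 20, 30, 40, 50 and
# calculation amounts reproducing the fig. 4 weights 1, 2, 5, 2:
delays = allocate_by_table([5, 15, 45, 15], [10, 20, 30, 40, 50], 10)
```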
In summary, for each target operator pair, the delay time of the two operators in the pair is determined according to the calculation amount of the former operator in the pair. In this scheme, an operator with a large calculation amount is given a large weight, so the delay time of the two operators in the corresponding target operator pair is long; an operator with a small calculation amount is given a small weight, possibly even 0, so the corresponding delay time is short, possibly even 0. In this way, during the running of the model, an operator with a large calculation amount is given enough delay time to reduce its CPU load per unit time, so the CPU load of every operator per unit time can be reduced.
(2) The delay times of the two operators are the same in every target operator pair.
In the first case, a predetermined delay time is set between the two operators in some of the target operator pairs.
The total delay time tolerable by the deep learning model is allocated to some of the target operator pairs. In a specific implementation, the target operator pairs that need a delay time are determined first; these are generally pairs whose former operator occupies more CPU resources, such as various convolution operators. Second, the weight of the former operator of each selected target operator pair is set to 1, and the weight of the former operator of every other target operator pair is set to 0. The total weight Ω = Σω_i is computed, the ratio ρ_i = ω_i/Ω of each operator's weight to the total weight is calculated, and ρ_i is multiplied by the total delay time T tolerable by the deep learning model to obtain the delay time T_i = ρ_i × T of the target operator pair to which that operator belongs, where the operator is the former operator of that pair. Thus the delay times of the two operators in each selected target operator pair are the same, and the delay times of the two operators in every other target operator pair are 0.
As shown in fig. 4, assume operators B and C are convolution operators, so ω_1 = 0, ω_2 = 1, ω_3 = 1, ω_4 = 0. Also assume T = 10 s.
Operator A corresponds to ρ_1 = ω_1/(ω_1+ω_2+ω_3+ω_4) = 0/(0+1+1+0) = 0;
operator B corresponds to ρ_2 = ω_2/(ω_1+ω_2+ω_3+ω_4) = 1/(0+1+1+0) = 0.5;
operator C corresponds to ρ_3 = ω_3/(ω_1+ω_2+ω_3+ω_4) = 1/(0+1+1+0) = 0.5;
operator D corresponds to ρ_4 = ω_4/(ω_1+ω_2+ω_3+ω_4) = 0/(0+1+1+0) = 0.
The delay time of the target operator pair to which operator A belongs is T_1 = ρ_1 × T = 0 × 10 s = 0 s;
the delay time of the target operator pair to which operator B belongs is T_2 = ρ_2 × T = 0.5 × 10 s = 5 s;
the delay time of the target operator pair to which operator C belongs is T_3 = ρ_3 × T = 0.5 × 10 s = 5 s;
the delay time of the target operator pair to which operator D belongs is T_4 = ρ_4 × T = 0 × 10 s = 0 s.
The values assigned to the operator weights above are only intended to teach a person skilled in the art how to implement the invention; the invention is not limited thereto, and other values may be assigned to the operator weights in a specific implementation. Likewise, the tolerable total delay time of 10 s is only illustrative; in practice, a different tolerable total delay time is set for each deep learning model.
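The select-heavy-pairs variant can be sketched by giving weight 1 to pairs whose former operator is of a heavy type and weight 0 to all others; the operator type names are illustrative assumptions.

```python
def allocate_uniform(former_op_types, total_delay_s, heavy=("conv",)):
    """Weight 1 for pairs whose former operator is of a heavy type
    (e.g. a convolution), weight 0 otherwise, then allocate the total
    delay T in proportion: selected pairs share T equally."""
    weights = [1 if t in heavy else 0 for t in former_op_types]
    omega = sum(weights)
    return [w / omega * total_delay_s for w in weights]

# Fig. 4 example with B and C as convolutions and T = 10 s:
delays = allocate_uniform(["relu", "conv", "conv", "add"], 10)
```

As in the worked example, the two selected pairs each receive 5 s while the others receive 0.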
In the second case, a predetermined delay time is set between the two operators in all of the target operator pairs.
The total delay time tolerable by the deep learning model is allocated to all target operator pairs. In a specific implementation, the weight ω_i of every operator is the same, e.g., ω_i = 1 for every operator. The total weight Ω = Σω_i is computed, the ratio ρ_i = ω_i/Ω of each operator's weight to the total weight is calculated, and ρ_i is multiplied by the total delay time T tolerable by the deep learning model to obtain the delay time T_i = ρ_i × T of the target operator pair to which that operator belongs, where the operator is the former operator of that pair. In this way, the delay times of the two operators are the same in all target operator pairs.
As shown in fig. 4, assume the weights of operators A, B, C, and D are the same, i.e., ω_1 = 1, ω_2 = 1, ω_3 = 1, ω_4 = 1. Also assume T = 10 s.
Operator A corresponds to ρ_1 = ω_1/(ω_1+ω_2+ω_3+ω_4) = 1/(1+1+1+1) = 0.25;
operator B corresponds to ρ_2 = ω_2/(ω_1+ω_2+ω_3+ω_4) = 1/(1+1+1+1) = 0.25;
operator C corresponds to ρ_3 = ω_3/(ω_1+ω_2+ω_3+ω_4) = 1/(1+1+1+1) = 0.25;
operator D corresponds to ρ_4 = ω_4/(ω_1+ω_2+ω_3+ω_4) = 1/(1+1+1+1) = 0.25.
The delay time of the target operator pair to which operator A belongs is T_1 = ρ_1 × T = 0.25 × 10 s = 2.5 s;
the delay time of the target operator pair to which operator B belongs is T_2 = ρ_2 × T = 0.25 × 10 s = 2.5 s;
the delay time of the target operator pair to which operator C belongs is T_3 = ρ_3 × T = 0.25 × 10 s = 2.5 s;
the delay time of the target operator pair to which operator D belongs is T_4 = ρ_4 × T = 0.25 × 10 s = 2.5 s.
The values assigned to the operator weights above are only intended to teach a person skilled in the art how to implement the invention; the invention is not limited thereto, and other values may be assigned to the operator weights in a specific implementation. Likewise, the tolerable total delay time of 10 s is only illustrative; in practice, a different tolerable total delay time is set for each deep learning model.
In summary, the total delay time tolerable by the deep learning model is allocated to some or all of the target operator pairs, and the same weight is assigned to the former operator of each of these pairs, so that the delay times of the two operators are the same in every selected target operator pair; the implementation is simple and convenient.
(3) The delay times of the two operators in each target operator pair are determined by an optimal solution method.
The delay time variable is the unknown variable for the delay time of a target operator pair. The accumulated value of the delay time variables of all target operator pairs in the deep learning model refers to the total of the delay time variables of all target operator pairs in the deep learning model. The preset total delay time is the total delay time tolerable by the deep learning model. The average value of the delay time variables refers to the average of the solved values of the delay time variables over the target operator pairs.
Parallel running of the deep learning model means that two parallel operators run simultaneously, which saves running time. Serial running of the deep learning model means that the operators are executed one after another: even if two operators are parallel in the graph, the next operator is executed only after the previous one has finished.
When an operator in the model data flow graph has branches, an optimal allocation method may be employed to determine the delay times. Specifically, the values of the delay time variables of the target operator pairs are solved under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not larger than the preset total delay time, with the objective of maximizing the average value of the delay time variables. The delay times of the two operators in each target operator pair are then determined from the solved values of the delay time variables.
1) When the deep learning model runs in parallel, if the latter operator of a plurality of target operator pairs is the same operator, the accumulated value of the delay time variables of those pairs is the maximum of the values of their delay time variables.
Referring to fig. 5, fig. 5 is a schematic diagram of the optimal allocation of the delay times of the two operators in each target operator pair when the deep learning model runs in parallel. As shown in fig. 5, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variable of the two operators in the first target operator pair is T_1, that of the second is T_2, and that of the third is T_3. The intermediate variable for the accumulated delay of the first target operator pair is N_1, that of the second is N_2, and that of the third is N_3. The preset total delay time is T_max.
For the first target operator pair, with delay time variable T_1: N_1 = T_1;
for the second target operator pair, with delay time variable T_2: N_2 = T_2;
for the third target operator pair, with delay time variable T_3: N_3 = max(N_1, N_2) + T_3.
For the variables mentioned above, the following system can be listed (following the definitions given earlier): N_1 = T_1; N_2 = T_2; N_3 = max(N_1, N_2) + T_3; subject to N_3 ≤ T_max, with the objective of maximizing the average (T_1 + T_2 + T_3)/3.
The above system is solved by a commonly used solver (e.g., a MATLAB solver, the Pardiso solver, a HyperMesh solver, the CPLEX solver, etc.) to obtain the value of the delay time variable of each target operator pair, i.e., the delay times of the two operators in each target operator pair.
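For the three pairs of fig. 5 the optimization is small enough to check by exhaustive search, which can serve as a cross-check on a solver's output; this sketch assumes integer delay values and the fig. 5 topology, and the function name and grid step are illustrative.

```python
def best_parallel_delays(t_max, step=1):
    """Brute-force the parallel-case allocation of fig. 5:
    maximize T1 + T2 + T3 (equivalently their average) subject to
    the accumulation rule N3 = max(T1, T2) + T3 <= t_max.
    """
    best, best_sum = None, -1
    grid = range(0, t_max + step, step)
    for t1 in grid:
        for t2 in grid:
            for t3 in grid:
                # Parallel branches accumulate via the maximum.
                if max(t1, t2) + t3 <= t_max and t1 + t2 + t3 > best_sum:
                    best, best_sum = (t1, t2, t3), t1 + t2 + t3
    return best
```

With T_max = 10 the optimum puts the whole budget on the two parallel branches (T_1 = T_2 = 10, T_3 = 0), since delay on a parallel branch is "free" up to the slower branch's total.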
2) When the deep learning model runs serially, if the latter operator of a plurality of target operator pairs is the same operator, the accumulated value of the delay time variables of those pairs is the sum of the values of their delay time variables.
Referring to fig. 6, fig. 6 is a schematic diagram of the optimal allocation of the delay times of the two operators in each target operator pair when the deep learning model runs serially. As shown in fig. 6, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variable of the two operators in the first target operator pair is T_1, that of the second is T_2, and that of the third is T_3. The intermediate variable for the accumulated delay of the first target operator pair is N_1, that of the second is N_2, and that of the third is N_3. The preset total delay time is T_max.
For the first target operator pair, with delay time variable T_1: N_1 = T_1;
for the second target operator pair, with delay time variable T_2: N_2 = T_2;
for the third target operator pair, with delay time variable T_3: N_3 = N_1 + N_2 + T_3.
For the variables mentioned above, the following system can be listed (following the definitions given earlier): N_1 = T_1; N_2 = T_2; N_3 = N_1 + N_2 + T_3; subject to N_3 ≤ T_max, with the objective of maximizing the average (T_1 + T_2 + T_3)/3.
The above system is solved by a commonly used solver (e.g., a MATLAB solver, the Pardiso solver, a HyperMesh solver, the CPLEX solver, etc.) to obtain the value of the delay time variable of each target operator pair, i.e., the delay times of the two operators in each target operator pair.
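In the serial case the accumulation rule collapses to a single sum, so any allocation that exhausts the budget T_max attains the maximal average and the optimum is not unique; the even split used in this sketch is an illustrative tie-break, not something mandated by the text.

```python
def best_serial_delays(t_max, n_pairs=3):
    """Serial-case allocation: under T1 + ... + Tn <= t_max every
    allocation exhausting the budget maximizes the average delay,
    so return one such optimum -- an even split across the pairs."""
    return [t_max / n_pairs] * n_pairs

delays = best_serial_delays(10)  # budget of 10 s over three pairs
```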
In summary, determining the delay times of the two operators in each target operator pair by the optimal solution method is more complex to implement, since a solver is needed for the system of equations, but the solved delay times are optimal; running the deep learning model with these delay times gives the best load reduction.
In summary, an embodiment of the present application provides a method for running a deep learning model, the method comprising: obtaining the logical relations among the operators in the deep learning model; adjusting the running time of each operator in the deep learning model according to the logical relations among the operators, so that the total running duration of at least two operators with consecutive logical relations in the adjusted deep learning model is extended; and running the deep learning model according to the adjusted running time of each operator. The scheme provided by the present application addresses two problems of the prior art: reducing model complexity sacrifices model accuracy, and optimizing the underlying code of model operators is complex and lacks universality. First the logical relations among the operators in the deep learning model are obtained; then the running time of each operator is adjusted so that the total running duration of at least two operators with consecutive logical relations is extended; finally the deep learning model is run according to the adjusted running time of each operator. The scheme thus reduces the CPU load without reducing the complexity of the model and without optimizing the underlying code of the model operators. On the one hand, reducing model complexity to lower the CPU load degrades model accuracy, which cannot satisfy real AI application scenarios.
On the other hand, because the scheme provided by the present application does not rely on optimizing the underlying code of the model operators, it is simpler to apply in practice, can be used with various types of CPUs, and therefore has universality.
Based on the same technical concept, the embodiment of the application also provides a deep learning model running device, electronic equipment, a computer storage medium and the like, and particularly can be seen in the following embodiments.
Fig. 7 is a schematic structural diagram of a running apparatus for a deep learning model according to an embodiment of the present application. As shown in fig. 7, the apparatus may include:
An obtaining module 701, configured to obtain a logical relationship between each operator in the deep learning model;
The adjustment module 702 is configured to adjust the running time of each operator in the deep learning model according to the logical relationship between each operator, so that the total running duration of the operators with at least two consecutive logical relationships in the adjusted deep learning model is prolonged;
And the operation module 703 is configured to operate the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In one possible implementation, the adjustment module 702 includes: a setting unit configured to set, for at least one target operator pair, a predetermined delay time between two operators in the target operator pair so that an operation total duration of the target operator pair is prolonged; the two operators in the target operator pair are two operators with continuous logical relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In one possible implementation, the adjustment module 702 further includes: a first determining unit, configured to determine, for the target operator pair, a delay time of two operators in the target operator pair according to a calculation amount of a previous operator in the target operator pair.
In one possible implementation, the adjustment module 702 further includes:
The obtaining unit is configured to obtain, for each operator, the calculation parameters required by that operator at runtime; the calculation parameters include any one or more of the following: the number of calculation units called and the number of times the calculation units are called;
And the second determining unit is used for determining the calculated amount of each operator based on the calculation parameters required by the operator in the running process.
In one possible embodiment, the delay times of both operators in any of the target operator pairs are the same.
In one possible implementation, the adjustment module 702 further includes:
The solving unit is configured to solve the value of the delay time variable of each target operator pair under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not larger than the preset total delay time, with the objective of maximizing the average value of the delay time variables;
And a third determining unit, configured to determine delay times of two operators in the target operator pair based on the values of delay time variables of the target operator pair that are solved.
In one possible implementation, the adjustment module 702 further includes:
And the fourth determining unit is configured to, for the case where the deep learning model runs in parallel, determine the maximum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of those pairs if the latter operators of the plurality of target operator pairs are all the same operator.
In one possible implementation, the adjustment module 702 further includes:
And the fifth determining unit is configured to, for the case where the deep learning model runs serially, determine the sum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of those pairs if the latter operators of the plurality of target operator pairs are all the same operator.
In one possible implementation, the obtaining module 701 includes:
the model parsing unit is configured to perform model parsing on the deep learning model with a model parsing tool to obtain a model data flow graph;
and the relation acquisition unit is used for acquiring the logic relation among all operators in the deep learning model from the model data flow diagram.
The embodiment of the application discloses an electronic device, as shown in fig. 8, comprising: a processor 801, a memory 802, and a bus 803, the memory 802 storing machine readable instructions executable by the processor 801, the processor 801 and the memory 802 communicating via the bus 803 when the electronic device is operating. The machine readable instructions, when executed by the processor 801, perform the methods described in the method embodiments above, and the specific implementation may refer to the method embodiments and are not described herein.
The computer program product of the method for operating a deep learning model according to the embodiment of the present application includes a computer readable storage medium storing non-volatile program code executable by the processor 801, where the program code includes instructions for executing the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the system and apparatus described above may refer to the corresponding procedures in the method embodiments, which are not repeated in the present disclosure. In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; the division into modules is merely a division by logical function, and other divisions are possible in actual implementation: for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings or direct couplings or communication connections shown or discussed may be implemented through communication interfaces, as indirect couplings or communication connections of devices or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing is merely a specific description of the present application, and the protection scope of the present application is not limited thereto; any person skilled in the art can readily conceive of variations or alternatives within the scope disclosed by the present application. Therefore, the protection scope of the present application is subject to the protection scope of the claims.

Claims (18)

1. A data processing method of a processor for running a deep learning model, comprising:
obtaining logical relationships among operators in the deep learning model;
adjusting the running time of each operator in the deep learning model according to the logical relationships among the operators, so that the total running duration of at least two operators having continuous logical relationships in the adjusted deep learning model is prolonged; and
running the deep learning model according to the running time of each operator in the adjusted deep learning model;
wherein adjusting the running time of each operator in the deep learning model according to the logical relationships among the operators, so that the total running duration of at least two operators having continuous logical relationships in the adjusted deep learning model is prolonged, comprises:
for at least one target operator pair, setting a predetermined delay time between the two operators in the target operator pair, so that the total running duration of the target operator pair is prolonged and the CPU load of the operators per unit time is reduced; wherein the two operators in the target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
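The delaying step of claim 1 can be illustrated with a minimal sketch (the function and parameter names are hypothetical; the patent does not prescribe any particular implementation): operators run in their logical order, and a preset delay is slept between each target operator pair, stretching the pair's total running duration and spreading CPU load over time.

```python
import time

def run_with_delays(operators, pair_delays):
    """Run a chain of operators in logical order, sleeping for a preset
    delay between consecutive operator pairs; the sleep is the execution
    interval between the two operators of a target operator pair."""
    result = None
    for i, op in enumerate(operators):
        result = op(result)
        delay = pair_delays.get(i)  # delay after operator i, if any
        if delay and i < len(operators) - 1:
            time.sleep(delay)
    return result
```

Here `pair_delays` maps the index of the former operator of each pair to its delay in seconds; pairs without an entry run back-to-back.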
2. The method of claim 1, wherein the delay time of the two operators in the target operator pair is determined by:
for the target operator pair, determining the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
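One way to realize claim 2 is to make the delay grow with the calculation amount of the former operator, so heavier operators get a longer execution interval before the next operator starts. The following sketch is illustrative only; the linear mapping and the scaling constant are assumptions, not taken from the patent.

```python
def delay_for_pair(former_op_workload, seconds_per_unit=1e-6):
    """Map the former operator's calculation amount (in arbitrary work
    units) to a delay time in seconds; a larger preceding workload
    yields a longer execution interval for the pair."""
    return former_op_workload * seconds_per_unit
```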
3. The method of claim 2, wherein the calculation amount of each operator is determined by:
for each operator, obtaining the calculation parameters required by the operator at runtime, the calculation parameters including any one or more of the following: the number of computing units called and the number of times the computing units are called; and
determining the calculation amount of each operator based on the calculation parameters required by the operator at runtime.
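The calculation-amount determination of claim 3 can be sketched as follows; the product is one simple proxy chosen for illustration, since the patent does not fix a formula.

```python
def operator_workload(units_called, call_count):
    """Estimate an operator's calculation amount from its runtime
    calculation parameters: the number of computing units called and
    the number of times they are called."""
    return units_called * call_count
```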
4. The method of claim 1, wherein the delay times of the two operators in any one of the target operator pairs are the same.
5. The method of claim 1, wherein the delay time of the two operators in the target operator pair is determined by:
solving for the value of the delay time variable of each target operator pair, under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed a preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs; and
determining the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
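A minimal sketch of the solving step of claim 5 (hypothetical helper; a real solver may weight pairs differently or apply the accumulation rules of the dependent claims): because the average of the delay variables is their sum divided by the number of pairs, the average is maximized by spending the entire delay budget, and an even split is one maximizing solution.

```python
def solve_delay_variables(num_pairs, total_delay_budget):
    """Maximize the average delay time variable subject to
    sum(delays) <= total_delay_budget. Any allocation using the whole
    budget maximizes the average; split it evenly here."""
    if num_pairs <= 0:
        return []
    return [total_delay_budget / num_pairs] * num_pairs
```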
6. The method of claim 5, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs in parallel, if the latter operators in a plurality of target operator pairs are the same operator, taking the maximum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
7. The method of claim 5, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs serially, if the latter operators in a plurality of target operator pairs are the same operator, taking the sum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
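Claims 6 and 7 can be illustrated jointly (an assumed helper, not from the patent text): when several target operator pairs share the same latter operator, their delays overlap under parallel execution, so the accumulated value is the maximum, whereas serial execution stacks them, so the accumulated value is the sum.

```python
def accumulated_delay(delay_values, parallel):
    """Accumulate delay time variables for target operator pairs that
    share the same latter operator: parallel execution overlaps the
    delays (max), serial execution stacks them (sum)."""
    return max(delay_values) if parallel else sum(delay_values)
```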
8. The method of claim 1, wherein obtaining the logical relationships among the operators in the deep learning model comprises:
parsing the deep learning model with a model parsing tool to obtain a model dataflow graph; and
obtaining the logical relationships among the operators in the deep learning model from the model dataflow graph.
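The graph step of claim 8 can be sketched as follows (the edge representation and operator names are assumptions for illustration): once a parsing tool yields the model dataflow graph as former-to-latter edges, every distinct edge is a pair of operators with a continuous logical relationship, i.e. a candidate target operator pair.

```python
def target_pairs_from_dataflow(edges):
    """Extract candidate target operator pairs from a model dataflow
    graph given as (former_op, latter_op) edges, preserving first-seen
    order and dropping duplicate edges."""
    seen, pairs = set(), []
    for former_op, latter_op in edges:
        if (former_op, latter_op) not in seen:
            seen.add((former_op, latter_op))
            pairs.append((former_op, latter_op))
    return pairs
```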
9. A data processing apparatus for a processor, comprising:
an acquisition module, configured to obtain logical relationships among operators in a deep learning model;
an adjustment module, configured to adjust the running time of each operator in the deep learning model according to the logical relationships among the operators, so that the total running duration of at least two operators having continuous logical relationships in the adjusted deep learning model is prolonged; and
a running module, configured to run the deep learning model according to the running time of each operator in the adjusted deep learning model;
wherein the adjustment module comprises:
a setting unit, configured to set a predetermined delay time between the two operators in at least one target operator pair, so that the total running duration of the target operator pair is prolonged and the CPU load of the operators per unit time is reduced; wherein the two operators in the target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
10. The apparatus of claim 9, wherein the adjustment module further comprises:
a first determining unit, configured to determine, for the target operator pair, the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
11. The apparatus of claim 10, wherein the adjustment module further comprises:
an obtaining unit, configured to obtain, for each operator, the calculation parameters required by the operator at runtime, the calculation parameters including any one or more of the following: the number of computing units called and the number of times the computing units are called; and
a second determining unit, configured to determine the calculation amount of each operator based on the calculation parameters required by the operator at runtime.
12. The apparatus of claim 9, wherein the delay times of the two operators in any one of the target operator pairs are the same.
13. The apparatus of claim 9, wherein the adjustment module further comprises:
a solving unit, configured to solve for the value of the delay time variable of each target operator pair, under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed a preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs; and
a third determining unit, configured to determine the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
14. The apparatus of claim 13, wherein the adjustment module further comprises:
a fourth determining unit, configured to determine, for the case where the deep learning model runs in parallel, the maximum of the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators in the plurality of target operator pairs are the same operator.
15. The apparatus of claim 13, wherein the adjustment module further comprises:
a fifth determining unit, configured to determine, for the case where the deep learning model runs serially, the sum of the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators in the plurality of target operator pairs are the same operator.
16. The apparatus of claim 9, wherein the acquisition module comprises:
a model parsing unit, configured to parse the deep learning model with a model parsing tool to obtain a model dataflow graph; and
a relationship acquisition unit, configured to obtain the logical relationships among the operators in the deep learning model from the model dataflow graph.
17. An electronic device, comprising: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor; when the electronic device is running, the processor and the storage medium communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the method of any one of claims 1 to 8.
18. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of claims 1 to 8.
CN202010265726.3A 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium Active CN111860758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010265726.3A CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN111860758A CN111860758A (en) 2020-10-30
CN111860758B true CN111860758B (en) 2024-05-03

Family

ID=72986013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010265726.3A Active CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111860758B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515739A (en) * 2019-10-23 2019-11-29 上海燧原智能科技有限公司 Deep learning neural network model load calculating method, device, equipment and medium
CN110750342A (en) * 2019-05-23 2020-02-04 北京嘀嘀无限科技发展有限公司 Scheduling method, scheduling device, electronic equipment and readable storage medium
KR20200023660A (en) * 2018-08-13 2020-03-06 인천대학교 산학협력단 Electronic device for controlling performance of at least one processor when providing inference service through deep learning model and operating method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PaddlePaddle: An Open-Source Deep Learning Platform Originating from Industrial Practice; Ma Yanjun; Yu Dianhai; Wu Tian; Wang Haifeng; Frontiers of Data and Computing; 2019-10-15 (Issue 05); full text *

Also Published As

Publication number Publication date
CN111860758A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN110058883B (en) CNN acceleration method and system based on OPU
CN109902819B (en) Neural network computing method, device, mobile terminal and storage medium
Zhao et al. Automatic generation of multi-precision multi-arithmetic CNN accelerators for FPGAs
CN110287332B (en) Method and device for selecting simulation model in cloud environment
CN111459993B (en) Configuration updating method, device, equipment and storage medium based on behavior analysis
US11580194B2 (en) Information processing apparatus, information processing method, and program
CN109558329A (en) A kind of program detecting method, device, equipment and readable storage medium storing program for executing
CN113515382B (en) Cloud resource allocation method and device, electronic equipment and storage medium
CN110347724A (en) Abnormal behaviour recognition methods, device, electronic equipment and medium
US11636712B2 (en) Dynamic gesture recognition method, device and computer-readable storage medium
CN107644063A (en) Time series analysis method and system based on data parallel
CN112488297B (en) Neural network pruning method, model generation method and device
CN116126346A (en) Code compiling method and device of AI model, computer equipment and storage medium
CN112955909A (en) Distributed training method and device of neural network
CN111860758B (en) Deep learning model operation method and device, electronic equipment and medium
CN114492742A (en) Neural network structure searching method, model issuing method, electronic device, and storage medium
CN108647007B (en) Computing system and chip
CN110750359A (en) Hardware resource configuration method and device, cloud side equipment and storage medium
CN110020004A (en) A kind of method for computing data and engine
CN116974868A (en) Chip power consumption estimation device, method, electronic equipment and storage medium
CN105955823B (en) Method and system for determining operation frequency of operation resource
CN110895460A (en) Jenkins-based robot system integration method and device and terminal equipment
CN114021733A (en) Model training optimization method and device, computer equipment and storage medium
CN115130672A (en) Method and device for calculating convolution neural network by software and hardware collaborative optimization
CN111105015A (en) General CNN reasoning accelerator, control method thereof and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant