CN111860758A - Operation method and device of deep learning model, electronic equipment and medium - Google Patents

Operation method and device of deep learning model, electronic equipment and medium

Info

Publication number
CN111860758A
CN111860758A
Authority
CN
China
Prior art keywords
operator
operators
deep learning
learning model
delay time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010265726.3A
Other languages
Chinese (zh)
Other versions
CN111860758B (en)
Inventor
靖远 (Jing Yuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010265726.3A priority Critical patent/CN111860758B/en
Publication of CN111860758A publication Critical patent/CN111860758A/en
Application granted granted Critical
Publication of CN111860758B publication Critical patent/CN111860758B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12 Simultaneous equations, e.g. systems of linear equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Feedback Control In General (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides an operation method and apparatus for a deep learning model, an electronic device, and a medium. The method comprises the following steps: acquiring the logical relations between operators in the deep learning model; adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged; and operating the deep learning model according to the adjusted operation time of each operator in the deep learning model. With the scheme provided by the application, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators.

Description

Operation method and device of deep learning model, electronic equipment and medium
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to an operation method and apparatus for a deep learning model, an electronic device, and a medium.
Background
At present, with the rapid development of artificial intelligence, more and more deep learning models are deployed on intelligent devices, such as face recognition models, target detection models, target tracking models, and OCR (Optical Character Recognition) models. Running a deep learning model on an intelligent device occupies considerable CPU resources, which imposes a heavy CPU load, causes the deep learning model to stall or run inaccurately, and may even adversely affect the system of the intelligent device.
In order to solve the above problems, the prior art proposes the following two technical solutions:
The first technical scheme is as follows: the CPU load when running a deep learning model is reduced by reducing the model's complexity, commonly through model compression and model pruning. A less complex model generally occupies fewer CPU resources at runtime, thereby reducing the CPU load, but the low complexity of a deep learning model often comes at the expense of its accuracy. Real AI application scenarios place high requirements on the accuracy of a deep learning model, so this scheme often fails to meet application requirements.
The second technical scheme is as follows: the CPU load at runtime is reduced by optimizing the implementation of the underlying code of the deep learning model's operators. This scheme places high technical demands on developers and requires considerable time for optimizing the underlying code, yet the effect after optimization is difficult to guarantee. Moreover, intelligent devices are diverse and their CPU architectures differ; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical application.
Disclosure of Invention
In view of this, an object of the present application is to provide an operation method and apparatus for a deep learning model, an electronic device, and a medium, which can reduce the CPU load without reducing the model's complexity or optimizing the underlying code of the model's operators.
According to a first aspect of the present application, there is provided a method for operating a deep learning model, including:
acquiring a logical relation between operators in the deep learning model;
adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
and operating the deep learning model according to the adjusted running time of each operator in the deep learning model.
In a possible implementation manner, adjusting the operation time of each operator in the deep learning model according to the logical relationship between each operator, so that the total operation time of at least two operators with consecutive logical relationship in the adjusted deep learning model is prolonged, includes:
setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In one possible embodiment, the delay time of both operators of the target operator pair is determined by:
and aiming at the target operator pair, determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair.
In one possible embodiment, the calculation amount of each operator is determined by the following steps:
for each operator, acquiring a calculation parameter required by the operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and determining the calculation amount of each operator based on the calculation parameters required by the operator at the running time.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In one possible embodiment, the delay time of both operators of the target operator pair is determined by:
solving for the value of the delay time variable of each target operator pair under the constraints that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not more than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized;
Determining delay times of two operators in the target operator pair based on the solved values of the delay time variables of the target operator pair.
In one possible implementation, the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
and for the case that the deep learning model runs in parallel, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the maximum among the values of the delay time variables of the plurality of target operator pairs.
In one possible implementation, the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
and for the case that the deep learning model runs in series, if the latter operators in a plurality of target operator pairs are the same operator, the accumulated value of the delay time variables of the plurality of target operator pairs is the sum of the values of the delay time variables of the plurality of target operator pairs.
In a possible implementation manner, obtaining a logical relationship between operators in the deep learning model includes:
Performing model analysis on the deep learning model by using a model analysis tool to obtain a model data flow graph;
and acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
According to a second aspect of the present application, there is provided an operating apparatus of a deep learning model, comprising:
the acquisition module is used for acquiring the logical relationship among operators in the deep learning model;
the adjusting module is used for adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
and the operation module is used for operating the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In one possible embodiment, the adjusting module comprises:
the setting unit is used for setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In a possible implementation, the adjusting module further includes:
and the first determining unit is used for determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair aiming at the target operator pair.
In a possible implementation, the adjusting module further includes:
the acquisition unit is used for acquiring a calculation parameter required by each operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and the second determination unit is used for determining the calculation amount of each operator based on the calculation parameters required by the operator in operation.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In a possible implementation, the adjusting module further includes:
the solving unit is used for solving for the value of the delay time variable of each target operator pair under the constraints that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not more than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized;
A third determining unit, configured to determine delay times of two operators in the target operator pair based on the value of the delay time variable of the target operator pair that is solved.
In a possible implementation, the adjusting module further includes:
and a fourth determining unit, configured to determine, for a case where the deep learning model operates in parallel, a maximum value among values of delay time variables of the plurality of target operator pairs as an accumulated value of the delay time variables of the plurality of target operator pairs if all latter operators of the plurality of target operator pairs are the same operator.
In a possible implementation, the adjusting module further includes:
and a fifth determining unit, configured to determine, for a case where the deep learning model operates in series, an accumulated value of values of delay time variables of the plurality of target operator pairs as an accumulated value of delay time variables of the plurality of target operator pairs if a latter operator of the plurality of target operator pairs is the same operator.
In one possible implementation, the obtaining module includes:
the model analysis unit is used for carrying out model analysis on the deep learning model by using a model analysis tool to obtain a model data flow graph;
And the relation acquisition unit is used for acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
According to a third aspect of the present application, there is provided an electronic device comprising a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the method in the first aspect or any one of its possible implementations.
According to a fourth aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect or any one of its possible implementations.
The embodiments of the application provide an operation method and apparatus for a deep learning model, an electronic device, and a medium. The method comprises the following steps: acquiring the logical relations between operators in the deep learning model; adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged; and operating the deep learning model according to the adjusted operation time of each operator in the deep learning model. The method first acquires the logical relations between the operators in the deep learning model, then adjusts the operation time of each operator so that the total operation time of at least two operators with continuous logical relationship is prolonged, thereby extending the total running time of the operators in the deep learning model, and finally runs the deep learning model according to the adjusted operation time of each operator. With the scheme provided by the application, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart illustrating a method for operating a deep learning model according to an embodiment of the present disclosure;
FIG. 2 illustrates a specific example of a model data flow graph of a deep learning model (VGG-19 model);
FIG. 3 shows a schematic diagram of a target operator pair;
FIG. 4 shows a schematic diagram of the allocation of delay times for two operators in a target operator pair;
FIG. 5 is a diagram illustrating the optimal allocation of delay times for two operators in a target operator pair in the case where deep learning models are run in parallel;
FIG. 6 is a diagram illustrating the optimal allocation of delay times for two operators in a target operator pair in the case of serial operation of a deep learning model;
FIG. 7 is a schematic structural diagram illustrating an operating apparatus of a deep learning model according to an embodiment of the present disclosure;
Fig. 8 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
In the first conventional scheme, the CPU load when running a deep learning model is reduced by reducing the model's complexity, commonly through model compression and model pruning. A less complex model generally occupies fewer CPU resources at runtime, thereby reducing the CPU load, but the low complexity of a deep learning model often comes at the expense of its accuracy. Real AI application scenarios place high requirements on the accuracy of a deep learning model, so this scheme often fails to meet application requirements.
In the second conventional scheme, the CPU load at runtime is reduced by optimizing the implementation of the underlying code of the deep learning model's operators. This scheme places high technical demands on developers and requires considerable time for optimizing the underlying code, yet the effect after optimization is difficult to guarantee. Moreover, intelligent devices are diverse and their CPU architectures differ; an optimization scheme for one CPU is not universal for other CPUs. In summary, this scheme is very complex and of low usability in practical application.
Based on the above technical problem, an embodiment of the present application provides an operation method of a deep learning model, which is described in detail below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an operating method of a deep learning model according to an embodiment of the present disclosure. As shown in fig. 1, the operation method mainly includes the following steps:
Step S101: acquiring a logical relation between operators in a deep learning model;
Step S102: adjusting the operation time of each operator in the deep learning model according to the logical relationship among the operators, so that the total operation time of at least two operators with continuous logical relationship in the adjusted deep learning model is prolonged;
Step S103: operating the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In step S101, the deep learning model refers to a neural network with "multiple hidden layers", where "multiple" means more than three. Deep learning models typically have eight, nine, or even more hidden layers. As the number of hidden layers grows, so do the parameters of the corresponding neuron connections, such as weights and thresholds, which means the deep learning model can automatically extract many complex features. With the advent of the cloud computing and big data era, massive training data combined with layer-by-layer pre-training and error back-propagation fine-tuning greatly improve training efficiency while reducing the risk of overfitting. Common deep learning models include the AlexNet, VGG Net, GoogleNet, ResNet, ResNeXt, R-CNN, SqueezeNet, and GAN models. The deep learning model in this embodiment is a delay-insensitive deep learning model, that is, one whose performance indicators are not concerned with the adverse effects of delay.
An operator refers to a mapping O: X → X from a function space to itself; in the broad sense, an operator can be generalized to any space. In this embodiment, an operator of the deep learning model refers to the operation of one layer of the model; that is, the operation of each layer in the deep learning model is encapsulated as an operator.
A logical relationship, or "dependency", refers to a relationship in the deep learning model in which a change in one of two operators affects the other. Logical relations are relative: operator A can be a predecessor of operator B, a successor of operator C, and parallel to operator D. Operator A and operator B are then in a predecessor relation, operator A and operator C in a successor relation, and operator A and operator D in a parallel relation.
In a possible implementation, a model analysis tool is used to perform model analysis on the deep learning model to obtain a model data flow graph, and the logical relations between the operators in the deep learning model are then acquired from the model data flow graph.
As for the model analysis tool, general deep learning inference engines and frameworks (such as PyTorch, TensorFlow, and vendors' in-house products) come with model analysis tools, and dedicated model analysis tools for analyzing deep learning models are also available on the market.
The model data flow graph is obtained from the parsed deep learning model file, and the logical relations are obtained from the model data flow graph. The model data flow graph clearly shows the operators of each layer of the deep learning model and the logical relations between them. Referring to fig. 2, fig. 2 shows a specific example of the model data flow graph of a deep learning model (the VGG-19 model). As shown in fig. 2, the model data flow graph of the VGG-19 model includes two-dimensional convolution operators (conv), pooling operators (pool), and fully connected operators (fc). It should be noted that the model data flow graph in fig. 2 is only used to teach a person skilled in the art how to implement the present invention; the invention is not limited thereto, and other types of model data flow graphs, such as graphs with branches, are also possible.
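As an illustrative sketch (the `DataFlowGraph` class and the operator names below are hypothetical, not the output of any particular analysis tool), a model data flow graph and the logical relations read from it could be represented in Python as follows; the names echo the conv/pool/fc operators of the VGG-19 example:

```python
# Hypothetical sketch of a model data flow graph. A real graph would come
# from a model analysis tool bundled with an inference framework.

class DataFlowGraph:
    def __init__(self):
        # operator name -> list of successor operator names
        self.edges = {}

    def add_edge(self, src, dst):
        # src's output feeds dst's input, i.e. their logical relation is continuous
        self.edges.setdefault(src, []).append(dst)
        self.edges.setdefault(dst, [])

    def successors(self, op):
        return list(self.edges.get(op, []))

# A tiny linear fragment of a VGG-like model.
g = DataFlowGraph()
g.add_edge("conv1", "conv2")
g.add_edge("conv2", "pool1")
g.add_edge("pool1", "fc1")

print(g.successors("conv2"))  # ['pool1']
```

An edge here encodes exactly the "logical relation" the description refers to: the source operator's output is the destination operator's input condition.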
In step S102, a continuous logical relation means that the output result of one operator is the input condition of another operator; the logical relations of those two operators are then continuous. As shown in fig. 2, the output result of the first two-dimensional convolution operator in the model data flow graph is the input condition of the second two-dimensional convolution operator, so the logical relations of the first two two-dimensional convolution operators are continuous.
The operation time of each operator is the length of time from the start of that operator's operation to the start of the next operator's operation. The total operation duration of two operators with continuous logical relations is the length of time from the start of the former operator's operation to the end of the latter operator's operation.
In the traditional scheme, when the deep learning model runs, the operators are executed in sequence without delay according to the logical relations of the model data flow graph. In this embodiment, the total operation time of at least two operators with continuous logical relations is prolonged by adjusting the operation time of each operator in the model data flow graph. For example, the total operation time of a two-dimensional convolution operator and a pooling operator is prolonged by adjusting the operation times of the second operator (a two-dimensional convolution operator) and the third operator (a pooling operator) in fig. 2.
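The contrast between back-to-back execution and delayed execution can be sketched as follows. The operator callables and delay values are hypothetical stand-ins, since the patent does not specify an implementation; the point is only that sleeping between operators spreads the work out in time without touching operator internals:

```python
import time

def conv(x):  # hypothetical stand-in for a two-dimensional convolution operator
    return x * 2

def pool(x):  # hypothetical stand-in for a pooling operator
    return x + 1

def run_with_delays(operators, delays, x):
    """Run a chain of operator callables in logical order, sleeping for the
    configured delay after each one; the sleeps lower average CPU utilization."""
    for op in operators:
        x = op(x)
        pause = delays.get(op.__name__, 0.0)
        if pause > 0:
            time.sleep(pause)  # yield the CPU between consecutive operators
    return x

# Traditional scheme: no delays. Adjusted scheme: a small delay after conv.
print(run_with_delays([conv, pool], {}, 3))              # 7
print(run_with_delays([conv, pool], {"conv": 0.01}, 3))  # 7 (same result, longer wall time)
```

Note that the computed result is unchanged; only the timing of execution differs, which is why this suits the delay-insensitive models the description mentions.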
In step S103, operation refers to performing inference with the operators in the deep learning model, typically using a deep learning inference engine or framework (such as PyTorch, TensorFlow, or a vendor's in-house product).
For the case that the load of the mobile terminal is to be reduced, step S103 is implemented on the mobile terminal while steps S101 and S102 are implemented in the cloud, reducing the computation on the mobile terminal. Step S103 improves the inference framework of the deep learning model on the mobile terminal: the operation of the model is changed from running the operators without interval to running each operator according to its adjusted operation time, thereby reducing the CPU load of the mobile terminal. A mobile terminal (mobile internet terminal) is a terminal device that accesses the internet through wireless network technology, such as a mobile phone, a tablet computer, a vehicle event data recorder, or a smart navigator.
For the case that the load of the cloud is to be reduced, steps S101 to S103 are all implemented in the cloud, thereby reducing the cloud CPU load when cloud CPU resources are extremely tight.
In conclusion, running the deep learning model according to the adjusted operation time of each operator reduces the utilization of CPU resources and thus the CPU load. With the scheme provided by this embodiment, the CPU load can be reduced without reducing the model's complexity or optimizing the underlying code of the model's operators. On the one hand, reducing model complexity to lower the CPU load reduces model accuracy and cannot meet the requirements of real AI application scenarios; on the other hand, optimizing the operators' underlying code to lower the CPU load is very complex and of low usability in practice. The scheme provided by this application reduces the complexity of practical application, can be applied to various types of CPUs, and is universal.
The above step S102 will be described in detail.
The two operators in a target operator pair are two operators whose logical relations are continuous. The operators in the deep learning model may form a plurality of target operator pairs according to their logical relations. Referring to fig. 3, fig. 3 is a schematic diagram of target operator pairs. As shown in fig. 3, the logical relation of the operators in the deep learning model is operator A - operator B - operator C - operator D, so operator A and operator B form a target operator pair, operator B and operator C form a target operator pair, operator C and operator D form a target operator pair, and operator D and the next operator to be run form a target operator pair. The combination of target operator pairs in fig. 3 is only used to teach a person skilled in the art how to implement the present invention; the invention is not limited thereto: one operator may point to multiple operators and form multiple target operator pairs with them respectively.
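For a linear operator chain like the A-B-C-D example in fig. 3, forming target operator pairs amounts to pairing each operator with its successor. A minimal sketch (the trailing pair with the unspecified "next operator" of the figure is deliberately omitted here):

```python
def target_operator_pairs(chain):
    # Pair each operator with its logically consecutive successor.
    return list(zip(chain, chain[1:]))

pairs = target_operator_pairs(["A", "B", "C", "D"])
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

For branched graphs, the same idea applies per edge of the data flow graph rather than per position in a list.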
The delay time is used to characterize the execution interval between the two operators in a target operator pair. For example, suppose operator A and operator B in an existing target operator pair originally execute back to back with no delay between them. After the delay time is set, once operator A in the pair finishes executing, execution pauses for the delay time before operator B in the pair continues.
The total operation time of a target operator pair is the time from the start of the former operator's run to the end of the latter operator's run. Suppose the target operator pair consists of operator A and operator B, the time from the start to the end of operator A's run is denoted T_A, the time from the start to the end of operator B's run is denoted T_B, and the execution interval between operator A and operator B, i.e. the time from the end of operator A's run to the start of operator B's run, is denoted T_C. Then the total operation time of the target operator pair is T_total = T_A + T_B + T_C.
In one possible embodiment, for at least one target operator pair, a predetermined delay time is set between the two operators in the pair, so that the total operation time of the pair is extended. This embodiment covers two cases. In the first case, a predetermined delay time is set between the two operators in only some of the target operator pairs; in the second case, a predetermined delay time is set between the two operators in all target operator pairs.
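As a concrete illustration of inserting a predetermined delay time between two logically consecutive operators, the sketch below simply sleeps between operator executions. It is a minimal hypothetical sketch, not the patent's implementation; the operator callables and the `run_with_delays` helper are invented names for illustration.

```python
import time

def run_with_delays(operators, pair_delays):
    """Run a chain of operators in order; pair_delays[i] is the
    predetermined delay time (in seconds) of the target operator pair
    (operators[i], operators[i+1])."""
    x = None
    for i, op in enumerate(operators):
        x = op(x)
        # Sleep before the next operator: this extends the pair's total
        # operation time without changing any operator's own work.
        if i < len(pair_delays) and pair_delays[i] > 0:
            time.sleep(pair_delays[i])
    return x

# Toy chain A -> B -> C with a 10 ms delay only between A and B.
result = run_with_delays(
    [lambda _: 1, lambda x: x + 1, lambda x: x * 2],
    [0.01, 0.0],
)
```

Because the delay is pure waiting, the operators' outputs are unchanged; only the wall-clock spacing of their CPU bursts is stretched.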
How to determine the delay time of the two operators in the target operator pair is described in detail below.
(1) And aiming at the target operator pair, determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair.
Regarding the calculation amount of the operator, it should be noted that, for each operator, the calculation parameters required by the operator during operation are obtained; the calculation parameters include any one or more of: the number of the called calculation units and the calling times of the calculation units. Then, the calculation amount of each operator is determined based on the calculation parameters required by the operator at the running time. For example, for a two-dimensional convolution operator, which includes a plurality of simple operations (e.g., multiplication, summation, etc.), each simple operation requires a corresponding computing unit to be called once at runtime, and thus the amount of computation of the two-dimensional convolution can be determined by the number of computing units called and/or the number of times the computing units are called.
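To make the computation amount concrete for the two-dimensional convolution mentioned above, a rough count of calls to the multiply and add units can be sketched as follows. This is a hedged illustration (one multiply-accumulate unit call per kernel element per output element), not a definition taken from the patent; the function name and parameters are assumptions.

```python
def conv2d_compute_amount(in_channels, out_channels, out_h, out_w, k_h, k_w):
    """Approximate computation amount of a 2D convolution operator,
    counted as the number of multiply-accumulate unit calls."""
    macs_per_output_element = in_channels * k_h * k_w
    return out_channels * out_h * out_w * macs_per_output_element

# e.g. a 3x3 convolution from 3 to 16 channels producing a 32x32 output
amount = conv2d_compute_amount(3, 16, 32, 32, 3, 3)
```

Such a count (number of computing units called and/or number of calls) is what the weight assignment in the following sections consumes.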
It should be noted that the total delay time of all target operator pairs in the deep learning model needs to satisfy a condition lower than the tolerable total delay time of the deep learning model. That is, the present embodiment is suitable for a deep learning model that is not sensitive to delay, and is not suitable for a deep learning model that has a high requirement on real-time performance. And subtracting the actual running total time of all the operators of the deep learning model from the tolerable running total time of all the operators of the deep learning model to obtain the tolerable total delay time of the deep learning model. For example, the tolerable total operating time of all operators of the face recognition model is about 10 seconds, the actual total operating time of all operators of the face recognition model is about 2 seconds, and then the tolerable total delay time of the face recognition model is about 8 seconds.
For the first case, a predetermined delay time is set between the two operators in a partial target operator pair.
The tolerable total delay time of the deep learning model is distributed among the partial target operator pairs according to the computation amount of the preceding operator in each of those pairs. In a specific implementation, the operators are divided into several levels according to their computation amounts, from small to large. If an operator's computation amount is small, its weight may be set to 0; if an operator's computation amount is medium, its weight may be set to 2; if an operator's computation amount is large, its weight may be set to 3. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. For an operator with a small computation amount, the delay time of its target operator pair is T_i = 0, which is equivalent to setting a non-zero delay time between the two operators in only some of the target operator pairs.
Referring to fig. 4, fig. 4 is a schematic diagram of the allocation of delay times to the two operators in each target operator pair. As shown in fig. 4, assume operator A has a small computation amount, ω_1 = 0; operator B has a medium computation amount, ω_2 = 2; operator C has a medium computation amount, ω_3 = 2; and operator D has a large computation amount, ω_4 = 3. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 2 + 2 + 3) = 0;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (0 + 2 + 2 + 3) ≈ 0.286;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (0 + 2 + 2 + 3) ≈ 0.286;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 3 / (0 + 2 + 2 + 3) ≈ 0.428.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.286 · 10 s = 2.86 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.286 · 10 s = 2.86 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0.428 · 10 s = 4.28 s.
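The weight-proportional allocation above can be sketched in a few lines. This is a hypothetical helper, not code from the patent; note that the 0.286 and 0.428 ratios in the worked example are rounded values of 2/7 and 3/7.

```python
def allocate_delays(weights, total_delay):
    """Split the tolerable total delay time T among target operator
    pairs in proportion to the weight of each pair's preceding operator:
    T_i = (w_i / sum(w)) * T."""
    total_weight = sum(weights)
    return [w / total_weight * total_delay for w in weights]

# Worked example from the text: weights (0, 2, 2, 3), T = 10 s.
delays = allocate_delays([0, 2, 2, 3], 10.0)
```

A weight of 0 naturally yields a 0 s delay, so only the pairs with non-zero weight receive a delay, matching the first case.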
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
For the second case, a predetermined delay time is set between the two operators in all pairs of target operators.
The tolerable total delay time of the deep learning model is distributed among all target operator pairs according to the computation amount of the preceding operator in each pair. In a specific implementation, the computation amount Q_i of each operator is determined, and the weight of the operator is determined from Q_i. For example, the span from the minimum to the maximum operator computation amount is divided in advance into a plurality of numerical ranges: in the first range the operator's weight is 1, in the second range the weight is 2, and so on, thereby constructing a weight table for the operators. In the weight table, each weight corresponds to a numerical range of operator computation amount, so once an operator's computation amount Q_i is known, the corresponding weight can be looked up in the table. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair.
As shown in fig. 4, assume that the computation amount of operator A falls in the first numerical range, that of operator B in the second, that of operator C in the fifth, and that of operator D in the second. Then ω_1 = 1, ω_2 = 2, ω_3 = 5, ω_4 = 2. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (1 + 2 + 5 + 2) = 0.1;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (1 + 2 + 5 + 2) = 0.2;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 5 / (1 + 2 + 5 + 2) = 0.5;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 2 / (1 + 2 + 5 + 2) = 0.2.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0.1 · 10 s = 1 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.2 · 10 s = 2 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0.2 · 10 s = 2 s.
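The weight-table lookup described above can be realized with a sorted list of range boundaries. This is a hedged sketch: the concrete range bounds are invented for illustration, and the table follows the text's convention that the k-th numerical range carries weight k.

```python
import bisect

def weight_from_table(compute_amount, range_upper_bounds):
    """Look up an operator's weight in a weight table where the k-th
    numerical range of computation amount carries weight k (1-based).
    range_upper_bounds must be ascending; each entry is the exclusive
    upper bound of the range below it."""
    return bisect.bisect_right(range_upper_bounds, compute_amount) + 1

# Hypothetical ranges: [0,10), [10,100), [100,1000), [1000,10000), [10000,inf)
bounds = [10, 100, 1000, 10000]
```

Given the looked-up weights, the delays follow from the same proportional split T_i = ρ_i · T as before.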
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
In summary, for each target operator pair, the delay time of the two operators in the pair is determined according to the computation amount of the preceding operator in the pair. In this scheme, operators with large computation amounts are assigned large weights, so the delay time of the two operators in the corresponding target operator pair is long; operators with small computation amounts are assigned small weights, even 0, so the delay time of the two operators in the corresponding target operator pair is short, even 0. In this way, during model operation, operators with large computation amounts are given enough delay time to reduce their CPU load per unit time, so the per-unit-time CPU load of every operator can be reduced.
(2) The delay times of both operators in any of the target operator pairs are the same.
For the first case, a predetermined delay time is set between the two operators in a partial target operator pair.
The tolerable total delay time of the deep learning model is distributed evenly among some of the target operator pairs. In a specific implementation, the partial set of target operator pairs that need a delay time is determined first; these are generally pairs whose preceding operators occupy more CPU resources, such as various convolution operators. Next, the weight of the preceding operator in each selected target operator pair is set to 1, and the weights of the preceding operators in the other target operator pairs are set to 0. The total weight Σω_i over all operators is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight ω_i to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. In this way, the delay times of the two operators in each selected target operator pair are the same, and the delay times of the two operators in the other target operator pairs are 0.
As shown in fig. 4, assume operators B and C are convolution operators, so ω_1 = 0, ω_2 = 1, ω_3 = 1, ω_4 = 0. Let T = 10 s.

Operator A: ρ_1 = ω_1 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 1 + 1 + 0) = 0;
operator B: ρ_2 = ω_2 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (0 + 1 + 1 + 0) = 0.5;
operator C: ρ_3 = ω_3 / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (0 + 1 + 1 + 0) = 0.5;
operator D: ρ_4 = ω_4 / (ω_1 + ω_2 + ω_3 + ω_4) = 0 / (0 + 1 + 1 + 0) = 0.

Delay time of the target operator pair to which operator A belongs: T_1 = ρ_1 · T = 0 · 10 s = 0 s;
delay time of the target operator pair to which operator B belongs: T_2 = ρ_2 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator C belongs: T_3 = ρ_3 · T = 0.5 · 10 s = 5 s;
delay time of the target operator pair to which operator D belongs: T_4 = ρ_4 · T = 0 · 10 s = 0 s.
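Selecting the CPU-heavy pairs (e.g. convolution operators) and splitting the budget evenly among them can be sketched as follows. The helper name and operator labels are hypothetical.

```python
def allocate_to_selected(preceding_ops, selected, total_delay):
    """Assign weight 1 to pairs whose preceding operator is in
    `selected` (e.g. convolution operators) and weight 0 to all others,
    then split the tolerable total delay time evenly among the
    weight-1 pairs."""
    weights = [1 if name in selected else 0 for name in preceding_ops]
    chosen = sum(weights)
    return [w * total_delay / chosen for w in weights]

# Worked example from the text: B and C are convolution operators, T = 10 s.
delays = allocate_to_selected(["A", "B", "C", "D"], {"B", "C"}, 10.0)
```

The result matches the worked numbers: the two selected pairs each get 5 s, the others 0 s.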
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
For the second case, a predetermined delay time is set between the two operators in all pairs of target operators.
The tolerable total delay time of the deep learning model is distributed evenly among all target operator pairs. In a specific implementation, the weight ω_i of every operator is identical, e.g. ω_i = 1 for every operator. The total weight Σω_i is calculated; for each operator, the ratio ρ_i = ω_i / Σω_i of its weight to the total weight is calculated; and the ratio ρ_i is multiplied by the tolerable total delay time T of the deep learning model to obtain the delay time T_i = ρ_i · T of the target operator pair to which the operator belongs, the operator being the preceding operator of that pair. In this way, the delay times of the two operators in all target operator pairs are the same.
As shown in fig. 4, assume that operator A, operator B, operator C, and operator D have the same weight, i.e. ω_1 = ω_2 = ω_3 = ω_4 = 1. Let T = 10 s.

For each of operators A, B, C, and D: ρ_i = ω_i / (ω_1 + ω_2 + ω_3 + ω_4) = 1 / (1 + 1 + 1 + 1) = 0.25.

Delay time of the target operator pair to which each of operators A, B, C, and D belongs: T_1 = T_2 = T_3 = T_4 = ρ_i · T = 0.25 · 10 s = 2.5 s.
The above assigned values of operator weights are only used to teach one skilled in the art how to implement the present invention; the present invention is not limited thereto, and other values may be assigned to the operator weights in specific implementations. Likewise, the tolerable total delay time of 10 s for the deep learning model is only illustrative; in specific implementations, different tolerable total delay times are set for different deep learning models.
In summary, the tolerable total delay time of the deep learning model is distributed evenly among some or all of the target operator pairs by assigning the same weight to the preceding operator in each of those pairs, so that the delay time of the two operators is the same in every such target operator pair. The implementation process is simple and convenient.
(3) And determining the delay time of two operators in the target operator pair by an optimal solution method.
The delay time variable is the unknown delay time of a target operator pair. The accumulated value of the delay time variables of all target operator pairs in the deep learning model refers to the total delay time variable over all target operator pairs in the model. The preset total delay time is the tolerable total delay time of the deep learning model. The average value of the delay time variables of the target operator pairs refers to the average, taken over all target operator pairs, of the solved values of their delay time variables.
Parallel operation of the deep learning model means that two parallel operators run simultaneously, which saves running time. Serial operation of the deep learning model means that the operators are executed in sequence: even if two operators could run in parallel, the next operator is executed only after the previous one finishes.
When an operator in the model dataflow graph has branches, an optimal allocation method can be used to determine the delay times. Specifically, the values of the delay time variables of the target operator pairs are solved under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed the preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs. The delay times of the two operators in each target operator pair are then determined from the solved value of that pair's delay time variable.
1) For the case where the deep learning model runs in parallel: if the latter operator of multiple target operator pairs is the same operator, the accumulated value of the delay time variables at that operator is the maximum among the accumulated values of those target operator pairs.
Referring to fig. 5, fig. 5 is a schematic diagram of the optimal allocation of delay times to the two operators in each target operator pair when the deep learning model runs in parallel. As shown in fig. 5, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variables of the two operators in the first, second, and third target operator pairs are T_1, T_2, and T_3, respectively, and the cumulative-delay intermediate variables of the three pairs are N_1, N_2, and N_3, respectively. The preset total delay time is T_max.

For the first target operator pair, with delay time T_1: N_1 = T_1;
for the second target operator pair, with delay time T_2: N_2 = T_2;
for the third target operator pair, with delay time T_3: N_3 = max(N_1, N_2) + T_3.
For the above variables, the following set of equations may be listed:
N_1 = T_1
N_2 = T_2
N_3 = max(N_1, N_2) + T_3
N_3 ≤ T_max
maximize (T_1 + T_2 + T_3) / 3
the above equation set is solved by a commonly used solver (e.g., MATLAB solver, parsio solver, hypermesh solver, cplex solver, etc.), so as to solve the value of the delay time variable of each target operator pair, i.e., the delay time of two operators in each target operator pair.
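In place of a commercial solver, the parallel-branch system above can be explored with a small brute-force grid search. This is a hedged stand-in rather than the patent's actual solver step, practical only for tiny graphs, but it makes the optimum of the fig. 5 system concrete.

```python
from itertools import product

def solve_parallel(t_max, step=1.0):
    """Maximize the average of (T1, T2, T3) subject to the parallel
    cumulative-delay constraint N3 = max(T1, T2) + T3 <= t_max
    (operators A and B run in parallel, both feeding operator C)."""
    grid = [i * step for i in range(int(t_max / step) + 1)]
    best, best_avg = None, -1.0
    for t1, t2, t3 in product(grid, repeat=3):
        if max(t1, t2) + t3 <= t_max:      # feasibility: N3 <= T_max
            avg = (t1 + t2 + t3) / 3
            if avg > best_avg:
                best_avg, best = avg, (t1, t2, t3)
    return best
```

With T_max = 10 s the budget is best spent inside the parallel branches: since the two branch delays overlap in time, the optimum sets T_1 = T_2 = 10 and T_3 = 0.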
2) For the case where the deep learning model runs in series: if the latter operator of multiple target operator pairs is the same operator, the accumulated value of the delay time variables at that operator is the sum of the accumulated values of those target operator pairs.
Referring to fig. 6, fig. 6 is a schematic diagram of the optimal allocation of delay times to the two operators in each target operator pair when the deep learning model runs in series. As shown in fig. 6, operator A and operator C form the first target operator pair, operator B and operator C form the second target operator pair, and operator C and the next operator form the third target operator pair. The delay time variables of the two operators in the first, second, and third target operator pairs are T_1, T_2, and T_3, respectively, and the cumulative-delay intermediate variables of the three pairs are N_1, N_2, and N_3, respectively. The preset total delay time is T_max.

For the first target operator pair, with delay time T_1: N_1 = T_1;
for the second target operator pair, with delay time T_2: N_2 = T_2;
for the third target operator pair, with delay time T_3: N_3 = N_1 + N_2 + T_3.
For the above variables, the following set of equations may be listed:
N_1 = T_1
N_2 = T_2
N_3 = N_1 + N_2 + T_3
N_3 ≤ T_max
maximize (T_1 + T_2 + T_3) / 3
the above equation set is solved by a commonly used solver (e.g., MATLAB solver, parsio solver, hypermesh solver, cplex solver, etc.), so as to solve the value of the delay time variable of each target operator pair, i.e., the delay time of two operators in each target operator pair.
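For the serial case the cumulative delay is a plain sum, so the optimization degenerates: any allocation that consumes the whole budget maximizes the average. The sketch below is a hedged illustration of the fig. 6 system, not solver code from the patent.

```python
def cumulative_delay_serial(t1, t2, t3):
    """Cumulative delay for the serial graph of fig. 6: N1 = T1,
    N2 = T2, and N3 = N1 + N2 + T3, i.e. the delays simply add up."""
    n1, n2 = t1, t2
    return n1 + n2 + t3

def uses_full_budget(t1, t2, t3, t_max):
    """With a plain-sum constraint, the average (T1 + T2 + T3) / 3 is
    maximized exactly when the whole tolerable budget is consumed,
    i.e. when N3 == t_max."""
    return cumulative_delay_serial(t1, t2, t3) == t_max
```

This is why the serial case has many optimal solutions (any split of T_max among the three pairs), whereas the parallel case concentrates the delay in the overlapping branches.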
In summary, determining the delay time of the two operators in each target operator pair by the optimal-solution method is more complex to implement, since a solver is required to solve the equation set, but the solved delay times form an optimal solution, so running the deep learning model with them yields the best load-reduction effect.
In summary, the embodiments of the present application provide a method for operating a deep learning model, the method comprising: acquiring the logical relationships between operators in the deep learning model; adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators with a continuous logical relationship in the adjusted deep learning model is prolonged; and running the deep learning model according to the adjusted running time of each operator. That is, the logical relationships between the operators are obtained first; the running time of each operator is then adjusted so that the total running time of at least two logically consecutive operators, and thus of the operators in the deep learning model as a whole, is prolonged; and finally the deep learning model is run according to the adjusted running times. With this scheme, the CPU load can be reduced without reducing the model complexity or optimizing the underlying code of the model operators. On the one hand, reducing model complexity to lower the CPU load degrades model accuracy and cannot meet the requirements of real AI application scenarios. On the other hand, optimizing the underlying code of model operators to lower the CPU load is very complicated in practice and has poor usability; the scheme provided by the present application reduces the complexity of practical application, can be applied to various types of CPUs, and has universality.
Based on the same technical concept, embodiments of the present application further provide an operating apparatus of a deep learning model, an electronic device, a computer storage medium, and the like, which can be specifically referred to in the following embodiments.
Fig. 7 is a schematic structural diagram of an operating apparatus of a deep learning model according to an embodiment of the present disclosure. As shown in fig. 7, may include:
an obtaining module 701, configured to obtain a logical relationship between operators in the deep learning model;
an adjusting module 702, configured to adjust an operation time of each operator in the deep learning model according to a logical relationship between the operators, so that a total operation time of at least two operators having consecutive logical relationships in the adjusted deep learning model is prolonged;
an operation module 703, configured to operate the deep learning model according to the adjusted operation time of each operator in the deep learning model.
In one possible implementation, the adjustment module 702 includes: the setting unit is used for setting a preset delay time between two operators in at least one target operator pair so as to prolong the total operation time of the target operator pair; the two operators in the target operator pair are two operators with continuous logic relations, and the delay time is used for representing the execution interval time of the two operators in the target operator pair.
In one possible implementation, the adjusting module 702 further includes: and the first determining unit is used for determining the delay time of two operators in the target operator pair according to the calculated amount of the previous operator in the target operator pair aiming at the target operator pair.
In one possible implementation, the adjusting module 702 further includes:
the acquisition unit is used for acquiring a calculation parameter required by each operator in operation; the calculation parameters include any one or more of: the number of the called computing units and the calling times of the computing units;
and the second determination unit is used for determining the calculation amount of each operator based on the calculation parameters required by the operator in operation.
In one possible implementation, the delay times of both operators in any of the target operator pairs are the same.
In one possible implementation, the adjusting module 702 further includes:
the solving unit is used for solving the values of the delay time variables of the target operator pairs under the constraint that the accumulated value of the delay time variables of all target operator pairs in the deep learning model does not exceed the preset total delay time, with the objective of maximizing the average value of the delay time variables of the target operator pairs;
A third determining unit, configured to determine delay times of two operators in the target operator pair based on the value of the delay time variable of the target operator pair that is solved.
In one possible implementation, the adjusting module 702 further includes:
and a fourth determining unit, configured to determine, for a case where the deep learning model operates in parallel, a maximum value among values of delay time variables of the plurality of target operator pairs as an accumulated value of the delay time variables of the plurality of target operator pairs if all latter operators of the plurality of target operator pairs are the same operator.
In one possible implementation, the adjusting module 702 further includes:
and a fifth determining unit, configured to determine, for a case where the deep learning model operates in series, an accumulated value of values of delay time variables of the plurality of target operator pairs as an accumulated value of delay time variables of the plurality of target operator pairs if a latter operator of the plurality of target operator pairs is the same operator.
In one possible implementation, the obtaining module 701 includes:
the model analysis unit is used for carrying out model analysis on the deep learning model by a model analysis tool to obtain a model data flow diagram;
And the relation acquisition unit is used for acquiring the logical relation among operators in the deep learning model from the model data flow diagram.
An embodiment of the present application discloses an electronic device, as shown in fig. 8, including: a processor 801, a memory 802, and a bus 803, the memory 802 storing machine readable instructions executable by the processor 801, the processor 801 communicating with the memory 802 via the bus 803 when the electronic device is in operation. The machine readable instructions are executed by the processor 801 to perform the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
The computer program product of the deep learning model operation method provided in this embodiment of the present application includes a computer-readable storage medium storing a nonvolatile program code executable by the processor 801, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for operating a deep learning model, comprising:
acquiring logical relationships between operators in the deep learning model;
adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged; and
running the deep learning model according to the adjusted running time of each operator in the deep learning model.
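Purely as an illustration of claim 1 (not part of the claim), the method can be sketched in a few lines of Python; the list-of-callables operator representation and the sleep-based delay mechanism are assumptions made for illustration only:

```python
import time

def run_model(operators, delays):
    """Run operators in their logical order, waiting delays[i] seconds
    between operator i and operator i+1, so that the total running time
    of each consecutive operator pair is prolonged (hypothetical mechanism)."""
    x = 1.0
    for i, op in enumerate(operators):
        if i > 0:
            time.sleep(delays.get(i - 1, 0.0))  # adjusted execution interval
        x = op(x)
    return x

# three operators with a continuous logical relationship
ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
result = run_model(ops, {0: 0.001, 1: 0.001})  # ((1 + 1) * 2) - 3 = 1.0
```

Here the delays only stretch wall-clock time; the numerical result of the model is unchanged.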
2. The method of claim 1, wherein adjusting the running time of each operator in the deep learning model according to the logical relationships between the operators so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged comprises:
setting a preset delay time between the two operators in at least one target operator pair, so as to prolong the total running time of the target operator pair, wherein the two operators in a target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
3. The method of claim 2, wherein the delay time of the two operators in the target operator pair is determined by:
for the target operator pair, determining the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
4. The method of claim 3, wherein the calculation amount of each operator is determined by:
for each operator, acquiring calculation parameters required by the operator at run time, the calculation parameters including any one or more of: the number of computing units called and the number of times the computing units are called; and
determining the calculation amount of each operator based on the calculation parameters required by the operator at run time.
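As a hypothetical illustration of claims 3-4 (not part of the claims): taking the product of the two calculation parameters as the calculation amount, and scaling the pair delay linearly with that amount, are both assumptions made for the sketch, since the claims do not prescribe a formula:

```python
def calculation_amount(num_units_called, num_calls):
    # claim 4: the calculation amount follows from the number of computing
    # units called and the number of calls; the product is an assumption.
    return num_units_called * num_calls

def pair_delay(former_op_params, base_delay=0.001):
    # claim 3: the delay of a target operator pair is derived from the
    # former operator's calculation amount; linear scaling is assumed.
    return base_delay * calculation_amount(*former_op_params)

# a former operator that calls 4 computing units, 10 times each
amount = calculation_amount(4, 10)  # 40
delay = pair_delay((4, 10))         # about 0.04 seconds
```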
5. The method of claim 2, wherein the delay times of the two operators in any of the target operator pairs are the same.
6. The method of claim 2, wherein the delay time of the two operators in the target operator pair is determined by:
solving a value of a delay time variable of each target operator pair under the constraint conditions that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not greater than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized; and
determining the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
7. The method of claim 6, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs in parallel, if the latter operators of a plurality of target operator pairs are the same operator, taking the maximum value among the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
8. The method of claim 6, wherein the accumulated value of the delay time variables of all target operator pairs in the deep learning model is determined by:
for the case where the deep learning model runs serially, if the latter operators of a plurality of target operator pairs are the same operator, taking the sum of the values of the delay time variables of the plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs.
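The allocation and accumulation rules of claims 5-8 can be illustrated with a minimal sketch (not part of the claims; the equal-split allocation is just one admissible solution of the constrained problem, chosen here because it matches claim 5):

```python
def allocate_delays(num_pairs, total_delay):
    # claims 5-6: equal delays whose sum exactly meets the preset
    # total-delay budget maximize the average delay under the
    # "accumulated value not greater than total_delay" constraint.
    return [total_delay / num_pairs] * num_pairs

def accumulated_delay(pair_delays, parallel):
    # claims 7-8: when several target operator pairs share the same latter
    # operator, their delay variables accumulate as a maximum if the model
    # runs in parallel, and as a sum if it runs serially.
    return max(pair_delays) if parallel else sum(pair_delays)

delays = allocate_delays(4, 8.0)  # [2.0, 2.0, 2.0, 2.0]
```

With these four equal delays, the parallel accumulation is 2.0 while the serial accumulation is the full 8.0 budget.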
9. The method of claim 1, wherein acquiring the logical relationships between operators in the deep learning model comprises:
performing model analysis on the deep learning model with a model analysis tool to obtain a model dataflow graph; and
acquiring the logical relationships between operators in the deep learning model from the model dataflow graph.
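A minimal sketch of the graph step in claim 9 (not part of the claim): a real model analysis tool would emit the dataflow graph; here it is stood in for by a plain edge list with hypothetical operator names:

```python
from collections import defaultdict

def logical_relationships(dataflow_edges):
    # claim 9: each edge (former, latter) of the model dataflow graph is a
    # pair of operators with a continuous logical relationship.
    return list(dataflow_edges)

def pairs_by_latter_operator(dataflow_edges):
    # group target operator pairs that share the same latter operator,
    # as claims 7-8 need when choosing between max and sum accumulation.
    groups = defaultdict(list)
    for former, latter in dataflow_edges:
        groups[latter].append((former, latter))
    return dict(groups)

edges = [("conv1", "add"), ("conv2", "add"), ("add", "fc")]
groups = pairs_by_latter_operator(edges)
# "add" is the latter operator of two target operator pairs
```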
10. An apparatus for operating a deep learning model, comprising:
an acquisition module configured to acquire logical relationships between operators in the deep learning model;
an adjustment module configured to adjust the running time of each operator in the deep learning model according to the logical relationships between the operators, so that the total running time of at least two operators having a continuous logical relationship in the adjusted deep learning model is prolonged; and
a running module configured to run the deep learning model according to the adjusted running time of each operator in the deep learning model.
11. The apparatus of claim 10, wherein the adjustment module comprises:
a setting unit configured to set a preset delay time between the two operators in at least one target operator pair, so as to prolong the total running time of the target operator pair, wherein the two operators in a target operator pair are two operators having a continuous logical relationship, and the delay time represents the execution interval between the two operators in the target operator pair.
12. The apparatus of claim 11, wherein the adjustment module further comprises:
a first determining unit configured to determine, for the target operator pair, the delay time of the two operators in the target operator pair according to the calculation amount of the former operator in the target operator pair.
13. The apparatus of claim 12, wherein the adjustment module further comprises:
an acquisition unit configured to acquire, for each operator, calculation parameters required by the operator at run time, the calculation parameters including any one or more of: the number of computing units called and the number of times the computing units are called; and
a second determining unit configured to determine the calculation amount of each operator based on the calculation parameters required by the operator at run time.
14. The apparatus of claim 11, wherein the delay times of the two operators in any of the target operator pairs are the same.
15. The apparatus of claim 11, wherein the adjustment module further comprises:
a solving unit configured to solve a value of a delay time variable of each target operator pair under the constraint conditions that the accumulated value of the delay time variables of all target operator pairs in the deep learning model is not greater than a preset total delay time and that the average value of the delay time variables of the target operator pairs is maximized; and
a third determining unit configured to determine the delay time of the two operators in the target operator pair based on the solved value of the delay time variable of the target operator pair.
16. The apparatus of claim 15, wherein the adjustment module further comprises:
a fourth determining unit configured to, for the case where the deep learning model runs in parallel, take the maximum value among the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators of the plurality of target operator pairs are the same operator.
17. The apparatus of claim 15, wherein the adjustment module further comprises:
a fifth determining unit configured to, for the case where the deep learning model runs serially, take the sum of the values of the delay time variables of a plurality of target operator pairs as the accumulated value of the delay time variables of the plurality of target operator pairs if the latter operators of the plurality of target operator pairs are the same operator.
18. The apparatus of claim 10, wherein the acquisition module comprises:
a model analysis unit configured to perform model analysis on the deep learning model with a model analysis tool to obtain a model dataflow graph; and
a relationship acquisition unit configured to acquire the logical relationships between operators in the deep learning model from the model dataflow graph.
19. An electronic device, comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate via the bus, and the processor executes the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 9.
20. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any one of claims 1 to 9.
CN202010265726.3A 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium Active CN111860758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010265726.3A CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010265726.3A CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111860758A true CN111860758A (en) 2020-10-30
CN111860758B CN111860758B (en) 2024-05-03

Family

ID=72986013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010265726.3A Active CN111860758B (en) 2020-04-07 2020-04-07 Deep learning model operation method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111860758B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862109A (en) * 2021-02-09 2021-05-28 上海商汤智能科技有限公司 Deep learning model execution method and device, electronic equipment and storage medium
CN112862109B (en) * 2021-02-09 2024-05-24 上海商汤智能科技有限公司 Deep learning model execution method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515739A (en) * 2019-10-23 2019-11-29 上海燧原智能科技有限公司 Deep learning neural network model load calculating method, device, equipment and medium
CN110750342A (en) * 2019-05-23 2020-02-04 北京嘀嘀无限科技发展有限公司 Scheduling method, scheduling device, electronic equipment and readable storage medium
KR20200023660A (en) * 2018-08-13 2020-03-06 인천대학교 산학협력단 Electronic device for controlling performance of at least one processor when providing inference service through deep learning model and operating method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA Yanjun; YU Dianhai; WU Tian; WANG Haifeng: "PaddlePaddle: An Open-Source Deep Learning Platform Originating from Industrial Practice", Frontiers of Data and Computing, no. 05, 15 October 2019 (2019-10-15) *


Also Published As

Publication number Publication date
CN111860758B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
TWI825596B (en) Circuit, method and non-transitory machine-readable storage devices for performing neural network computations
US11216721B2 (en) Method for calculating a neuron layer of a multi-layer perceptron model with simplified activation function
CN107146015A (en) Multivariate Time Series Forecasting Methodology and system
US20150019464A1 (en) method and apparatus for supplying interpolation point data for a data-based function model calculation unit
US11580194B2 (en) Information processing apparatus, information processing method, and program
CN111459993B (en) Configuration updating method, device, equipment and storage medium based on behavior analysis
CN110491124B (en) Vehicle flow prediction method, device, equipment and storage medium
CN110347724A (en) Abnormal behaviour recognition methods, device, electronic equipment and medium
US20240005164A1 (en) Neural Network Training Method and Related Device
US11636712B2 (en) Dynamic gesture recognition method, device and computer-readable storage medium
US20230092453A1 (en) Parameter updating method and apparatus and storage medium
CN111340240A (en) Method and device for realizing automatic machine learning
CN114612791B (en) Target detection method and device based on improved attention mechanism
CN108667877B (en) Method and device for determining recommendation information, computer equipment and storage medium
CN116126346A (en) Code compiling method and device of AI model, computer equipment and storage medium
CN104749953B (en) Method and device for providing a sparse gaussian process model for calculation in a motor control system
CN111160049A (en) Text translation method, device, machine translation system and storage medium
CN114004335A (en) Data processing method and device, electronic equipment and storage medium
CN114492742A (en) Neural network structure searching method, model issuing method, electronic device, and storage medium
CN114154622A (en) Algorithm model for traffic operation system flow data acquisition missing completion
CN115699044A (en) Software project risk assessment method and device, computer equipment and storage medium
CN111860758A (en) Operation method and device of deep learning model, electronic equipment and medium
CN113554164A (en) Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
CN115687764A (en) Training method of vehicle track evaluation model, and vehicle track evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant