CN116187404A - Residual network memory optimization method, device, equipment and medium

Residual network memory optimization method, device, equipment and medium

Info

Publication number: CN116187404A
Application number: CN202310077606.4A
Authority: CN (China)
Prior art keywords: deep learning, type, add, node, memory
Legal status: Pending (assumed; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 陈其宾 (Chen Qibin), 段强 (Duan Qiang), 姜凯 (Jiang Kai), 李锐 (Li Rui), 胡雷钧 (Hu Leijun)
Assignee: Shandong Inspur Science Research Institute Co Ltd
Priority/Filing date: 2023-01-30
Publication date: 2023-05-30
Application filed by Shandong Inspur Science Research Institute Co Ltd

Classifications

    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means (under G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks)
    • G06N3/08 Learning methods (under G06N3/02 Neural networks)
    • G06N5/04 Inference or reasoning models (under G06N5/00 Computing arrangements using knowledge-based models)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (under Y02D Climate change mitigation technologies in ICT)


Abstract

The application discloses a residual network memory optimization method, apparatus, device, and medium, relating to the technical field of residual networks and embedded devices, comprising the following steps: obtaining a trained deep learning model, setting the input quantization type of each Add node in the residual network to the INT32 type, and building a deep learning inference framework for the embedded device; running inference on the deep learning model with the inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck; setting the input quantization type of the target Add nodes to the INT7 type, updating the corresponding quantization factors in the inference framework, and adding to the framework a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework. This solves the memory bottleneck problem of residual networks on embedded devices.

Description

Residual network memory optimization method, device, equipment and medium
Technical Field
The present invention relates to the technical field of residual networks and embedded devices, and in particular to a residual network memory optimization method, apparatus, device, and medium.
Background
In recent years, deep learning neural network models have been widely used in many fields and have achieved very good results. Model inference is closely tied to the hardware and environment on which the model runs, which has given rise to inference frameworks adapted to different hardware. Embedded devices, moreover, have low power consumption and limited computing capacity and memory resources, which places high demands on deep learning model deployment. In the inference stage, the most prominent problem is the large memory footprint, especially that of the activation values, which conflicts with the limited memory resources of embedded devices. The residual block is an important module in convolutional neural networks and appears in many important architectures, including commonly used networks such as MobileNetV2 and ResNet. On embedded devices, where memory resources are very limited, the Add operators of the residual network tend to be the memory bottleneck, because the addition needs two input tensors, and such tensors account for the largest share of memory during model inference. To reduce the memory they occupy, quantization is generally used: in each convolution node, the convolution is computed with quantized weights and inputs and the result is dequantized to the INT8 type as output, so that, compared with the 32-bit floating-point type, the 8-bit integer greatly reduces memory consumption. However, for the Add nodes in the residual network, adding the INT8 outputs of the two preceding convolution nodes can overflow. Therefore, the inputs of an Add node have to use 32-bit data, with the preceding convolution nodes dequantizing to the INT32 data type as output; this avoids data overflow but creates the memory bottleneck.
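To make the overflow concrete, the following minimal NumPy sketch (ours, not part of the patent) shows why two INT8 activations cannot be added directly, while widening to INT32 gives enough headroom:

import numpy as np

a = np.array([100, 120, -100], dtype=np.int8)
b = np.array([60, 50, -90], dtype=np.int8)

# Naive INT8 addition wraps around: 100 + 60 = 160 > 127, so the sum
# overflows under two's-complement arithmetic and becomes negative.
overflowed = a + b                               # [-96 -86  66], silently wrong

# Widening the operands to INT32 first gives the exact result,
# at the cost of four bytes per element instead of one.
safe = a.astype(np.int32) + b.astype(np.int32)   # [ 160  110 -190]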
In summary, how to avoid the memory bottleneck of the residual network when running inference on embedded devices is a problem to be solved at present.
Disclosure of Invention
In view of this, the present invention aims to provide a residual network memory optimization method, apparatus, device, and medium that can avoid the memory bottleneck of the residual network when running inference on embedded devices. The specific scheme is as follows:
in a first aspect, the present application discloses a residual network memory optimization method, applied to an embedded device, comprising:
obtaining a trained deep learning model, setting the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and building a deep learning inference framework for the embedded device;
running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck;
setting the input quantization type of the target Add node to the INT7 type, updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and adding, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
Optionally, updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type includes:
updating, based on the INT7 type, the quantization factor corresponding to the input of the target Add node in the deep learning inference framework, and updating the dequantization factor corresponding to the output of the upstream node of the target Add node.
Optionally, after the optimized deep learning inference framework is obtained, the method further includes:
running inference on the deep learning model with the optimized deep learning inference framework.
Optionally, while running inference on the deep learning model with the optimized deep learning inference framework, the method further includes:
acquiring the target data output by the upstream node according to the input quantization type set on the current Add node in the residual network, so as to use the target data as the input data of the current Add node;
determining a target Add operator based on the input quantization type of the current Add node, and processing the input data with the target Add operator to obtain an output result.
Optionally, determining a target Add operator based on the input quantization type of the current Add node and processing the input data with the target Add operator to obtain an output result includes:
if the input quantization type of the current Add node is the INT32 type, the corresponding target Add operator is the first Add operator that adds INT32-type data, and the first Add operator is used to process the input data to obtain an INT32-type output result;
if the input quantization type of the current Add node is the INT7 type, the corresponding target Add operator is the second Add operator that adds INT8-type data, and the second Add operator is used to process the input data to obtain an INT8-type output result.
Optionally, after processing the input data with the target Add operator to obtain an output result, the method further includes:
if the output result is of the INT32 type, quantizing the output result to the INT8 type, and providing the quantized output result as input to the downstream node;
if the output result is of the INT8 type, providing the output result directly as input to the downstream node.
Optionally, running inference on the deep learning model on the server with the deep learning inference framework to determine the target Add nodes that cause a memory bottleneck from among the Add nodes includes:
running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the Add nodes with higher memory usage and the corresponding memory usage information;
comparing the memory usage information with the memory information of the embedded device, so as to determine, from among the Add nodes with higher memory usage, the target Add nodes that cause a memory bottleneck.
In a second aspect, the present application discloses a residual network memory optimization apparatus, applied to an embedded device, comprising:
a framework building module, configured to obtain a trained deep learning model, set the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and build a deep learning inference framework for the embedded device;
a model inference module, configured to run inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck;
a framework optimization module, configured to set the input quantization type of the target Add node to the INT7 type, update the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and add, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
a processor, configured to execute the computer program to implement the steps of the residual network memory optimization method disclosed above.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the residual network memory optimization method disclosed above.
As can be seen, the application discloses a residual network memory optimization method applied to an embedded device: a trained deep learning model is obtained, the input quantization type of each Add node of the residual network in the deep learning model is set to the INT32 type, and a deep learning inference framework is built for the embedded device; inference is run on the deep learning model on a server with the inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck; the input quantization type of the target Add nodes is set to the INT7 type, the corresponding quantization factors in the inference framework are updated based on the INT7 type, and a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data are added to the inference framework, yielding the optimized deep learning inference framework. Setting the input quantization type of every Add node to INT32 avoids data overflow; applying INT7 quantization not to all Add nodes but only to the target Add nodes that occupy more memory and cause the bottleneck removes the memory bottleneck while avoiding the accuracy loss that blanket INT7 quantization would cause. The optimized framework is then used to run inference on the deep learning model. Through this scheme, the memory bottleneck problem of the residual network during inference on the embedded device can be effectively solved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present invention, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a residual network memory optimization method disclosed in the present application;
FIG. 2 is a flowchart of a specific residual network memory optimization method disclosed in the present application;
FIG. 3 is a schematic diagram of a specific residual network memory optimization flow disclosed in the present application;
FIG. 4 is a schematic structural diagram of a residual network memory optimization apparatus disclosed in the present application;
FIG. 5 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Currently, in order to reduce the memory occupied by the tensors input to the Add nodes in the residual network, quantization is generally used: in each convolution node, the convolution is computed with quantized weights and inputs and the result is dequantized to the INT8 type as output, so that, compared with the 32-bit floating-point type, the 8-bit integer greatly reduces memory consumption. However, for the Add nodes in the residual network, adding the INT8 outputs of the two preceding convolution nodes can overflow. Therefore, the inputs of an Add node have to use 32-bit data, with the preceding convolution nodes dequantizing to the INT32 data type as output; this avoids data overflow but creates the memory bottleneck. For this reason, the embodiments of the present application disclose a residual network memory optimization method, apparatus, device, and medium that can avoid the memory bottleneck of the residual network when running inference on embedded devices.
Referring to fig. 1, an embodiment of the present application discloses a residual network memory optimization method, applied to an embedded device, the method comprising:
Step S11: obtaining a trained deep learning model, setting the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and building a deep learning inference framework for the embedded device.
In this embodiment, a trained deep learning model is first obtained, and the input quantization type of each Add node of the residual network in the deep learning model is set to the INT32 type, so as to avoid the overflow caused by adding two INT8-type values; the Add nodes of the residual network therefore operate on data dequantized to the INT32 type. A deep learning inference framework is then built for the embedded device. Alternatively, instead of obtaining an already trained deep learning model, the deep learning model may be trained at this point.
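As an illustration of the per-layer scheme this step builds on, the following sketch assumes symmetric (scale-only) quantization; the function names and the shared Add-node scale are our own assumptions, not the framework's API:

import numpy as np

def requantize_to_int8(acc_int32, in_scale, w_scale, out_scale):
    # Ordinary convolution node: the INT32 accumulator is rescaled to an
    # INT8 activation, keeping activation memory at one byte per element.
    return np.clip(np.round(acc_int32 * (in_scale * w_scale / out_scale)),
                   -127, 127).astype(np.int8)

def dequantize_to_int32(acc_int32, in_scale, w_scale, add_scale):
    # Convolution node feeding an Add: rescale onto the Add node's shared
    # INT32 grid instead, so the two branch tensors can be summed exactly.
    return np.round(acc_int32 * (in_scale * w_scale / add_scale)).astype(np.int32)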
Step S12: running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among the Add nodes, the target Add nodes that cause a memory bottleneck.
In this embodiment, the memory usage of the Add nodes is then evaluated: specifically, inference is run on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck.
Step S13: setting the input quantization type of the target Add node to the INT7 type, updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and adding, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
In this embodiment, for the target Add nodes at the memory bottleneck, the input quantization type is set to the INT7 type. That is, INT7 quantization is not applied to all Add nodes, but only to the target Add nodes that occupy more memory and cause the bottleneck; this removes the memory bottleneck while avoiding the accuracy loss that blanket INT7 quantization would cause. It should be noted that INT7 here only denotes the quantization range [-63, 63]; the data is still stored in the INT8 data type so that generic embedded devices are supported. The corresponding quantization factors in the deep learning inference framework are then updated based on the INT7 type, i.e. the quantization factors of the nodes whose operator type is Add and whose quantization type is INT7. In a specific embodiment, updating the corresponding quantization factors based on the INT7 type includes: updating the quantization factor corresponding to the input of the target Add node in the deep learning inference framework, and updating the dequantization factor corresponding to the output of the upstream node of the target Add node. In other words, both factors must change together: the quantization factor of the Add node's input is divided by 2, converting the quantization from INT8 to INT7, and the dequantization factor of the node whose output feeds the Add node, i.e. the upstream node, is multiplied by 2.
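A minimal sketch of this factor update, with hypothetical node objects standing in for the framework's real graph structures:

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    input_quant_factor: float = 1.0     # multiplies real values into integers
    output_dequant_factor: float = 1.0  # multiplies integers back to real values
    quant_type: str = "INT32"

def switch_add_to_int7(add_node: Node, upstream_nodes: list[Node]) -> None:
    # Halving the input quantization factor maps activations onto the
    # INT7 range [-63, 63] while the storage type remains INT8.
    add_node.quant_type = "INT7"
    add_node.input_quant_factor /= 2
    # Doubling each upstream dequantization factor keeps the represented
    # real values unchanged after the range is halved.
    for node in upstream_nodes:
        node.output_dequant_factor *= 2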
Further, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data are added to the deep learning inference framework, to obtain the optimized deep learning inference framework. That is, in addition to the first Add operator supporting INT32-type addition, a second Add operator supporting INT8-type addition needs to be added to the framework. It should be noted that the implementation should check that the input data lie within the INT7 range, so as to avoid data overflow.
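The two operators might look as follows; this is our NumPy sketch under the INT7 range convention above, not the patent's implementation:

import numpy as np

def add_op_int32(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # First Add operator: INT32 inputs, exact INT32 sum.
    return x.astype(np.int32) + y.astype(np.int32)

def add_op_int8(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Second Add operator: inputs stored as INT8 but quantized to the INT7
    # range [-63, 63], so the sum lies in [-126, 126] and fits in INT8.
    assert x.min() >= -63 and x.max() <= 63, "input outside INT7 range"
    assert y.min() >= -63 and y.max() <= 63, "input outside INT7 range"
    return x + y  # safe: no INT8 overflow possible after the range check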
The beneficial effects of this embodiment are as summarized above and are not repeated here.
Referring to fig. 2 and fig. 3, an embodiment of the present application discloses a specific residual network memory optimization method. Compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. The method specifically comprises the following steps:
Step S21: obtaining a trained deep learning model, setting the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and building a deep learning inference framework for the embedded device.
Step S22: running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the Add nodes with higher memory usage and the corresponding memory usage information.
In this embodiment, the memory bottleneck is evaluated: specifically, inference is run on the deep learning model on the server with the deep learning inference framework, which outputs the Add nodes with higher memory usage and the corresponding memory usage information, i.e. these are determined from among all the Add nodes.
Step S23: comparing the memory usage information with the memory information of the embedded device, so as to determine, from among the Add nodes with higher memory usage, the target Add nodes that cause a memory bottleneck.
In this embodiment, the memory information of the currently selected embedded device is obtained and compared with the memory usage information, so as to determine the target Add nodes that cause a memory bottleneck. It can be understood that the embodiment first determines the Add nodes with higher memory usage and then, according to the memory information of the embedded device, determines which of them are the target Add nodes that cause the bottleneck. That is, the application applies INT7 quantization only to the small number of target Add nodes whose memory usage is high enough to cause a bottleneck, which avoids both the accuracy degradation of blanket INT7 quantization and the memory bottleneck itself. In practice the bottleneck Add nodes tend to be few; in MobileNetV2, for example, only the first Add node has very high memory usage. A sketch of this comparison step is given below.
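A hedged sketch of the comparison; the device budget, the node records, and the per-node accounting are all our assumptions for illustration:

import numpy as np

DEVICE_MEMORY_BYTES = 256 * 1024  # assumed memory budget of the embedded device

def find_bottleneck_add_nodes(add_nodes):
    # add_nodes: list of (name, input_shapes) pairs collected while running
    # inference on the server with the unoptimized framework.
    bottlenecks = []
    for name, input_shapes in add_nodes:
        # With INT32 inputs, each Add node holds two 4-byte-per-element tensors.
        bytes_needed = sum(4 * int(np.prod(shape)) for shape in input_shapes)
        if bytes_needed > DEVICE_MEMORY_BYTES:
            bottlenecks.append((name, bytes_needed))
    return bottlenecks

# Example: a residual Add with two 1x24x56x56 inputs (roughly MobileNetV2's
# first Add) needs about 0.6 MB at INT32, far above the assumed 256 KB budget.
print(find_bottleneck_add_nodes([("add_1", [(1, 24, 56, 56), (1, 24, 56, 56)])]))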
Step S24: setting the input quantization type of the target Add node to the INT7 type, updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and adding, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
In this embodiment, this step optimizes the quantization module and the Add operator implementation module in the deep learning inference framework, i.e. it updates the quantization factors and adds the Add operators.
Step S25: running inference on the deep learning model with the optimized deep learning inference framework.
In this embodiment, after the optimized deep learning inference framework is obtained, it is used to run inference on the deep learning model. In a specific embodiment, this process further includes: acquiring the target data output by the upstream node according to the input quantization type set on the current Add node in the residual network, so as to use the target data as the input data of the current Add node; and determining a target Add operator based on the input quantization type of the current Add node, then processing the input data with the target Add operator to obtain an output result. It can be understood that in a convolutional neural network the output of the upstream node is the input of the current node; here, the input quantization type set on the current Add node determines both the type of the current Add node's input data and the type of the upstream node's output data. The corresponding target Add operator is then selected based on that input quantization type and used to process the input data into an output result.
Further, determining the target Add operator based on the input quantization type of the current Add node and processing the input data with it includes: if the input quantization type of the current Add node is the INT32 type, the target Add operator is the first Add operator that adds INT32-type data, and processing the input data with it yields an INT32-type output result, preserving overall model accuracy; if the input quantization type of the current Add node is the INT7 type, the target Add operator is the second Add operator that adds INT8-type data, and processing the input data with it yields an INT8-type output result.
In addition, after the input data are processed with the target Add operator to obtain an output result: if the output result is of the INT32 type, it is quantized to the INT8 type and the quantized result is provided as input to the downstream node; if the output result is of the INT8 type, it is provided directly as input to the downstream node. That is, for the node below the target Add node, an INT32 output of the Add node must first be quantized to INT8 before the downstream computation, whereas an INT8 output is used in the computation directly. A dispatch sketch covering this selection and requantization is given below.
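Putting the selection and the requantization together, a hedged end-to-end sketch, reusing the operator sketches above; the parameter names and the requantization formula are our assumptions:

import numpy as np

def run_add_node(quant_type, output_quant_factor, x, y):
    # Select the Add operator from the node's input quantization type.
    if quant_type == "INT32":
        out = add_op_int32(x, y)
        # INT32 result: requantize to INT8 before the downstream node.
        return np.clip(np.round(out * output_quant_factor),
                       -127, 127).astype(np.int8)
    elif quant_type == "INT7":
        # INT7-range inputs: the INT8 operator's result feeds the
        # downstream node directly, with no extra requantization.
        return add_op_int8(x, y)
    raise ValueError(f"unsupported Add quantization type: {quant_type}")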
For the more specific processing procedures of steps S21 and S24, reference may be made to the corresponding content disclosed in the foregoing embodiment, which is not repeated here.
As can be seen, when determining the target Add nodes that cause the memory bottleneck, this embodiment runs inference on the deep learning model with the deep learning inference framework to output the Add nodes with higher memory usage and the corresponding memory usage information, and then compares that information with the memory information of the currently selected embedded device, so that INT7 quantization is applied only to the target Add nodes and the accuracy degradation caused by blanket INT7 quantization is avoided. In addition, after the optimized deep learning inference framework is obtained, it is used to run inference on the deep learning model: the target data output by the upstream node is obtained according to the input quantization type set on the current Add node and used as the current Add node's input data, which is then processed with the target Add operator corresponding to that quantization type to obtain the output result. If INT7 quantization is set, the second Add operator for INT8-type data is used; otherwise the first Add operator for INT32-type data continues to be used, preserving overall model accuracy. Through this scheme, the memory bottleneck problem of the residual network during inference on the embedded device can be effectively solved.
Referring to fig. 4, an embodiment of the present application discloses a residual network memory optimization apparatus, applied to an embedded device, the apparatus comprising:
a framework building module 11, configured to obtain a trained deep learning model, set the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and build a deep learning inference framework for the embedded device;
a model inference module 12, configured to run inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among the Add nodes, the target Add nodes that cause a memory bottleneck;
a framework optimization module 13, configured to set the input quantization type of the target Add node to the INT7 type, update the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and add, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
The beneficial effects of this apparatus are the same as those of the method described above and are not repeated here.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. Specifically, it comprises: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the residual network memory optimization method performed by the electronic device, as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide the operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 is used for acquiring external input data or outputting data to the outside, and its specific interface type may be selected according to the specific application requirements, which is not limited here.
The processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 21 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 21 may also comprise a main processor and a coprocessor: the main processor, also called CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it include an operating system 221, a computer program 222, and data 223, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. Besides the computer program that can perform the residual network memory optimization method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further comprise programs for other specific tasks. The data 223 may include, in addition to data received from external devices, data collected through the electronic device's own input/output interface 25, and so on.
Further, an embodiment of the present application also discloses a computer-readable storage medium storing a computer program, wherein the computer program, when loaded and executed by a processor, implements the method steps of the residual network memory optimization process disclosed in any of the foregoing embodiments.
In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others, so the same or similar parts of the embodiments may be referred to each other. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method section.
Those of skill would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, the illustrative components and steps have been described above generally in terms of functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The residual network memory optimization method, apparatus, device, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the embodiments is only intended to help understand the method and its core idea. At the same time, a person of ordinary skill in the art may make changes to the specific embodiments and the application scope in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A residual network memory optimization method, characterized by being applied to an embedded device and comprising the following steps:
obtaining a trained deep learning model, setting the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and building a deep learning inference framework for the embedded device;
running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck;
setting the input quantization type of the target Add node to the INT7 type, updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and adding, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
2. The residual network memory optimization method according to claim 1, wherein updating the corresponding quantization factors in the deep learning inference framework based on the INT7 type comprises:
updating, based on the INT7 type, the quantization factor corresponding to the input of the target Add node in the deep learning inference framework, and updating the dequantization factor corresponding to the output of the upstream node of the target Add node.
3. The residual network memory optimization method according to claim 1, further comprising, after obtaining the optimized deep learning inference framework:
running inference on the deep learning model with the optimized deep learning inference framework.
4. The residual network memory optimization method according to claim 3, further comprising, while running inference on the deep learning model with the optimized deep learning inference framework:
acquiring the target data output by the upstream node according to the input quantization type set on the current Add node in the residual network, so as to use the target data as the input data of the current Add node;
determining a target Add operator based on the input quantization type of the current Add node, and processing the input data with the target Add operator to obtain an output result.
5. The residual network memory optimization method according to claim 4, wherein determining a target Add operator based on the input quantization type of the current Add node and processing the input data with the target Add operator to obtain an output result comprises:
if the input quantization type of the current Add node is the INT32 type, the corresponding target Add operator is the first Add operator that adds INT32-type data, and the first Add operator is used to process the input data to obtain an INT32-type output result;
if the input quantization type of the current Add node is the INT7 type, the corresponding target Add operator is the second Add operator that adds INT8-type data, and the second Add operator is used to process the input data to obtain an INT8-type output result.
6. The residual network memory optimization method according to claim 5, further comprising, after processing the input data with the target Add operator to obtain an output result:
if the output result is of the INT32 type, quantizing the output result to the INT8 type, and providing the quantized output result as input to the downstream node;
if the output result is of the INT8 type, providing the output result directly as input to the downstream node.
7. The residual network memory optimization method according to any one of claims 1 to 6, wherein running inference on the deep learning model on the server with the deep learning inference framework to determine the target Add nodes that cause a memory bottleneck from among the Add nodes comprises:
running inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the Add nodes with higher memory usage and the corresponding memory usage information;
comparing the memory usage information with the memory information of the embedded device, so as to determine, from among the Add nodes with higher memory usage, the target Add nodes that cause a memory bottleneck.
8. A residual network memory optimization apparatus, characterized by being applied to an embedded device and comprising:
a framework building module, configured to obtain a trained deep learning model, set the input quantization type of each Add node of the residual network in the deep learning model to the INT32 type, and build a deep learning inference framework for the embedded device;
a model inference module, configured to run inference on the deep learning model on a server with the deep learning inference framework, so as to determine, from among all the Add nodes, the target Add nodes that cause a memory bottleneck;
a framework optimization module, configured to set the input quantization type of the target Add node to the INT7 type, update the corresponding quantization factors in the deep learning inference framework based on the INT7 type, and add, in the deep learning inference framework, a first Add operator that adds INT32-type data and a second Add operator that adds INT8-type data, to obtain an optimized deep learning inference framework.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the residual network memory optimization method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the residual network memory optimization method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number: CN202310077606.4A | Priority date: 2023-01-30 | Filing date: 2023-01-30 | Title: Residual network memory optimization method, device, equipment and medium

Publications (1)

Publication Number: CN116187404A | Publication Date: 2023-05-30

Family

ID: 86451793

Family Applications (1)

Application Number: CN202310077606.4A | Status: Pending | Priority/Filing date: 2023-01-30 | Title: Residual network memory optimization method, device, equipment and medium

Country Status (1)

CN: CN116187404A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination