CN117009093B - Recalculation method and system for reducing memory occupation amount required by neural network reasoning - Google Patents

Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Info

Publication number
CN117009093B
CN117009093B CN202311278523.8A
Authority
CN
China
Prior art keywords
recalculation
attribute
steps
tensor
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311278523.8A
Other languages
Chinese (zh)
Other versions
CN117009093A (en)
Inventor
李超
孙啸峰
刁博宇
姜宏旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311278523.8A priority Critical patent/CN117009093B/en
Publication of CN117009093A publication Critical patent/CN117009093A/en
Application granted granted Critical
Publication of CN117009093B publication Critical patent/CN117009093B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The recalculation method and system for reducing the memory occupation amount required by neural network reasoning comprise the following steps: constructing a directed acyclic graph G from the computational graph of a neural network model; obtaining a set of recalculation pairs R based on the directed acyclic graph G; performing a power set operation on the recalculation pair set R; according to each element r of the power set, attaching attributes to the graph G to obtain a corresponding graph G_r; executing a scheduling algorithm involving a recalculation policy based on each G_r; and selecting, among the scheduling results of all G_r, the schedule with the lowest memory occupation amount. Against the background of the growing demand for intelligence on edge devices, whose running memory is scarce, the method and system trade additional computation cost for a smaller memory occupation during neural network reasoning, contributing to the wider applicability of edge intelligence.

Description

Recalculation method and system for reducing memory occupation amount required by neural network reasoning
Technical Field
The invention belongs to the technical field of neural network application, and particularly relates to a recalculation method and a recalculation system for reducing the memory occupation amount required by neural network reasoning.
Background
In recent years, with the rapid development of the neural network field, the application range of the related technologies keeps expanding, including but not limited to face recognition, image recognition, voice recognition, natural language processing, machine translation, and automatic driving. The widespread use of these technologies has profoundly changed lifestyles while also making daily life more convenient.
Among these, edge computing technology plays an increasingly important role in neural network applications. Edge computing is a distributed computing model that places computing and storage close to the physical locations acting as data sources in order to improve data-processing performance. In the field of neural networks, edge intelligence mainly means deploying a neural network model on an edge device for inference, which brings advantages such as processing data locally, reducing data transmission, improving real-time performance, and reducing cost.
However, edge intelligence still faces some challenges, foremost among which is the memory footprint problem. Edge devices typically have a small memory capacity; for example, the microcontroller STM32F7, which is widely used in industry, has at most 512KB of on-chip memory. The amount of memory required by a common neural network model typically far exceeds this limit; for example, MobileNetV3 inference, used on mobile devices, requires 1536KB of memory. Therefore, how to reduce the memory occupation of neural network model inference has become one of the key technical challenges for realizing edge intelligence.
Document 1 (Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015)) reduces the size of the model in the following three steps to address the memory occupation problem of inference:
(1) Prune the model by removing unimportant connections: first train the network normally; after training, delete all connections whose weights are smaller than a certain threshold, and repeat the process.
(2) Quantize the weights of the pruned model so that multiple connections share the same weight value, and only the metadata, the effective weights, and the indices need to be stored.
(3) Huffman-code the biased distribution of the weights and their indices, fully exploiting the distribution characteristics to reduce storage. While this approach can reduce the storage and computational consumption of the model, it compromises the performance of the model, i.e., it reduces the model's recognition accuracy.
Document 2 (Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015)) proposes the method of knowledge distillation. It first trains a complex network model and then uses the output of the complex network together with the true labels of the data to train a smaller network. The method thus involves a complex model (the teacher network) and a small model (the student network), where the student network can be understood as the result of compressing the teacher network. However, this approach not only requires long training time but also loses some of the performance of the teacher network.
Document 3 (Paliwal, Aditya, et al. "Reinforced genetic algorithm learning for optimizing computation graphs." arXiv preprint arXiv:1905.02494 (2019)) employs a reinforcement learning algorithm to optimize memory usage during model inference. The method first passes the computational graph of the model to a GNN, which generates discrete embeddings corresponding to the scheduling priority of each node in the computational graph; the embeddings are then passed to the genetic algorithm BRKGA, which determines the placement and scheduling of each node. However, this approach requires a significant amount of training time before it can be used.
Document 4 (Ahn, Byung Hoon, et al. "Ordering chaos: Memory-aware scheduling of irregularly wired neural networks for edge devices." Proceedings of Machine Learning and Systems 2 (2020): 44-57.) searches the space of possible operator schedules for the schedule with the lowest run-time memory occupation. It designs a scheduling algorithm based on dynamic programming that obtains the resulting schedule by searching with exponential time complexity, and it further presents an acceleration technique with adjustable soft thresholds and an equivalent graph transformation of the model. However, this approach restricts each operator to being executed only once, which limits the space of possible operator schedules, so schedules with lower memory occupation may be missed. The present invention removes this limitation and obtains schedules with lower memory occupation in an efficient manner without losing model performance.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, reduce the memory occupation amount during neural network reasoning, further provide favorable technical support for deploying edge-end reasoning, and provide a recalculation method and a recalculation system for reducing the memory occupation amount required by the neural network reasoning.
The aim of the invention is realized by the following technical scheme: the recalculation method for reducing the memory occupation amount required by neural network reasoning comprises the following steps:
(1) Constructing a directed acyclic graph G from the computational graph of the neural network model.
(2) Obtaining a set of recalculation pairs R based on the directed acyclic graph G.
(3) Performing a power set operation on the recalculation pair set R; according to each element r of the power set, attaching attributes to the graph G to obtain a corresponding graph G_r.
(4) Executing a scheduling algorithm involving a recalculation policy based on each graph G_r, and selecting, among the scheduling results of all G_r, the schedule with the lowest memory occupation amount.
Further, the directed acyclic graph G in step (1) comprises a node set V and an edge set E, i.e. G = (V, E), wherein
each element of the node set V is a node and represents an operator; each element e = (u, v) of the edge set E represents an edge of G, indicating that node v uses the calculation result of node u, where u is the source node of the edge and v is the target node of the edge; a function M(·) is defined to obtain the memory occupation of an edge, i.e. of an operator output tensor.
Still further, the operators include convolution operations and concatenation operations.
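As a concrete illustration of the structure just described, the following Python sketch represents the directed acyclic graph G = (V, E) together with the memory-footprint function M(·). The class and field names (Graph, succ, pred, mem) are illustrative assumptions and not definitions taken from the patent.

```python
from collections import defaultdict

class Graph:
    """Directed acyclic graph G = (V, E) of a neural-network computational graph.

    Each node is an operator; each edge (u, v) carries the memory footprint of
    the tensor produced by u and consumed by v.
    """

    def __init__(self):
        self.nodes = set()                  # node set V (operator names)
        self.succ = defaultdict(list)       # u -> [v, ...]: v uses u's output
        self.pred = defaultdict(list)       # v -> [u, ...]
        self.mem = {}                       # (u, v) -> memory footprint of the tensor on that edge

    def add_edge(self, u, v, footprint):
        self.nodes.update((u, v))
        self.succ[u].append(v)
        self.pred[v].append(u)
        self.mem[(u, v)] = footprint

    def M(self, u):
        """Memory footprint of operator u's output tensor; every outgoing edge of u
        carries the same tensor, so any one of them gives M."""
        outs = [f for (a, _), f in self.mem.items() if a == u]
        return outs[0] if outs else 0
```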
Further, obtaining the set of recalculation pairs R in step (2) comprises the following substeps:
(2.1) Obtain the set B of all branch operators in the graph G; when the output tensor of an operator is used by more than one operator, that operator is called a branch operator.
(2.2) Define a single chain as a path in the graph G in which all nodes except the last one have in-degree and out-degree equal to 1.
(2.3) For each branch operator s in the set B, search for all single chains whose first node's input tensor has a memory occupation smaller than the memory occupation of the output tensor of the branch operator s. Let the output tensor of s be o_s and the input tensor of the first node of a single chain t be i_t; the recalculation pair is then (o_s, i_t) and the recalculation path is t. All obtained recalculation pairs and the corresponding single chains form a recalculation table T, which is used by the subsequent algorithm.
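Substeps (2.1)-(2.3) can be sketched as follows, building on the hypothetical Graph class above. Enumerating candidate single chains by walking backwards from each branch operator is one plausible reading of the text, and the way a pair is identified in the returned table is an assumption.

```python
def branch_operators(g):
    # (2.1) an operator whose output tensor is consumed by more than one operator
    return [u for u in g.nodes if len(g.succ[u]) > 1]

def recalculation_table(g):
    """(2.2)/(2.3): build the recalculation table T of (pair, path) entries."""
    table = []
    for s in branch_operators(g):
        o_s = g.M(s)                                   # footprint of s's output tensor
        # walk backwards from s while the predecessors form a single chain
        chain, node = [s], s
        while len(g.pred[node]) == 1:
            prev = g.pred[node][0]
            if len(g.succ[prev]) != 1:
                break
            chain.insert(0, prev)
            node = prev
        # every suffix of the walk that ends at s is a candidate single chain t
        for k in range(len(chain)):
            t = chain[k:]
            first = t[0]
            if len(g.pred[first]) != 1:
                continue
            i_t = g.mem[(g.pred[first][0], first)]     # footprint of the chain's input tensor
            if i_t < o_s:
                table.append(((o_s, i_t), t))          # recalculation pair (o_s, i_t) and path t
    return table
```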
Further, the power set operation in step (3) means taking, for a given set, the set formed by all of its subsets. For example, for the set {1,2,3}, its power set includes: { }, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}. The empty set { } and the set itself {1,2,3} are also counted as subsets.
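A minimal way to realize the power set operation, here using the Python standard library (an illustrative choice, not prescribed by the patent):

```python
from itertools import chain, combinations

def power_set(items):
    """Return all subsets of `items`, including the empty set and the set itself."""
    items = list(items)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

# Example: power_set({1, 2, 3}) yields {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}
```

Applied to the recalculation pair set R, each returned subset plays the role of one element r of the power set.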
Further, in step (3), for each element r of the power set, attributes are attached to the graph G to obtain the graph G_r; this is achieved by the following substeps:
(3.1) For each tensor in the graph G, set its rc attribute to the number of operators taking that tensor as input.
(3.2) For each recalculation pair (o_s, i_t) in r, set the recomputable attribute of o_s to true and the source attribute of i_t to true. Increase the rc value of the input tensor of every operator in t by the number of operators taking that tensor as input, minus 1.
(3.3) For t, remove the first operator and set the recompute attribute of the input tensors of the remaining operators to true.
(3.4) Based on the above steps, obtain the graph G_r corresponding to the element r.
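The attribute annotation of substeps (3.1)-(3.4) can be pictured with the sketch below. The attribute names rc, recomputable, source and recompute, as well as the helpers all_tensors, consumers, inputs, output_tensor and input_tensor, are labels assumed here for illustration rather than the patent's own definitions; the table is assumed to hold ((o_s, i_t), path) entries keyed by tensor names.

```python
def annotate(g, r, recalc_table):
    """Attach recalculation attributes to G for one power-set element r (a set of pairs)."""
    attrs = {}
    for tensor in all_tensors(g):                        # hypothetical helper
        attrs[tensor] = {
            "rc": len(consumers(g, tensor)),             # (3.1) reference count
            "recomputable": False,
            "source": False,
            "recompute": False,
        }
    for (o_s, i_t), t in recalc_table:
        if (o_s, i_t) not in r:
            continue
        attrs[o_s]["recomputable"] = True                # (3.2) o_s may be evicted and recomputed
        attrs[i_t]["source"] = True                      #       i_t must survive for recomputation
        for op in t:
            for tensor in inputs(g, op):                 # hypothetical helper
                attrs[tensor]["rc"] += len(consumers(g, tensor)) - 1
        for op in t[1:]:                                 # (3.3) skip the first operator of the path
            for tensor in inputs(g, op):
                attrs[tensor]["recompute"] = True
    return attrs                                         # (3.4) G_r = G plus these attributes
```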
Further, the step (4) is implemented by the following substeps:
(4.1) For one graph G_r, initialize the footprint f = 0 and the schedule sch = [ ]; let ts be the set of input tensors of G_r, m the total memory occupation of the tensors in ts, and f* = MAX, where MAX is a preset maximum value; let z be the encoding of the current attributes of G_r, and initialize the state table S = {z : (m, f, sch)}. The encoding operation is: record the tensors currently contained in ts and the rc attribute of each tensor to form a tuple. The encoding result z, called a state, is used as a key of the state table S.
(4.2) Obtain all items of the state table S and create an empty state table S_new.
(4.3) For each item among all the items of S, decode its key z. The decoding operation is: restore, from the tuple produced by the encoding operation, the set ts and the rc attribute of every tensor in this state.
(4.4) Based on the restored rc values and ts, obtain the candidate set U of operators that may be executed next. U is obtained as follows: after removing the tensors already in memory, add the operators whose in-degree among the remaining operators is 0; at the same time, the output tensor of such an operator must not be the same as a tensor already in ts, and the rc attribute of its output tensor must not be 0.
(4.5) If the set U is empty, the state has completed one inference; then judge whether f is smaller than f* and, if so, update f*.
(4.6) If the set U is not empty, the state describes an intermediate stage of one inference. For each operator op in the set U: sch_new = sch + [op], m_new = m + M(o_op), where o_op is the output tensor of op; f_new = max(f, m_new); and o_op is added to ts.
(4.7) For the input tensor τ of op, determine whether it can be released: decrement the rc attribute of τ by 1. If none of the recalculation attributes of τ is true and its rc attribute is 0, or the recomputable attribute of τ is true, or the recompute attribute of τ is true, or the source attribute of τ is true and its rc attribute is 0, then subtract the memory occupation of τ, i.e. m_new = m_new - M(τ), and remove τ from ts. Encode the new attributes to obtain z_new.
(4.8) Judge whether z_new is already in S_new. If yes, compare the footprint of z_new with the footprint of the existing item and update the item if the new value is smaller. If z_new is not in S_new, add the state. Restore the state decoded in substep (4.3) and repeat substeps (4.6) to (4.8) until every operator in U has been executed.
(4.9) If S_new is empty, obtain the minimum memory occupation f* of G_r and the corresponding optimal schedule involving recalculation. If S_new is not empty, take S_new as the state table S and repeat the processes from substep (4.2) to substep (4.9).
(4.10) Among the scheduling results (f*, sch*) of all the graphs G_r, select the schedule sch* corresponding to the smallest f* as the schedule used at inference time of the model.
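The state-table iteration of substeps (4.1)-(4.10) follows a level-by-level dynamic-programming pattern; the sketch below is one plausible, simplified rendering of it. The helpers input_tensors, initial_rc, encode, decode, ready_ops, output_tensor, size and release_inputs are assumptions standing in for the encoding, candidate-selection and release rules described above, and the footprint bookkeeping is abbreviated.

```python
def schedule_with_recompute(g_r, MAX=float("inf")):
    """Level-by-level state-table search over operator orders: for every encoded
    state keep the partial schedule with the smallest footprint seen so far."""
    ts = frozenset(input_tensors(g_r))                   # tensors resident in memory
    rc = initial_rc(g_r)                                 # rc attribute of every tensor
    m0 = sum(size(t) for t in ts)
    states = {encode(ts, rc): (m0, m0, [])}              # (4.1) key -> (current m, footprint f, schedule)
    best_f, best_sched = MAX, None

    while states:                                        # (4.9) iterate until no states remain
        new_states = {}                                  # (4.2)
        for key, (m, f, sched) in states.items():
            ts, rc = decode(key)                         # (4.3)
            candidates = ready_ops(g_r, ts, rc)          # (4.4)
            if not candidates:                           # (4.5) one complete inference
                if f < best_f:
                    best_f, best_sched = f, sched
                continue
            for op in candidates:                        # (4.6) try each executable operator
                out = output_tensor(g_r, op)
                ts2, m2 = ts | {out}, m + size(out)
                f2 = max(f, m2)
                ts2, rc2, m2 = release_inputs(g_r, op, ts2, rc, m2)  # (4.7) free inputs if allowed
                key2 = encode(ts2, rc2)
                old = new_states.get(key2)               # (4.8) keep the better duplicate
                if old is None or f2 < old[1]:
                    new_states[key2] = (m2, f2, sched + [op])
        states = new_states

    return best_f, best_sched                            # (4.10) the caller compares across all G_r
```

In line with substep (4.10), such a function would be invoked once per attributed graph G_r, and the schedule with the smallest returned footprint would be kept.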
A second aspect of the present invention is directed to a recalculation system that reduces the memory footprint required for neural network reasoning, comprising:
a directed acyclic graph construction module, configured to construct a directed acyclic graph G from the computational graph of the neural network model;
a recalculation pair set acquisition module, configured to obtain a set of recalculation pairs R based on the directed acyclic graph G;
a power set operation module, configured to perform a power set operation on the recalculation pair set R and, according to each element r of the power set, attach attributes to the graph G to obtain the corresponding graph G_r; and
a recalculation module, configured to execute a scheduling algorithm involving a recalculation policy based on each graph G_r and to select, among the scheduling results of all G_r, the schedule with the lowest memory occupation amount.
A third aspect of the present invention relates to a recalculating apparatus for reducing the memory usage required for neural network reasoning, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors are configured to implement a recalculating method for reducing the memory usage required for neural network reasoning according to the present invention when executing the executable code.
A fourth aspect of the present invention is directed to a computer readable storage medium having a program stored thereon, which when executed by a processor, implements a recalculation method of the present invention that reduces the memory footprint required for neural network reasoning.
A fifth aspect of the present invention relates to a computing device, including a memory and a processor, wherein the memory has executable code stored therein, and the processor, when executing the executable code, implements the recalculation method for reducing memory occupation required for neural network reasoning of the present invention.
A sixth aspect of the invention relates to a computer program product comprising a computer program which, when executed by a processor, implements a recalculation method of the invention for reducing the memory footprint required for neural network reasoning.
The innovation of the invention lies in introducing the idea of recalculation into the operator scheduling problem of the neural network inference process and in designing, by an algorithm, the operator schedule with the lowest memory occupation. The recalculation idea is embodied as follows: instead of temporarily storing an intermediate result y in memory, a predecessor intermediate result x with smaller memory consumption is stored in its place, and when y is needed it is recomputed from the stored x. Storing an intermediate result with higher memory occupation is thus replaced by storing an intermediate result with lower memory occupation plus an additional computation cost, so that the memory occupation during inference is reduced.
The beneficial effects of the invention are as follows: the memory occupation amount in the neural network model reasoning process can be effectively reduced, and the method has important significance for intelligent development of edge equipment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a directed acyclic graph G formed from an image-processing neural network of the present invention.
FIG. 2 is the graph G_r1 obtained by attaching attributes to the directed acyclic graph G of the present invention.
FIG. 3 is the graph G_r2 obtained by attaching attributes to the directed acyclic graph G of the present invention.
FIG. 4 is a comparison of the schedules corresponding to the graph G_r1 and the graph G_r2 in one embodiment of the present invention.
Fig. 5 is a schematic diagram of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The features of the following examples and embodiments may be combined with each other without any conflict.
Example 1:
Consider the process of finding the minimum-memory-footprint schedule for the inference of an image-processing neural network whose computational graph is described by a directed acyclic graph G = (V, E), as shown in FIG. 1. The computational nodes of the graph are convolution operators, and the edge weights of the graph describe the tensor sizes. Because operator weights do not need to occupy working memory on the edge devices typified by microcontrollers, the memory occupied by operator weights is not considered.
As shown in fig. 1-4, the recalculation method for reducing the memory occupation amount required by neural network reasoning of the present invention comprises the following steps:
(1) Constructing a directed acyclic graph G from the computational graph of the neural network model.
Using a mainstream deep learning framework such as TensorFlow, PyTorch or MXNet, the constructed deep learning model is saved in tflite, pb or onnx format by the framework's save function. These file formats contain a description of the neural network's computational graph, from which the directed acyclic graph G is built, as shown in the sample network of FIG. 1.
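As an illustration of this step, the sketch below loads a saved .onnx file and extracts the node and edge sets of G, with edge weights taken as tensor sizes. The use of the onnx Python package and the float32 size assumption are illustrative choices; the patent itself only names the file formats.

```python
import onnx
from onnx import shape_inference

def build_dag(path):
    """Build (nodes, edges) from a saved .onnx model; edge weights are tensor sizes."""
    model = shape_inference.infer_shapes(onnx.load(path))
    graph = model.graph

    # tensor name -> size in bytes (4-byte float32 elements assumed)
    sizes = {}
    for vi in list(graph.value_info) + list(graph.input) + list(graph.output):
        dims = [d.dim_value for d in vi.type.tensor_type.shape.dim if d.dim_value > 0]
        elems = 1
        for d in dims:
            elems *= d
        sizes[vi.name] = elems * 4

    producers = {out: node.name for node in graph.node for out in node.output}
    nodes, edges = set(), []
    for node in graph.node:
        nodes.add(node.name)
        for tensor in node.input:
            if tensor in producers:            # edge: producing operator -> this operator
                edges.append((producers[tensor], node.name, sizes.get(tensor, 0)))
    return nodes, edges
```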
(2) Based on the directed acyclic graph G obtained in step (1), obtain the set of recalculation pairs R. This specifically comprises the following substeps:
(2.1) Obtain the set B of all branch operators in the graph G; it is evident that B = {L1}.
(2.2) Obtain all recalculation pairs. Because the set B contains only one element L1, for L1 find all single chains whose first node's input tensor has a memory footprint smaller than the memory footprint of L1's output tensor. A single chain t can be found, and the recalculation pair corresponding to this single chain is (o_L1, i). Here i denotes the input tensor, i.e. the tensor with memory occupation 1 in the figure, and o_L1 is the output tensor of operator L1, i.e. the tensor with memory occupation 10 in the figure. Since this is the only element, the recalculation pair set is R = {(o_L1, i)}.
(3) Perform the power set operation on the recalculation pair set R to obtain its power set P(R). According to each element r of the power set, attach attributes to the graph G to obtain the corresponding graph G_r. The power set has two elements: r1 = { } and r2 = {(o_L1, i)}.
(3.1) First attach attributes for r1 to obtain G_r1. For each tensor in the graph G, set its rc attribute to the number of operators taking that tensor as input.
(3.2) For each recalculation pair (o_s, i_t) in r1, set the recomputable attribute of o_s to true, set the source attribute of i_t to true, and increase the rc value of the input tensors of the operators in t accordingly. Since r1 contains no recalculation pairs, no modification is required.
(3.3) For t, remove the first operator and set the recompute attribute of the input tensors of the remaining operators to true. Since r1 contains no recalculation pairs, no modification is required. The obtained graph G_r1 is shown in FIG. 2.
(4) Based on the graph G_r1 obtained in step (3), execute the scheduling algorithm involving the recalculation policy. This specifically comprises the following substeps:
(4.1) Initialize the schedule sch = [ ] and the footprint f = 0; ts is the set of input tensors of G_r1, and f* is a maximum value that can be set according to the particular operating system. z is the encoding of the current attributes of G_r1, which can take the form of a nested tuple in which the last tuple represents ts, the first tuple represents the tensors with rc = 0, the second the tensors with rc = 1, and the third the tensors with rc = 2. Build the state table S = {z : (m, f, sch)}, where m is the memory occupation of the tensors in ts.
(4.2) Obtain all items of S, i.e. the single initial item, and create an empty state table S_new = { }.
(4.3) For each item of S, decode its key z, i.e. restore ts and the rc attribute of every tensor from z. It can be seen that the input tensor is in ts, one tensor has an rc attribute of 1, and another has an rc attribute of 2.
(4.4) Obtain the candidate set U of operators that may be executed next. Following the method described above, the set U is obtained and found to be non-empty.
(4.5) Execute the operators in the set U. Select an operator op, compute sch_new = sch + [op] and the new memory occupation m_new and footprint f_new; the output tensor of op is added to ts.
(4.6) For the input tensor τ of op, judge whether it can be released: first decrement the rc attribute of τ by 1. Because none of the recalculation attributes of τ is true and its rc attribute is 0, τ can be released. Compute m_new = m_new - M(τ) and remove τ from ts.
Encode the current attributes to obtain z_new.
(4.7) Judge whether z_new is in S_new; because it is not, add the state to S_new.
(4.8) Restore the state decoded in (4.3) and cycle through substeps (4.5) to (4.7) until every operator in the set U has been executed. Cycle through substeps (4.3) to (4.7) until every item of S has been decoded.
(4.9) If S_new is not empty, repeat the processes from substep (4.2) to substep (4.8) with S_new as the new state table S.
(4.10) If S_new is empty, obtain the minimum memory occupation f*_r1 of G_r1 and the corresponding optimal schedule sch*_r1. It can be seen that f*_r1 is 30.
(5) For r2, attach attributes to obtain G_r2: for each tensor in the graph G, set its rc attribute to the number of operators taking that tensor as input.
(5.1) For the recalculation pair (o_L1, i) in r2, set the recomputable attribute of o_L1 to true and the source attribute of i to true, and increase the rc value of the input tensor of every operator in t by the number of operators taking that tensor as input, minus 1.
(5.2) For t, remove the first operator and set the recompute attribute of the input tensors of the remaining operators to true. The obtained graph G_r2 is shown in FIG. 3.
(6) Based on the graph G_r2, carry out the same method as in step (4) to obtain the minimum memory occupation f*_r2 of G_r2 and the corresponding optimal schedule sch*_r2. It can be seen that f*_r2 is 23.
(7) Among the scheduling results of all G_r, select the schedule with the lowest memory occupation, namely sch*_r2 with a memory occupation of 23. It can be seen that the memory footprint of the optimal schedule that does not involve recalculation (graph G_r1, footprint 30) is greater than that of the schedule involving recalculation (graph G_r2, footprint 23), as shown in FIG. 4.
Example 2
This example relates to the experimental effect of the present invention. To better demonstrate its effectiveness, 20 network architecture modules similar to the Inception module are customized. Each module is named BxSy, where Bx (in this experiment, x = 2, 3, 4, 5) represents the number of branches of the structure and Sy (in this experiment, y = 3, 4, 5, 6) represents the number of operators on each branch. Each structure has a convolution operator at its entry and a concatenation operator at the end, so the number of operators in each structure is xy + 2. Because deep learning networks are typically built by stacking network modules in sequence, their peak memory footprint is the same as the peak memory footprint of the modules, and we therefore focus on the peak memory footprints of these modules.
Details of these modules are listed in Table 1. For example, for BxS3, the output dimension of the entry convolution operator is 24 and the output dimensions of the 3 operators on the branches are 8, 16 and 4, respectively. Furthermore, the input dimension of the structure is 3. This arrangement of structures and dimensions is common in practical application scenarios.
TABLE 1
TensorFlow Lite Micro (TFLM) and SERENITY were selected for comparison. TFLM is a widely used inference framework for edge devices, and SERENITY represents the current state of the art in reducing peak memory occupation through scheduling algorithms. In this example, the peak memory occupation of the present invention is compared with that of these two methods, and the results are shown in Table 2.
TABLE 2
The effect of the invention can be more intuitively observed by comparing the peak memory reduction rate, wherein the formula of the memory reduction rate is as follows:
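The formula appears as an image in the original publication; a form consistent with the reduction ratios greater than 1 reported in Table 2 would be the following (an assumption, with the baseline being TFLM or SERENITY):

memory reduction rate = (peak memory occupation of the baseline) / (peak memory occupation of the present method)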
as can be seen from the data in table 2, the present invention can achieve a peak memory reduction of 1.17 to 1.43 times based on the peak memory occupancy of TFLM, and can further achieve a peak memory occupancy reduction increase of 0.09 to 0.28 times compared to the seretity, which demonstrates the effectiveness of the present invention.
Example 3
Referring to fig. 5, the present embodiment relates to a recalculation system for reducing the memory occupation amount required for neural network reasoning, so as to implement the method of embodiment 1, including:
a directed acyclic graph construction module, configured to construct a directed acyclic graph G from the computational graph of the neural network model;
a recalculation pair set acquisition module, configured to obtain a set of recalculation pairs R based on the directed acyclic graph G;
a power set operation module, configured to perform a power set operation on the recalculation pair set R and, according to each element r of the power set, attach attributes to the graph G to obtain the corresponding graph G_r; and
a recalculation module, configured to execute a scheduling algorithm involving a recalculation policy based on each graph G_r and to select, among the scheduling results of all G_r, the schedule with the lowest memory occupation amount.
Example 4
The present invention relates to a recalculation apparatus for reducing the memory occupation amount required by neural network reasoning, which includes a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for implementing a recalculation method for reducing the memory occupation amount required by neural network reasoning in embodiment 1 when executing the executable codes.
Example 5
The present embodiment relates to a computer-readable storage medium having a program stored thereon, which when executed by a processor, implements a recalculation method for reducing the memory footprint required for neural network reasoning as described in embodiment 1.
The present embodiments relate to a computer program product comprising a computer program which, when executed by a processor, implements a recalculation method for reducing the memory footprint required for neural network reasoning as described in embodiment 1.
Example 6
The embodiment relates to a computing device, which comprises a memory and a processor, wherein executable codes are stored in the memory, and when the processor executes the executable codes, the recalculation method for reducing the memory occupation amount required by neural network reasoning is realized.
At the hardware level, the computing device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it to implement the method of embodiment 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded by the present invention; that is, the execution subject of the processing flows is not limited to logic units, but may also be hardware or logic devices.
Improvements to a technology used to be clearly distinguishable as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, and switches) or improvements in software (improvements to the method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code before compilation is also written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can readily be obtained by merely slightly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.

Claims (8)

1. The recalculation method for reducing the memory occupation amount required by neural network reasoning comprises the following steps:
(1) constructing a directed acyclic graph G from the computational graph of a neural network model;
(2) obtaining a set of recalculation pairs R based on the directed acyclic graph G; obtaining the recalculation pair set R comprises the following substeps:
(2.1) obtaining the set B of all branch operators in the graph G, wherein an operator whose output tensor is used by more than one operator is called a branch operator;
(2.2) defining a single chain as a path in the graph G in which all nodes except the last one have in-degree and out-degree equal to 1;
(2.3) for each branch operator s in the set B, searching for all single chains whose first node's input tensor has a memory occupation smaller than the memory occupation of the output tensor of the branch operator s; letting the output tensor of s be o_s and the input tensor of the first node of a single chain t be i_t, obtaining the recalculation pair (o_s, i_t) with recalculation path t; all obtained recalculation pairs and the corresponding single chains constituting a recalculation table T;
(3) performing a power set operation on the recalculation pair set R and, according to each element r of the power set, attaching attributes to the graph G to obtain the corresponding graph G_r; attaching attributes to the graph G according to each element r of the power set to obtain the graph G_r is achieved by the following substeps:
(3.1) for each tensor in the graph G, setting its rc attribute to the number of operators taking the tensor as input;
(3.2) for each recalculation pair (o_s, i_t) in r, setting the recomputable attribute of o_s to true and the source attribute of i_t to true, and increasing the rc value of the input tensor of every operator in t by the number of operators taking that tensor as input minus 1;
(3.3) for t, removing the first operator and setting the recompute attribute of the input tensors of the remaining operators to true;
(3.4) based on the above steps, obtaining the graph G_r corresponding to the element r;
(4) executing a scheduling algorithm involving a recalculation strategy based on each graph G_r, and selecting the schedule with the lowest memory occupation among the scheduling results of all G_r, specifically comprising the following substeps:
(4.1) for one graph G_r, initializing the footprint f = 0 and the schedule sch = [ ]; letting ts be the set of input tensors of G_r, m the total memory occupation of the tensors in ts, and f* = MAX, where MAX is a preset maximum value; letting z be the encoding of the current attributes of G_r and the state table S = {z : (m, f, sch)}; the encoding operation records the tensors currently contained in ts and the rc attribute of each tensor to form a tuple; the encoding result z, called a state, is used as a key of the state table S;
(4.2) obtaining all items of the state table S and creating an empty state table S_new;
(4.3) for each item among all the items of S, decoding its key z, the decoding operation restoring, from the tuple produced by the encoding operation, the set ts and the rc attribute of every tensor in this state;
(4.4) obtaining, based on the restored rc values and ts, the candidate set U of operators to be executed next, wherein U is obtained as follows: after removing the tensors already in memory, operators whose in-degree among the remaining operators is 0 are added, while the output tensor of such an operator must not be the same as a tensor already in ts and the rc attribute of its output tensor must not be 0;
(4.5) if the set U is empty, indicating that the state has completed one inference, judging whether f is smaller than f* and, if so, updating f*;
(4.6) if the set U is not empty, indicating that the state describes an intermediate stage of one inference, then for each operator op in the set U: sch_new = sch + [op], m_new = m + M(o_op), where o_op is the output tensor of op and M(·) gives the memory occupation of a tensor, f_new = max(f, m_new), and o_op is added to ts;
(4.7) for the input tensor τ of op, judging whether it can be released: decrementing the rc attribute of τ by 1; if none of the recalculation attributes of τ is true and its rc attribute is 0, or the recomputable attribute of τ is true, or the recompute attribute of τ is true, or the source attribute of τ is true and its rc attribute is 0, then subtracting the memory occupation of τ, i.e. m_new = m_new - M(τ), and removing τ from ts; encoding the new attributes to obtain z_new;
(4.8) judging whether z_new is already in S_new; if yes, comparing the footprint of z_new with the footprint of the existing item and updating the item if the new value is smaller; if z_new is not in S_new, adding the state; restoring the state decoded in substep (4.3) and repeating substeps (4.6) to (4.8) until every operator in U has been executed;
(4.9) if S_new is empty, obtaining the minimum memory occupation f* of G_r and the corresponding optimal schedule involving recalculation; if S_new is not empty, taking S_new as the state table S and repeating the processes from substep (4.2) to substep (4.9);
(4.10) among the scheduling results (f*, sch*) of all the graphs G_r, selecting the schedule sch* corresponding to the smallest f* as the schedule used at inference time of the model.
2. The recalculation method for reducing the memory occupation amount required by neural network reasoning as recited in claim 1, wherein: the directed acyclic graph G in step (1) comprises a node set V and an edge set E, i.e. G = (V, E), wherein each element of the node set V is a node and represents an operator; each element e = (u, v) of the edge set E represents an edge of G, indicating that node v uses the calculation result of node u, where u is the source node of the edge and v is the target node of the edge; and a function M(·) is defined to obtain the memory occupation of an edge, i.e. of an operator output tensor.
3. The recalculation method for reducing the memory occupation amount required by neural network reasoning as recited in claim 2, wherein: the operators include convolution operations and concatenation operations.
4. The recalculation method for reducing the memory occupation amount required by neural network reasoning as recited in claim 1, wherein: the power set operation in step (3) means taking, for a given set, the set formed by all of its subsets.
5. A recalculation system for reducing the memory occupation amount required by neural network reasoning, characterized by comprising:
a directed acyclic graph construction module, configured to construct a directed acyclic graph G from the computational graph of a neural network model;
a recalculation pair set acquisition module, configured to obtain a set of recalculation pairs R based on the directed acyclic graph G, specifically by:
(2.1) obtaining the set B of all branch operators in the graph G, wherein an operator whose output tensor is used by more than one operator is called a branch operator;
(2.2) defining a single chain as a path in the graph G in which all nodes except the last one have in-degree and out-degree equal to 1;
(2.3) for each branch operator s in the set B, searching for all single chains whose first node's input tensor has a memory occupation smaller than the memory occupation of the output tensor of the branch operator s; letting the output tensor of s be o_s and the input tensor of the first node of a single chain t be i_t, obtaining the recalculation pair (o_s, i_t) with recalculation path t; all obtained recalculation pairs and the corresponding single chains constituting a recalculation table T;
a power set operation module, configured to perform a power set operation on the recalculation pair set R and, according to each element r of the power set, attach attributes to the graph G to obtain the corresponding graph G_r, specifically by:
(3.1) for each tensor in the graph G, setting its rc attribute to the number of operators taking the tensor as input;
(3.2) for each recalculation pair (o_s, i_t) in r, setting the recomputable attribute of o_s to true and the source attribute of i_t to true, and increasing the rc value of the input tensor of every operator in t by the number of operators taking that tensor as input minus 1;
(3.3) for t, removing the first operator and setting the recompute attribute of the input tensors of the remaining operators to true;
(3.4) based on the above steps, obtaining the graph G_r corresponding to the element r; and
a recalculation module, configured to execute a scheduling algorithm involving a recalculation policy based on each graph G_r and to select the schedule with the lowest memory occupation among the scheduling results of all G_r, specifically by:
(4.1) for one graph G_r, initializing the footprint f = 0 and the schedule sch = [ ]; letting ts be the set of input tensors of G_r, m the total memory occupation of the tensors in ts, and f* = MAX, where MAX is a preset maximum value; letting z be the encoding of the current attributes of G_r and the state table S = {z : (m, f, sch)}; the encoding operation records the tensors currently contained in ts and the rc attribute of each tensor to form a tuple; the encoding result z, called a state, is used as a key of the state table S;
(4.2) obtaining all items of the state table S and creating an empty state table S_new;
(4.3) for each item among all the items of S, decoding its key z, the decoding operation restoring, from the tuple produced by the encoding operation, the set ts and the rc attribute of every tensor in this state;
(4.4) obtaining, based on the restored rc values and ts, the candidate set U of operators to be executed next, wherein U is obtained as follows: after removing the tensors already in memory, operators whose in-degree among the remaining operators is 0 are added, while the output tensor of such an operator must not be the same as a tensor already in ts and the rc attribute of its output tensor must not be 0;
(4.5) if the set U is empty, indicating that the state has completed one inference, judging whether f is smaller than f* and, if so, updating f*;
(4.6) if the set U is not empty, indicating that the state describes an intermediate stage of one inference, then for each operator op in the set U: sch_new = sch + [op], m_new = m + M(o_op), where o_op is the output tensor of op and M(·) gives the memory occupation of a tensor, f_new = max(f, m_new), and o_op is added to ts;
(4.7) for the input tensor τ of op, judging whether it can be released: decrementing the rc attribute of τ by 1; if none of the recalculation attributes of τ is true and its rc attribute is 0, or the recomputable attribute of τ is true, or the recompute attribute of τ is true, or the source attribute of τ is true and its rc attribute is 0, then subtracting the memory occupation of τ, i.e. m_new = m_new - M(τ), and removing τ from ts; encoding the new attributes to obtain z_new;
(4.8) judging whether z_new is already in S_new; if yes, comparing the footprint of z_new with the footprint of the existing item and updating the item if the new value is smaller; if z_new is not in S_new, adding the state; restoring the state decoded in substep (4.3) and repeating substeps (4.6) to (4.8) until every operator in U has been executed;
(4.9) if S_new is empty, obtaining the minimum memory occupation f* of G_r and the corresponding optimal schedule involving recalculation; if S_new is not empty, taking S_new as the state table S and repeating the processes from substep (4.2) to substep (4.9);
(4.10) among the scheduling results (f*, sch*) of all the graphs G_r, selecting the schedule sch* corresponding to the smallest f* as the schedule used at inference time of the model.
6. A recalculation device for reducing the memory footprint required for neural network reasoning, comprising a memory and one or more processors, the memory having executable code stored therein, the one or more processors, when executing the executable code, being configured to implement the recalculation method for reducing the memory footprint required for neural network reasoning as claimed in any of claims 1 to 4.
7. A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the recalculation method for reducing memory usage required for neural network reasoning as claimed in any of claims 1 to 4.
8. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements the recalculation method of any of claims 1-4 that reduces the memory footprint required for neural network reasoning.
CN202311278523.8A 2023-10-07 2023-10-07 Recalculation method and system for reducing memory occupation amount required by neural network reasoning Active CN117009093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278523.8A CN117009093B (en) 2023-10-07 2023-10-07 Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278523.8A CN117009093B (en) 2023-10-07 2023-10-07 Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Publications (2)

Publication Number Publication Date
CN117009093A CN117009093A (en) 2023-11-07
CN117009093B (en) 2024-03-12

Family

ID=88562178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278523.8A Active CN117009093B (en) 2023-10-07 2023-10-07 Recalculation method and system for reducing memory occupation amount required by neural network reasoning

Country Status (1)

Country Link
CN (1) CN117009093B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117892769B (en) * 2024-03-15 2024-06-11 之江实验室 Neural network training method, video memory scheduling method, system, equipment and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128702A (en) * 2021-04-15 2021-07-16 杭州电子科技大学 Neural network self-adaptive distributed parallel training method based on reinforcement learning
CN114186687A (en) * 2022-02-17 2022-03-15 之江实验室 Intermediate representation method and device for neural network model calculation
CN114358267A (en) * 2022-01-05 2022-04-15 浙江大学 Method for reducing GPU memory occupation in deep neural network training process
CN114462591A (en) * 2021-12-23 2022-05-10 北京时代民芯科技有限公司 Inference method for dynamic quantitative neural network
CN115809699A (en) * 2023-02-03 2023-03-17 之江实验室 Method and device for estimating minimum memory occupation amount required by neural network model inference

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Training Deep Nets with Sublinear Memory Cost; Tianqi Chen et al.; arXiv; full text *
Convolutional neural network compression method based on pruning and quantization; 孙彦丽 et al.; Computer Science (No. 08); full text *
A survey of deep neural network compression and acceleration; 纪荣嵘 et al.; Journal of Computer Research and Development (No. 09); full text *

Also Published As

Publication number Publication date
CN117009093A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN117009093B (en) Recalculation method and system for reducing memory occupation amount required by neural network reasoning
US11928599B2 (en) Method and device for model compression of neural network
CN111079899A (en) Neural network model compression method, system, device and medium
CN116521380A (en) Resource self-adaptive collaborative model training acceleration method, device and equipment
CN110188877A (en) A kind of neural network compression method and device
Han et al. Design automation for efficient deep learning computing
WO2018228399A1 (en) Computing device and method
CN116167461B (en) Model training method and device, storage medium and electronic equipment
Lou et al. Autoqb: Automl for network quantization and binarization on mobile devices
CN113177632B (en) Model training method, device and equipment based on pipeline parallelism
US20230056315A1 (en) Computer-implemented methods and systems for compressing recurrent neural network (rnn) models and accelerating rnn execution in mobile devices to achieve real-time inference
CN115543945B (en) Model compression method and device, storage medium and electronic equipment
CN116402113A (en) Task execution method and device, storage medium and electronic equipment
CN113887719B (en) Model compression method and device
CN116225192A (en) Control method and device of heat dissipation system, storage medium and electronic equipment
US20180204115A1 (en) Neural network connection reduction
CN115294336A (en) Data labeling method, device and storage medium
TW202137067A (en) Method, electric device and computer program product for convolutional neural network
CN117058525B (en) Model training method and device, storage medium and electronic equipment
CN110751274A (en) Neural network compression method and system based on random projection hash
CN111143641A (en) Deep learning model training method and device and electronic equipment
US20240193409A1 (en) Techniques for accelerating machine learning models
CN109753351A (en) A kind of Time-critical tasks processing method, device, equipment and medium
CN117152752B (en) Visual depth feature reconstruction method and device with self-adaptive weight
CN117909371B (en) Model training method and device, storage medium and electronic equipment

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant