CN112346877A - Memory allocation method and system for effectively accelerating deep learning calculation - Google Patents

Memory allocation method and system for effectively accelerating deep learning calculation

Info

Publication number
CN112346877A
CN112346877A
Authority
CN
China
Prior art keywords
branch
target
memory
operation layer
memory space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110028503.XA
Other languages
Chinese (zh)
Other versions
CN112346877B (en)
Inventor
李国亮
张磊
杨勤富
钱军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanbo Semiconductor Shanghai Co ltd
Original Assignee
Hanbo Semiconductor Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanbo Semiconductor Shanghai Co ltd
Priority to CN202110028503.XA
Publication of CN112346877A
Application granted
Publication of CN112346877B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a memory allocation method and system for effectively accelerating deep learning computation. The method and system determine a target operation sequence of a multi-branch operation layer according to the size of the memory space required by the operation of the multi-branch operation layer; determine, according to the target operation sequence, a target memory allocation scheme under which the branch operation results of the multi-branch operation layer are stored continuously; and determine the memory allocation scheme for executing the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence. Therefore, when a multi-branch operation is executed, the operation sequence occupying the least memory space during computation can be selected as the target operation sequence, and memory allocation for the multi-branch layers is set layer by layer according to the target memory scheme that stores the branch results continuously. This ensures that the branch operation results are stored continuously in the memory space, reduces the occupied memory space, and improves the operation efficiency of the whole neural network.

Description

Memory allocation method and system for effectively accelerating deep learning calculation
Technical Field
The application relates to the field of deep learning system optimization, and in particular to a memory allocation method and system for effectively accelerating deep learning calculation based on multi-branch scheduling and allocation.
Background
In recent years, artificial intelligence research has been very active, with deep learning as one of its core technologies; the basic model of deep learning is the deep neural network. As deep learning research progresses, the number of layers of artificial neural networks keeps increasing, from the 8 layers of AlexNet to the 19 layers of VGG and the 22 layers of GoogLeNet, with ResNet reaching as deep as 152 layers; meanwhile, more and more multi-branch networks appear in neural networks, and networks become wider and more complicated. Deeper and wider networks mean that more memory is required to train the network model.
Since deeper and wider networks significantly improve the accuracy of deep learning, deep neural networks keep developing in those two directions, and one problem they face is insufficient memory. Generally, the memory of a computer is very limited, so how to build deeper and wider networks with limited memory has become an urgent problem.
Disclosure of Invention
One objective of the present application is to provide a memory allocation method for effectively accelerating deep learning computation, where the deep learning computation network includes a multi-branch operation layer, and the method includes:
determining a target operation sequence of a multi-branch operation layer according to the size of a memory space required by the operation of the multi-branch operation layer;
determining a target memory allocation scheme for executing continuous storage of branch operation results of the multi-branch operation layer according to the target operation sequence;
and determining to execute the memory allocation scheme of the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence.
Compared with the prior art, the method and system determine the target operation sequence of the multi-branch operation layer according to the size of the memory space required by its operation; determine, according to the target operation sequence, a target memory allocation scheme under which the branch operation results of the multi-branch operation layer are stored continuously; and determine the memory allocation scheme for executing the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence. Therefore, when a multi-branch operation is executed, the operation sequence occupying the least memory space during computation can be selected as the target operation sequence, and the target memory allocation scheme for the multi-branch operation process is determined according to it, ensuring that the least memory space is occupied during the multi-branch operation. At the same time, the storage mode and location of data during operation of the multi-branch operation layer are set according to the target memory scheme in which the branch results of the multi-branch operation layer are stored continuously. As a result, the branch operation results are stored continuously in the memory space, the occupied memory space is reduced, and the operation efficiency of the whole neural network is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a memory allocation method for efficiently accelerating deep learning computation according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an operational layer of a deep learning computing network, according to an embodiment of the present application;
FIG. 3 is a directed graph corresponding to the multi-branch deep learning computation network shown in FIG. 2;
FIG. 4 is a table of I/O memory size information for performing a target operation sequence for the multi-branch deep learning computation network of FIG. 2;
FIG. 5 is a table showing the memory size information of the branch operation results when the multi-branch deep learning computation network in FIG. 2 operates according to the target operation sequence;
FIG. 6 is a schematic diagram of I/O memory allocation when executing the target operation sequence of the multi-branch deep learning computation network of FIG. 2.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The memory allocation method and system for effectively accelerating deep learning are applicable when the operation layers of a deep learning neural network include a multi-branch operation layer, and they effectively accelerate deep learning computation based on multi-branch scheduling and memory allocation. The specific branching pattern of the multi-branch operation layer is not limited; the technical solution of the application is applicable as long as multi-branch operation is involved. The solution of the application is suitable for the compilation or running stage configured by the processor that operates the deep learning computation network. According to the attributes and characteristics of the multi-branch operation layers included in the operation layers of the network, a target operation sequence and a target memory allocation scheme are determined for the multi-branch operation layer: the operation sequence occupying the smallest memory space during execution of the multi-branch operation layer is taken as the target operation sequence, and on that basis the target memory scheme is set for the multi-branch operation process under the condition that memory space is allocated continuously to each branch operation result. Thus each branch result is stored continuously in the memory space while the multi-branch operation executes in the order with the lowest memory occupation, which reduces the memory space occupied by the multi-branch operation, avoids discontinuous storage of the branch output results in the memory space, effectively prevents memory fragmentation, and improves the memory utilization rate.
Referring to fig. 1, the present application provides a memory allocation method for effectively accelerating deep learning computation, where the deep learning computation network includes a multi-branch operation layer, and the method includes:
s1, determining the target operation sequence of the multi-branch operation layer according to the size of the memory space required by the multi-branch operation layer.
Specifically, during execution the whole multi-branch operation layer is generally treated as a single independent operation layer of the neural network: the input of the multi-branch operation layer is the input of that independent layer, for example the operation result corresponding to pooling in FIG. 2, and the output of the multi-branch operation layer is the output of that independent layer, such as the output result of concat in FIG. 2. Furthermore, the multi-branch operation layer comprises a plurality of operation branches whose operations can be ordered differently, and different operation sequences correspond to different operation input/output information and place different requirements on the size of the memory.
Referring to the multi-branch structure shown in FIG. 2, the target operation sequence is the operation sequence that occupies the smallest memory space over the whole execution of the multi-branch operation layer; according to the different possible operation sequences, there is a plurality of candidate operation schemes. Generally, to determine the operation sequence of the multi-branch operation, the branch structure can be converted into a directed graph, as shown in FIG. 3: the output tensor of each layer of the multi-branch structure of FIG. 2 is a vertex of the directed graph, and the vertex value is the memory space required for the output result of that layer. According to the directed graph, the operation order of vertices within the same branch cannot be adjusted, while the operation order of vertices in different branches can be adjusted at will; among all operation sequences, the one whose corresponding vertex values give the smallest occupation, i.e., the operation sequence occupying the smallest memory space, is determined as the target operation sequence. In addition, the various operation orders of the multi-branch operation layer can be determined in a breadth-first or depth-first manner, and the order occupying the smallest memory space is then determined as the target order according to the memory space occupied by each.
S2, determining, according to the target operation sequence, a target memory allocation scheme under which the branch operation results of the multi-branch operation layer are stored continuously.
Specifically, the operation is executed according to the determined target operation sequence of the multi-branch operation, and the operation result of each branch ultimately needs to be stored in the memory space; the operation results of the branches in the multi-branch operation must be stored continuously in the memory space to avoid fragmentation of the memory space. Referring to FIG. 4, it shows the input and output memory occupation statistics during execution of the multi-branch operation of FIG. 2 according to the target operation sequence, i.e., T0->T1, T1->T2, T0->T3, T3->T4, T0->T5, T5->T6, and T0->T7; the operation result of each corresponding branch is the underlined parts "32, 128, 64", i.e., the target memory allocation scheme shown in FIG. 5.
S3, according to the target memory allocation scheme and the target operation sequence, determining the memory allocation scheme for executing the multi-branch operation layer.
Specifically, in order to ensure that each branch operation result can be continuously stored in the memory space during the execution of the multi-branch operation layer, a memory allocation scheme needs to be set for each input and output result during the execution of the branch operation according to the target operation sequence, that is, memory allocation needs to be performed for the inputs and outputs corresponding to 7 operation layers in the multi-branch in fig. 4, and at the same time, it is ensured that 4 branch operation results in fig. 5 are continuously stored in the memory space.
Furthermore, when setting the memory allocation scheme for the multi-branch operation process, the memory location and size of each branch operation result in the target memory scheme need to be considered in advance; meanwhile, when an output result inside the branch operation layer is no longer used as the input of any subsequent operation layer, the memory space of that output result is released. This ensures that the operation results of the branches are stored continuously in the memory space according to the target memory allocation scheme, avoids fragmentation of the memory space, and improves the utilization rate of the memory space.
With continued reference to fig. 2 and 3, in some preferred embodiments, the S1 includes:
and determining various operation orders and target operation orders of the multi-branch operation layer according to the directed graph operation mode.
Specifically, when determining the operation order of the multi-branch operation layer, the multi-branch operation layer may be converted into a directed graph structure: the multi-branch operation layer shown in FIG. 2 is converted into the corresponding directed graph structure of FIG. 3. The output result of each layer is a vertex in the graph, the value of a vertex represents the memory space required by its output result, the vertex corresponding to the first operation layer of the branch structure is the head vertex, and the vertex corresponding to the last operation layer is the tail vertex. The edges between the vertices are connected according to the following rules:
(1) the operation order of vertices within the same branch cannot be adjusted at will; these vertices are connected by one-way edges following the operation order, e.g., the solid lines with arrows in FIG. 3;
(2) the operation order of vertices in different branches can be adjusted at will; these vertices are connected by two-way edges, e.g., the dotted lines with arrows in FIG. 3.
Further, the directed graph is traversed starting from the head vertex and ending at the tail vertex. Within the same branch, if a one-way edge points from u to v, vertex v is called a successor vertex of vertex u and vertex u a predecessor vertex of v; a predecessor list predecessor_list and a successor list successor_list are created for each vertex accordingly.
The traversal starts from the head vertex and ends at the tail vertex; the specific process is as follows:
when the traversal starts, a variable total_size is set to represent the memory space required by the branch structure and is initialized to 0, i.e., total_size = 0;
when the traversal starts, a list iter_list is set to record the traversed vertices, and the head vertex is added to it, i.e., iter_list = {head};
before the tail vertex is added to iter_list, all other vertices must already have been added; when the tail vertex tail is added to iter_list, the traversal ends;
during the traversal, when a vertex u is visited (i.e., u is newly added to the traversal list iter_list), total_size is increased by the curr_size of vertex u (the vertex value, i.e., the memory space required by u's output), i.e., total_size = total_size + curr_size;
during the traversal, when a vertex u is visited (i.e., newly added to iter_list), u is deleted from the successor list successor_list of each of its predecessor vertices w; if the successor list of a vertex w becomes empty, total_size is decreased by the curr_size of vertex w, i.e., total_size = total_size - curr_size;
when the traversal ends, iter_list represents one calculation sequence, and total_size is the memory space required by that sequence. All calculation sequences are exhausted, and the operation sequence with the smallest required memory is the target operation sequence.
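To make the procedure concrete, below is a minimal Python sketch of this exhaustive search over a hypothetical block shaped like the one in FIG. 2 and FIG. 3. The tensor names T0-T8 and all sizes are illustrative assumptions rather than the patent's actual tables, and the sketch takes the peak of the running total_size as the memory requirement of a sequence, which is one reading of the bookkeeping above.

```python
# Vertex = the output tensor of one layer; curr_size = memory its output needs.
# Names and sizes below are hypothetical stand-ins for the FIG. 2/3 block.
curr_size = {"T0": 192,
             "T1": 96, "T2": 128,    # branch 1: T0 -> T1 -> T2
             "T3": 16, "T4": 32,     # branch 2: T0 -> T3 -> T4
             "T5": 64, "T6": 64,     # branch 3: T0 -> T5 -> T6
             "T7": 32,               # branch 4: T0 -> T7
             "T8": 256}              # tail vertex: concat of T2, T4, T6, T7

# One-way (intra-branch) edges; the order inside a branch is fixed.
edges = [("T0", "T1"), ("T1", "T2"), ("T0", "T3"), ("T3", "T4"),
         ("T0", "T5"), ("T5", "T6"), ("T0", "T7"),
         ("T2", "T8"), ("T4", "T8"), ("T6", "T8"), ("T7", "T8")]

preds = {v: set() for v in curr_size}    # predecessor_list of each vertex
succs = {v: set() for v in curr_size}    # successor_list of each vertex
for u, v in edges:
    preds[v].add(u)
    succs[u].add(v)

best = {"peak": float("inf"), "order": None}

def search(order, placed, live_succs, total, peak):
    """Extend a partial traversal; total tracks live memory, peak its maximum."""
    if len(order) == len(curr_size):
        if peak < best["peak"]:
            best["peak"], best["order"] = peak, list(order)
        return
    for v in curr_size:
        if v in placed or not preds[v] <= placed:
            continue                         # v is not ready to be visited yet
        alloc = total + curr_size[v]         # v coexists with its live inputs
        after = alloc
        for w in preds[v]:
            live_succs[w].discard(v)
            if not live_succs[w]:            # w's output is no longer needed
                after -= curr_size[w]        # total_size = total_size - curr_size
        order.append(v)
        placed.add(v)
        search(order, placed, live_succs, after, max(peak, alloc))
        placed.discard(v)                    # undo, then try the next candidate
        order.pop()
        for w in preds[v]:
            live_succs[w].add(v)

search([], set(), {u: set(s) for u, s in succs.items()}, 0, 0)
print("target operation sequence:", best["order"], "peak memory:", best["peak"])
```

Since only T0 has no predecessors and T8 depends on every branch end, the enumeration naturally starts at the head vertex and places the tail vertex last, as required above.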
In some preferred embodiments, the S1 includes:
s11 (not shown) determining a plurality of operation orders of the multi-branch operation layer;
s12 (not shown) determines, according to the size of the memory space occupied correspondingly in the plurality of operation sequences, the operation sequence occupying the smallest memory space as a target operation sequence.
Specifically, when determining the target operation sequence of the multi-branch operation layer, a plurality of operation orders of the multi-branch operation layer may first be determined, for example according to a depth-first or breadth-first principle; the size of the memory space required to execute each order is then calculated, and the order requiring the smallest memory space is selected as the target operation sequence of the multi-branch operation layer.
In some preferred embodiments, the S11 includes:
determining a plurality of operation orders of the multi-branch operation layer according to a depth-first principle; or determining a plurality of operation orders of the multi-branch operation layer according to a breadth-first principle.
Specifically, taking the multi-branch structure shown in FIG. 2 as an example, according to the depth-first principle the operation orders include, but are not limited to, the following two:
(a) T0->T7, T0->T5, T5->T6, T0->T3, T3->T4, T0->T1, T1->T2;
(b) T0->T1, T1->T2, T0->T3, T3->T4, T0->T5, T5->T6, T0->T7.
According to the breadth-first principle, the operation orders include, but are not limited to, the following two:
(c) T0->T7, T0->T5, T0->T3, T0->T1, T5->T6, T3->T4, T1->T2;
(d) T0->T1, T0->T3, T0->T5, T0->T7, T1->T2, T3->T4, T5->T6.
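The two enumeration principles can be sketched as follows, assuming the four branches of FIG. 2 are given as ordered lists of steps in the notation used above. Depth-first finishes one branch before starting the next; breadth-first takes one step from every branch per round. Orders (a)-(d) all appear among the generated candidates.

```python
from itertools import permutations

branches = [["T0->T1", "T1->T2"],    # branch 1
            ["T0->T3", "T3->T4"],    # branch 2
            ["T0->T5", "T5->T6"],    # branch 3
            ["T0->T7"]]              # branch 4

def depth_first_orders():
    # Run each branch to completion before the next; vary the branch order.
    for branch_order in permutations(branches):
        yield [step for branch in branch_order for step in branch]

def breadth_first_orders():
    # Take one step from every branch per round; vary the branch order.
    for branch_order in permutations(branches):
        seq = []
        for i in range(max(map(len, branch_order))):
            seq.extend(b[i] for b in branch_order if i < len(b))
        yield seq
```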
in some preferred embodiments, the S2 includes:
s21 (not shown) executing the multi-branch operation layer according to the target operation sequence, determining an operation result of each branch in the multi-branch operation layer;
s22 (not shown) determines the target memory allocation scheme according to the target operation sequence and the memory space size of the operation result of each branch.
Specifically, FIG. 5 shows the situation during execution of the target operation sequence of the multi-branch structure of FIG. 2: each operation layer and its corresponding input memory and output memory are the sizes of the memory space required for the input and output of that layer within the multi-branch operation layer. The underlined output results "32, 128, and 64" are the operation results corresponding to the four branches of the multi-branch operation layer of FIG. 2; arranged according to the target operation sequence, they form the information shown in FIG. 5, i.e., the target memory allocation scheme corresponding to the multi-branch operation layer of FIG. 2, under which the operation results of the four branches are stored continuously in the memory space.
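A minimal sketch of building such a scheme is to lay the branch results out back to back in the chosen order, so that the following concat layer reads one contiguous block. The result names, their order, and one of the sizes below are assumptions for illustration, since the figure itself is not reproduced here.

```python
# Branch results in (assumed) target order; the text lists the underlined
# sizes as 32, 128 and 64, so the second 32 is a guess for the fourth branch.
branch_results = [("branch1_out", 128), ("branch2_out", 32),
                  ("branch3_out", 64), ("branch4_out", 32)]

offset, layout = 0, {}
for name, size in branch_results:
    layout[name] = (offset, offset + size)   # [start, end) in the shared arena
    offset += size

# The concat layer can then read its input as one contiguous block of
# `offset` units, with no gaps between the branch results.
print(layout)
```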
In some preferred embodiments, the S3 includes:
s31 (not shown), allocating memory space for the input information and the output result of each branch operation layer by layer according to the target operation sequence;
s32 (not shown) determines the memory allocation scheme of the multi-branch operation layer according to the memory space allocation result.
Specifically, as shown in FIG. 4, when the operation process of the multi-branch operation layer of FIG. 2 is executed according to the target operation sequence, memory space needs to be allocated to the input and output information layer by layer. FIG. 6 shows the corresponding memory allocation diagram: the top row lists the operation types of the 7 operation layers of FIG. 4, and the column under each operation layer type is the memory allocation scheme for the input and output information when that layer is executed. A filling region labeled 1 represents a "locked region", i.e., the input and output of the layer's operation cannot be stored in that region; a filling region labeled 2 represents an "output result storage region", i.e., the storage location and space of the layer's output result; a filling region labeled 3 represents "unallocated" memory space; and a filling region labeled 4 represents an "input information storage region", i.e., the storage location and space of the layer's input information.
In some preferred embodiments, the S31 includes:
and preferentially distributing the storage spaces at the top/bottom ends of the memory space to the input information and the output result of the operation layer.
Specifically, to make full use of the memory space and to finally store each branch result according to the target memory scheme, thereby preventing fragmentation of the memory space, the input information and output results are preferentially stored at the top/bottom ends of the memory space during execution of the branch operation layers. As shown in FIG. 6, when the operation layer incept3a_pool is executed, the input information is stored in the region corresponding to reference numeral 4, the output result is stored in the region corresponding to reference numeral 2, and the middle region corresponding to reference numeral 3 remains unallocated.
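This top/bottom placement rule can be modeled as a tiny two-ended arena. The sketch below is an illustrative toy, not the patent's implementation; reserving a span for a future branch result (region 1 in FIG. 6) is treated here as an early allocation at the chosen end.

```python
class TwoEndedArena:
    """Toy memory pool that hands out space from its two ends first."""

    def __init__(self, size):
        self.low, self.high = 0, size    # [low, high) is the free middle part

    def alloc_bottom(self, n):
        if self.low + n > self.high:
            raise MemoryError("arena exhausted")
        start, self.low = self.low, self.low + n
        return start                     # offset of the new block

    def alloc_top(self, n):
        if self.high - n < self.low:
            raise MemoryError("arena exhausted")
        self.high -= n
        return self.high                 # offset of the new block

    # Locking a span for a future branch result is, in this toy model,
    # simply an early allocation at the chosen end.
    lock_top = alloc_top
```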
In some preferred embodiments, the S31 includes:
s311 (not shown) judges whether the output result of the next operation layer is the operation result of the corresponding branch;
if yes, in S312 (not shown), when allocating memory space for the input information and the output result of the currently executed operation layer, the memory space corresponding to the branch operation result is reserved.
Specifically, to ensure that each final branch result can be stored according to the target memory scheme and to prevent fragmentation of the memory space, when the memory allocation scheme is set during execution of a branch operation layer, if the output result of the next layer is the branch operation result, the memory space for that branch result must be reserved in advance, i.e., its position in the memory space is locked in advance. As shown in FIG. 6, when memory is allocated for the input and output information of the operation layer incept3a_pool, it is determined that the output result of the next operation layer incept3a_1x1_pool is the operation result of that branch; at this time the memory space corresponding to the operation result "32" needs to be locked, as shown in the region corresponding to reference numeral 1 in FIG. 6, which means the input and output information of the operation layer incept3a_pool cannot be allocated to that region. The same applies when memory space is allocated for the input and output information of the operation layers incept3a_5x5_reduce and incept3a_3x3_reduce. In this way, the branch operation results of the multi-branch operation layer can be stored continuously in the memory space according to the target memory scheme, memory fragmentation is prevented, and memory utilization efficiency is improved.
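Using the arena sketch above, the incept3a_pool step just described might look as follows; the arena size and region sizes are illustrative assumptions.

```python
arena = TwoEndedArena(512)       # total arena size is an assumption
locked = arena.lock_top(32)      # region 1: reserve the next layer's result "32"
inp = arena.alloc_bottom(192)    # region 4: input of incept3a_pool
out = arena.alloc_top(192)       # region 2: output, placed just below the lock
# region 3 ("unallocated") is the remaining middle: [arena.low, arena.high)
```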
In some preferred embodiments, the method further comprises:
s4 (not shown) after executing the current operation layer, judging whether the input information of the current operation layer is still needed as the input information of a subsequent operation layer;
s5 (not shown), if not, releasing the memory space of the input information of the current operation layer.
Specifically, in the memory scheme setting stage, after the input data of one operation layer has been read to participate in the operation, it must further be determined whether that data also serves as the input of other operation layers, i.e., whether subsequent operations still need it, so as to decide whether to retain it. If so, the data is retained and remains in its current memory space; if not, the data is deleted and the memory space corresponding to the input information is released for use by subsequent operations, thereby using memory resources more efficiently and reasonably and increasing the operation efficiency of the deep learning computation network.
Further, as shown in FIG. 2, in the multi-branch operation layer the output result T0 is the input of a plurality of subsequent operation layers, so T0 must remain in the memory space; the bottom row of the memory allocation scheme shown in FIG. 6 is the memory space of size 192 corresponding to T0. However, after being used as the input information of the operation layer incept3a_1x1, T0 is no longer used as the input of any subsequent operation layer; at this point T0 can be deleted and the corresponding memory space of size 192 released. The released memory space can be reused by the inputs and outputs of subsequent operation layers.
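The release rule can be sketched with a simple reader count per tensor: each tensor records how many later layers still take it as input, and its space is freed when that count reaches zero. The count of four below follows FIG. 2, where T0 feeds the four branch heads.

```python
# remaining_readers[t] = number of later layers that still take t as input.
remaining_readers = {"T0": 4}    # T0 feeds the four branch heads of FIG. 2
freed = []

def consume(tensor):
    """Call after a layer has read `tensor`; free it once nobody needs it."""
    remaining_readers[tensor] -= 1
    if remaining_readers[tensor] == 0:
        freed.append(tensor)     # its memory block can be reused from here on

for _ in range(4):               # each of the four branch heads reads T0 once
    consume("T0")
print(freed)                     # ['T0'] -> the 192-unit block is reusable
```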
According to another aspect of the present application, there is also provided a memory management system for efficiently accelerating deep learning computation, wherein the system at least includes:
a decider for making a corresponding memory management decision based on multi-branch operation layer information included in a deep learning computing network, wherein the decider is configured to:
and determining a target operation sequence of the multi-branch operation layer according to the size of the memory space required by the operation of the multi-branch operation layer.
Specifically, during execution the whole multi-branch operation layer is generally treated as a single independent operation layer of the neural network: the input of the multi-branch operation layer is the input of that independent layer, for example the operation result corresponding to pooling in FIG. 2, and the output of the multi-branch operation layer is the output of that independent layer, such as the output result of concat in FIG. 2. Furthermore, the multi-branch operation layer comprises a plurality of operation branches whose operations can be ordered differently, and different operation sequences correspond to different operation input/output information and place different requirements on the size of the memory.
Referring to the multi-branch structure shown in FIG. 2, the target operation sequence is the operation sequence that occupies the smallest memory space over the whole execution of the multi-branch operation layer; according to the different possible operation sequences, there is a plurality of candidate operation schemes. Generally, to determine the operation sequence of the multi-branch operation, the branch structure can be converted into a directed graph, as shown in FIG. 3: the output tensor of each layer of the multi-branch structure of FIG. 2 is a vertex of the directed graph, and the vertex value is the memory space required for the output result of that layer. According to the directed graph, the operation order of vertices within the same branch cannot be adjusted, while the operation order of vertices in different branches can be adjusted at will; among all operation sequences, the one whose corresponding vertex values give the smallest occupation, i.e., the operation sequence occupying the smallest memory space, is determined as the target operation sequence. In addition, the various operation orders of the multi-branch operation layer can be determined in a breadth-first or depth-first manner, and the order occupying the smallest memory space is then determined as the target order according to the memory space occupied by each.
And determining a target memory allocation scheme corresponding to each branch operation result of the multi-branch operation layer according to the target operation sequence.
Specifically, the operation is executed according to the determined target operation sequence of the multi-branch operation, and the operation result of each branch ultimately needs to be stored in the memory space; the operation results of the branches in the multi-branch operation must be stored continuously in the memory space to avoid fragmentation of the memory space. Referring to FIG. 4, it shows the input and output memory occupation statistics during execution of the multi-branch operation of FIG. 2 according to the target operation sequence, i.e., T0->T1, T1->T2, T0->T3, T3->T4, T0->T5, T5->T6, and T0->T7; the operation result of each corresponding branch is the underlined parts "32, 128, 64", i.e., the target memory allocation scheme shown in FIG. 5.
A memory distributor, configured to distribute and manage memory space during operation of a multi-branch operation layer in the deep learning computation network based on a memory management decision made by the decision maker, wherein the memory distributor is configured to:
and determining to execute the memory allocation scheme of the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence.
Specifically, in order to ensure that each branch operation result can be continuously stored in the memory space during the execution of the multi-branch operation layer, a memory allocation scheme needs to be set for each input and output result during the execution of the branch operation according to the target operation sequence, that is, memory allocation needs to be performed for the inputs and outputs corresponding to 7 operation layers in the multi-branch in fig. 4, and at the same time, it is ensured that 4 branch operation results in fig. 5 are continuously stored in the memory space.
Furthermore, when setting the memory allocation scheme for the multi-branch operation process, the memory location and size of each branch operation result in the target memory scheme need to be considered in advance; meanwhile, when an output result inside the branch operation layer is no longer used as the input of any subsequent operation layer, the memory space of that output result is released. This ensures that the operation results of the branches are stored continuously in the memory space according to the target memory allocation scheme, avoids fragmentation of the memory space, and improves the utilization rate of the memory space.
Compared with the prior art, the method and system determine the target operation sequence of the multi-branch operation layer according to the size of the memory space required by its operation; determine, according to the target operation sequence, a target memory allocation scheme under which the branch operation results of the multi-branch operation layer are stored continuously; and determine the memory allocation scheme for executing the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence. Therefore, when a multi-branch operation is executed, the operation sequence occupying the least memory space during computation can be selected as the target operation sequence, and the target memory allocation scheme for the multi-branch operation process is determined according to it, ensuring that the least memory space is occupied during the multi-branch operation. At the same time, the storage mode and location of data during operation of the multi-branch operation layer are set according to the target memory scheme in which the branch results of the multi-branch operation layer are stored continuously. As a result, the branch operation results are stored continuously in the memory space, the occupied memory space is reduced, and the operation efficiency of the whole neural network is improved.
The present application also provides a computer readable storage medium having stored thereon computer code which, when executed, performs a method as in any one of the foregoing embodiments.
The present application also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (12)

1. A memory allocation method for effectively accelerating deep learning computation, wherein the deep learning computation network comprises a multi-branch operation layer, and the method comprises the following steps:
determining a target operation sequence of a multi-branch operation layer according to the size of a memory space required by the operation of the multi-branch operation layer;
determining a target memory allocation scheme for executing continuous storage of branch operation results of the multi-branch operation layer according to the target operation sequence;
and determining to execute the memory allocation scheme of the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence.
2. The method of claim 1, wherein said determining a target operation order of said multi-branch operation layer according to a size of a memory space required for said multi-branch operation layer operation comprises:
and determining various operation orders of the multi-branch operation layer and corresponding target operation orders according to the directed graph operation mode.
3. The method of claim 1, wherein said determining a target operation order of said multi-branch operation layer according to a size of a memory space required for said multi-branch operation layer operation comprises:
determining a plurality of operation orders of the multi-branch operation layer;
and determining the operation sequence with the minimum memory space occupation as a target operation sequence according to the size of the corresponding memory space occupation in the operation sequences.
4. The method of claim 3, wherein said determining a plurality of operation orders of said multi-branch operation layer comprises:
determining a plurality of operation orders of the multi-branch operation layer according to a depth-first principle; or,
determining a plurality of operation orders of the multi-branch operation layer according to a breadth-first principle.
5. The method of claim 1, wherein the determining a target memory allocation scheme for consecutively storing results of executing the branch operations of the multi-branch operation layer in the target operation order comprises:
executing the multi-branch operation layer according to the target operation sequence, and determining the operation result of each branch in the multi-branch operation layer;
and determining the target memory allocation scheme according to the target operation sequence and the memory space size of the operation result of each branch.
6. The method of claim 1, wherein said determining a memory allocation scheme for executing the multi-branch operation layer according to the target memory allocation scheme and the target operation order comprises:
allocating memory space for the input information and the output result of each branch operation layer by layer according to the target operation sequence;
and determining the memory allocation scheme of the multi-branch operation layer according to the memory space allocation result.
7. The method of claim 6, wherein the allocating memory space for the input information and the output result of each branch operation layer by layer according to the target operation sequence comprises:
and preferentially distributing the storage spaces at the top/bottom ends of the memory space to the input information and the output result of the operation layer.
8. The method of claim 6, wherein the allocating memory space for the input information and the output result of each branch operation layer by layer according to the target operation sequence comprises:
judging whether the output result of the next operation layer is the operation result of the corresponding branch;
if yes, reserving a memory space corresponding to the branch operation result when allocating the memory space for the input information and the output result of the current execution operation layer.
9. The method of any of claims 1-8, wherein the method further comprises:
after executing the current operation layer, judging whether the input information of the current operation layer is still needed as the input information of a subsequent operation layer;
and if not, releasing the memory space of the input information of the current operation layer.
10. A memory management system for efficiently accelerating deep learning computations, the system comprising at least:
a decider for making a corresponding memory management decision based on multi-branch operation layer information included in a deep learning computing network, wherein the decider is configured to:
determining a target operation sequence of a multi-branch operation layer according to the size of a memory space required by the operation of the multi-branch operation layer;
determining a target memory allocation scheme corresponding to each branch operation result of the multi-branch operation layer according to the target operation sequence;
a memory distributor, configured to distribute and manage memory space during operation of a multi-branch operation layer in the deep learning computation network based on a memory management decision made by the decision maker, wherein the memory distributor is configured to:
and determining to execute the memory allocation scheme of the multi-branch operation layer according to the target memory allocation scheme and the target operation sequence.
11. A computer-readable storage medium having stored thereon a computer program which, when executed, implements the memory allocation method for effectively accelerating deep learning computation based on multi-branch scheduling and allocation according to any one of claims 1 to 9.
12. An electronic device, characterized in that the electronic device comprises at least:
one or more processors;
a memory for storing executable instructions;
the one or more processors are configured to implement, via the executable instructions, the memory allocation method for effectively accelerating deep learning computation based on multi-branch scheduling and allocation according to any one of claims 1 to 9.
CN202110028503.XA 2021-01-11 2021-01-11 Memory allocation method and system for effectively accelerating deep learning calculation Active CN112346877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110028503.XA CN112346877B (en) 2021-01-11 2021-01-11 Memory allocation method and system for effectively accelerating deep learning calculation

Publications (2)

Publication Number Publication Date
CN112346877A true CN112346877A (en) 2021-02-09
CN112346877B CN112346877B (en) 2021-04-16

Family

ID=74427548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110028503.XA Active CN112346877B (en) 2021-01-11 2021-01-11 Memory allocation method and system for effectively accelerating deep learning calculation

Country Status (1)

Country Link
CN (1) CN112346877B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304265A (en) * 2018-01-23 2018-07-20 腾讯科技(深圳)有限公司 EMS memory management process, device and storage medium
CN109447253A (en) * 2018-10-26 2019-03-08 杭州比智科技有限公司 The method, apparatus of video memory distribution calculates equipment and computer storage medium
CN110378413A (en) * 2019-07-17 2019-10-25 Oppo广东移动通信有限公司 Neural network model processing method, device and electronic equipment
CN110766135A (en) * 2019-10-15 2020-02-07 北京芯启科技有限公司 Method for storing required data when optimizing operation function of neural network in any depth
CN111814971A (en) * 2020-06-30 2020-10-23 杭州国芯科技股份有限公司 Memory allocation method of neural network
CN111984400A (en) * 2020-07-17 2020-11-24 深圳云天励飞技术有限公司 Memory allocation method and device of neural network
US20200380375A1 (en) * 2019-05-31 2020-12-03 Apple Inc. Decomposition of machine learning operations
CN112084037A (en) * 2020-09-23 2020-12-15 安徽寒武纪信息科技有限公司 Memory allocation method and device of neural network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806078A (en) * 2021-08-27 2021-12-17 南京中科逆熵科技有限公司 Memory scheduling method for edge ai inference framework
CN114298294A (en) * 2021-12-28 2022-04-08 杭州雄迈集成电路技术股份有限公司 Neural network memory optimization method and device based on hardware accelerator
CN114298294B (en) * 2021-12-28 2022-11-01 杭州雄迈集成电路技术股份有限公司 Neural network memory optimization method and device based on hardware accelerator

Also Published As

Publication number Publication date
CN112346877B (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112346877B (en) Memory allocation method and system for effectively accelerating deep learning calculation
CN112286694B (en) Hardware accelerator memory allocation method and system based on deep learning computing network
CN115220918A (en) Memory allocation method and device for neural network
CN108981739A (en) A kind of paths planning method, device, server and storage medium
CN106709503B (en) Large-scale spatial data clustering algorithm K-DBSCAN based on density
CN116302461A (en) Deep learning memory allocation optimization method and system
CN116739323B (en) Intelligent evaluation method and system for emergency resource scheduling
CN113485836B (en) Tensor processing method and tensor processing system based on tensor segmentation
CN112256440B (en) Memory management method and device for neural network inference
CN108108242B (en) Storage layer intelligent distribution control method based on big data
CN110930092B (en) Distribution route adjusting method and device, electronic equipment and storage medium
CN116862019A (en) Model training method and device based on data parallel paradigm
CN115461718A (en) Memory allocation in neural networks
CN110334994A (en) A kind of sowing position method for pre-distributing, device, computer equipment and storage medium
CN115952008A (en) Unified scheduling method and device for server cluster resources
Pu et al. MPEFT: A novel task scheduling method for workflows
CN114237903B (en) Memory allocation optimization method, memory allocation optimization device, electronic equipment, medium and program product
KR20230058621A (en) Memory-limit scheduling
CN116700995B (en) Concurrent access method, device, equipment and storage medium for heterogeneous memory pool
CN110321998B (en) Convolutional neural network implementation method and device, acceleration equipment and storage medium
Chen et al. The general yard allocation problem
CN116980423B (en) Model scheduling method, device, computing system, equipment and readable storage medium
CN116610456B (en) Memory optimization method based on eager memory reuse algorithm
CN113537885B (en) Method and system for delivering package by combining truck and unmanned aerial vehicle for delivery
Qiao et al. Massive parallel self-organizing map and 2-opt on GPU to large scale TSP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant