CN112084037A - Memory allocation method and device of neural network - Google Patents

Memory allocation method and device of neural network

Info

Publication number
CN112084037A
CN112084037A
Authority
CN
China
Prior art keywords
data
memory
neuron
neurons
size
Prior art date
Legal status
Pending
Application number
CN202011010035.5A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN202011010035.5A priority Critical patent/CN112084037A/en
Publication of CN112084037A publication Critical patent/CN112084037A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5011: Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016: Allocation of resources to service a request, the resource being the memory
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means


Abstract

The embodiment of the application provides a memory allocation method and a memory allocation device for a neural network, wherein the method comprises the following steps: acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and the data neurons corresponding to the operation neurons; calculating the storage periods of all data neurons in the original computation graph based on the depth-first sequence of the operation neurons; and allocating an ith memory area to a first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n. In the case where the neural network includes a single computing core, memory allocation is performed based on the storage periods of the data neurons, which improves the memory reuse rate.

Description

Memory allocation method and device of neural network
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method and an apparatus for allocating memory in a neural network.
Background
At present, deep learning frameworks mainly use computation graphs to express the computation of a deep learning model. The neurons of a computation graph can be divided into data neurons and operation neurons according to their functions, and the connection relations between the data neurons and the operation neurons correspond to the data flow directions, thereby forming a directed acyclic graph that specifies the dependency relations for executing the computation tasks; computation graphs are therefore widely applied to data flow computation and streaming data processing. The data flow in a computation graph is controllable, and the execution order of the computation can be statically determined from the graph. When the computation graph has multiple branches and only one computation kernel is available, the ordering of the operation neurons across the branches becomes particularly important. Generally, a neural network executes the operations of the operation neurons in the topological order of the computation graph, and an operation can be executed only after all operations it depends on have completed. However, as the computation scale grows, the multiple branches occupy memory simultaneously to perform their computations, so a large amount of memory is consumed, and a computing device with limited memory resources may not be able to meet the memory requirement of the neural network.
Content of application
The embodiment of the application provides a memory allocation method and device for a neural network, which can effectively improve the memory utilization rate and improve the calculation efficiency.
In a first aspect, an embodiment of the present application provides a memory allocation method for a neural network, which is applied to a single-core processor, where the single-core processor includes n memory regions, where n is a positive integer, and the method includes:
acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons;
calculating storage periods of all data neurons in the original computational graph based on the depth-first sequence of the operational neurons;
and allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n.
In a second aspect, an embodiment of the present application provides a memory allocation device for a neural network, which is applied to a single-core processor, where the single-core processor includes n memory regions, where n is a positive integer, and the device includes:
the acquisition unit is used for acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons;
a calculating unit, configured to calculate storage periods of all data neurons in the original computation graph based on the depth-first sequence of the operation neurons;
and the allocation unit is used for allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory, a processor, a communication bus, and a communication interface, where the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used for storing computer programs; and the processor is configured to implement some or all of the steps described in the first aspect above when executing the program stored in the memory.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium including a computer program stored thereon for data exchange, the computer program, when executed by a processor, implementing some or all of the steps as described in the first aspect of embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application provides a memory allocation method and device for a neural network, which is used for acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons; calculating storage periods of all data neurons in the original computational graph based on the depth-first sequence of the operational neurons; and allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n. According to the method and the device, under the condition that the neural network comprises a computing core, memory allocation is carried out based on the storage period of the data neurons, and the memory reuse rate is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating a software stack of an artificial intelligence processor according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a computational graph provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a memory allocation method of a neural network according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating the storage period length of a data neuron according to an embodiment of the present application;
fig. 6 is a schematic diagram of memory sharing optimization according to an embodiment of the present disclosure;
FIG. 7a is a schematic flow chart of substitution optimization provided by an embodiment of the present application;
FIG. 7b is a schematic flow chart illustrating that a branched operation cannot use the substitution optimization according to an embodiment of the present disclosure;
fig. 8a is a block diagram illustrating functional units of a memory allocation apparatus of a neural network according to an embodiment of the present disclosure;
fig. 8b is a block diagram illustrating functional units of a memory allocation apparatus of another neural network according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to better understand the scheme of the embodiments of the present application, the following first introduces the related terms and concepts that may be involved in the embodiments of the present application.
(1) Computation graph: a computation graph is one way to describe a computation process using a graph structure. If the computation is significantly modular and there are clear temporal and logical dependencies between the modules, it can generally be described with a directed graph structure. Such a graph structure has two basic elements: nodes and directed edges. In practical applications, the neural network is abstracted into a directed graph structure composed of tensor data and operators. Nodes are also called neurons or operators.
Generally, the neural network model is described by using a calculation graph, which is beneficial to overall grasping of the whole neural network calculation task, and meanwhile, the expression mode of the calculation graph is also convenient for scheduling and parallel execution of the calculation task.
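As a rough illustration of the two node kinds described above, the following Python sketch models a computation graph whose directed edges encode the data flow directions. All class and field names here (DataNeuron, OpNeuron, producer, consumers) are illustrative assumptions, not terminology from this application.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class DataNeuron:
    name: str
    size: int = 0                                  # memory footprint in bytes (assumed unit)
    producer: Optional["OpNeuron"] = None          # the operation neuron that writes this data
    consumers: List["OpNeuron"] = field(default_factory=list)  # operation neurons that read it

@dataclass(eq=False)
class OpNeuron:
    name: str
    inputs: List[DataNeuron] = field(default_factory=list)
    outputs: List[DataNeuron] = field(default_factory=list)

def connect(op: OpNeuron, inputs: List[DataNeuron], outputs: List[DataNeuron]) -> None:
    """Wire the directed edges between data neurons and an operation neuron."""
    for d in inputs:
        op.inputs.append(d)
        d.consumers.append(op)
    for d in outputs:
        op.outputs.append(d)
        d.producer = op
```

The later sketches in this description reuse these classes.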
(2) A deep learning framework: the deep learning framework refers to a framework for deep learning. Specifically, as shown in fig. 1, the deep learning framework is the first layer in the software stack of the artificial intelligence processor, and is used to communicate with deep learning applications and deep learning computing platforms with various underlying formats.
In the prior art, a deep learning framework generally adopts a computation graph as the main data structure for describing a neural network model, and on this basis completes the mapping from the computation graph to the underlying kernel functions at the granularity of a single neuron or across neurons. Meanwhile, the deep learning framework may implement specific kernel functions either directly in a programming language or by calling an underlying computing library.
In the embodiment of the present application, the deep learning framework may include, but is not limited to: Google's TensorFlow, the convolutional neural network framework Caffe (Convolutional Architecture for Fast Feature Embedding), MXNet, Torch, and so on. Taking Caffe as an example, Caffe supports multiple types of deep learning architectures, image-oriented classification, and image segmentation, and can also support Convolutional Neural Networks (CNNs), Region-based CNNs (R-CNNs) for object detection, Long Short-Term Memory networks (LSTMs), and fully connected neural network designs.
In the embodiment of the present application, the Caffe framework may support multiple types of basic neurons; specifically, the multiple types of basic neurons referred to herein may include common neural network neurons, for example: convolution/deconvolution neurons, pooling neurons, activation neurons, softmax (classifier) neurons, and fully connected neurons. The activation neurons may include, but are not limited to, ReLU, Sigmoid, Tanh, and other neurons that can be implemented in an interpolation manner.
In the embodiment of the present application, the functions under the Caffe framework may include: the Caffe Blob function, the Caffe Layer function, and the Caffe Net function. A Blob is used to store, exchange, and process the data and derivative information of forward and backward iterations in the network. A Layer is used to perform computation, and may include nonlinear operations such as convolution (convolve), pooling (pool), inner product (inner product), rectified-linear (ReLU), and sigmoid, and may also include element-level data transformations, normalization (normalize), data loading (load data), classification (softmax), and loss calculations (losses). In a specific implementation, each Layer defines 3 important operations: initialization setting (setup), forward propagation (forward), and backward propagation (backward). Setup is used to reset the layers and the connections between them during model initialization; forward receives input data from the bottom layer and outputs it to the top layer after computation; backward takes the output gradient of the top layer, computes the gradient of its input, and passes it to the bottom layer. For example, the Layers may include Data Layers, Convolution Layers, Pooling Layers, InnerProduct Layers, ReLU Layers, Sigmoid Layers, LRN Layers, Dropout Layers, SoftmaxWithLoss Layers, Softmax Layers, Accuracy Layers, and the like. A Net starts with a data layer, i.e., loading data from disk, and ends with a loss layer, i.e., computing the objective function for tasks such as classification and reconstruction. Specifically, a Net is a directed acyclic computation graph composed of a series of Layers, and Caffe preserves all intermediate values in the computation graph to ensure the accuracy of forward and backward iterations.
(3) An artificial intelligence processor: an artificial intelligence processor, also referred to as a special purpose processor, in the embodiments of the present application refers to a processor that is specific to a particular application or domain. For example: a Graphics Processing Unit (GPU), also called a display core, a visual processor, and a display chip, is a special processor dedicated to image operation on a personal computer, a workstation, a game machine, and some mobile devices (such as a tablet computer and a smart phone). Another example is: a Neural Network Processor (NPU), which is a special processor for matrix multiplication in the field of artificial intelligence, adopts a structure of data-driven parallel computation, and is particularly good at Processing massive multimedia data such as video and images.
(4) Storage period: a coordinate interval that represents the range over which a data neuron is referenced by operation neurons, i.e., the interval from the coordinate of the operation neuron that first references the data in the data neuron to the coordinate of the operation neuron that last references it.
Currently, in the deep learning field, a certain memory block is occupied by the neurons of multiple branches to perform computation; when all of those neurons have finished executing, the storage period of the memory block ends, and the memory block can then be used by other neurons in the neural network. For example, the storage period of memory block A may be preset to (1, 2, 3), which means that memory block A can be used by neuron 1, neuron 2, and neuron 3; when neuron 1, neuron 2, and neuron 3 have finished executing and the storage period of memory block A ends, memory block A may be placed in an idle linked list and used by other neurons in the neural network. In the embodiments of the present application, the neurons in the neural network may also be referred to as nodes, operators, layers, and the like.
Currently, a neural network executes the operations of its operation neurons in the topological order of the computation graph. When the computation scale increases, multiple branches occupy memory simultaneously to execute their computations; if memory blocks are allocated and multiplexed strictly according to the topological order in which the neurons execute, the memory allocation effect is poor.
For example, as shown in fig. 2, the computation graph of the entire neural network includes 11 operation neurons and 12 data neurons, the latter being In, d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, and Out. According to the topological order of the computation graph, the indexes of the operation neurons A, B, E, F, G, H, J, K, M, N, P are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, respectively. Through pre-analysis, when the neural network shown in fig. 2 operates in topological order, the storage period of data neuron d1 is [1,3], the storage period of data neuron d2 is [2,4], the storage period of data neuron d3 is [3,5], the storage period of data neuron d4 is [4,6], the storage period of data neuron d5 is [5,6], and the storage period of data neuron d9 is [9,10]. When branch 1 and branch 2 are processed serially, the storage period of the data neurons on each branch increases, and this phenomenon becomes more pronounced as the number of branches grows further: a large amount of memory is occupied, and a computing device with limited memory resources may not be able to meet the memory requirement of the neural network.
In order to solve the above problems, the present application provides a memory allocation method for a neural network. In the case where the neural network runs on a single computation core, the method orders the operation neurons, calculates the storage periods of the corresponding data neurons according to the ordered operation neurons so as to shorten the storage periods of the data neurons on each branch, and then performs memory allocation based on those storage periods, thereby improving the memory reuse rate.
The present application will be described in detail with reference to specific examples.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure, and as shown in fig. 3, the computer device includes a single-core processor, a memory, a communication bus, and a communication interface, where the single-core processor connects the memory and the communication interface through the communication bus.
The single-core processor is used for realizing the following steps when executing the program stored in the memory:
acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons; calculating storage periods of all data neurons in the original computational graph based on the depth-first sequence of the operational neurons; and allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n.
Further, the single-core processor may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The single-core processor can also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the memory allocation method of the neural network of the present application may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
The single-core processor may also be an artificial intelligence processor, such as a Graphics Processing Unit (GPU), an Image Processing Unit (IPU), or a Neural Network Processor (NPU). Depending on the single-core processor used, the memory allocation method of the neural network provided by the embodiment of the application can be applied to artificial intelligence application fields such as image recognition processing, deep learning processing, computer vision processing, intelligent robot processing, and natural language processing, and can be used for executing complex function programs in the artificial intelligence field.
The Memory may be a Read-Only Memory (ROM), a Random Access Memory (RAM), or other Memory. In this embodiment of the present application, the memory is used for storing data and executing a software program corresponding to the memory allocation method of the neural network, for example, a program for calculating the storage period of all data neurons in the original computation graph based on the depth-first sequence of the operation neurons in this embodiment of the present application, and the like.
The communication interface enables communication between the computer device and other devices or a communication network using transceiver means such as, but not limited to, a transceiver. For example, model files sent by other devices may be received through the communication interface.
It should be understood that the computer device is merely one example provided by the embodiments of the present application, and that a computer device may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of components.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating a memory allocation method of a neural network according to an embodiment of the present disclosure, where the method is applied to the computer device shown in fig. 3. As shown in fig. 4, the method includes the steps of:
s410, obtaining an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons.
The deep learning framework can adopt a computation graph as the main data structure for describing the neural network model. The nodes in the computation graph can be divided into data neurons and operation neurons according to their functions, and the connection relations between the data neurons and the operation neurons correspond to the data flow directions, thereby forming a directed acyclic graph. The operation neurons are used to perform computation, and the data neurons can hold the input and output data, constant data, and intermediate result data of the corresponding operation neurons. Output data is typically allocated space by the user, and its size and contents are typically not allowed to be altered; the constant data are usually weights, parameters, and the like. In some possible examples, the operation neurons may also be referred to as compute nodes or operation operators, and the data neurons may also be referred to as data nodes or data operators.
And S420, calculating storage periods of all data neurons in the original calculation graph based on the depth-first sequence of the operation neurons.
In the embodiment of the application, the storage period is used to represent the interval over which a data neuron is referenced by operation neurons. For example, as shown in FIG. 2, data neuron d2 is the output of operation neuron E and the input of operation neuron G. According to the topological ordering, the coordinates of E and G are 2 and 4 respectively, so the storage period of data neuron d2 is [2,4] and the corresponding storage period length is 2. When memory resources are limited, the storage period of the data neurons on each branch increases, and grows further as the number of branches increases; to reuse the memory occupied by the data neurons through static compilation of the computation graph, the operation neurons need to be reordered.
Wherein the method further comprises: ordering the operation neurons based on a depth-first traversal. By ordering the operation neurons depth-first, the operation neurons of the same branch can be traversed first, thereby shortening the storage periods of the data neurons on each branch. For example, as shown in fig. 5, the depth-first order of the operation neurons is A, B, E, G, J, M, F, H, K, N, P. Under this ordering, the storage periods of the output data neuron d1 of the branching operation neuron B and of the input data neuron d9 of the converging operation neuron P are longer than under the topological ordering, but the storage periods of the data neurons on the branches are all reduced, and the sum of the storage periods of all the data neurons is smaller than under the topological ordering. A sketch of this ordering follows below.
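A minimal sketch of such an ordering, reusing the graph classes assumed above: the traversal follows one branch all the way down before backtracking, and holds back a converging operation neuron (such as P in fig. 5) until every operation it depends on has been placed.

```python
def depth_first_order(entry_ops):
    """Depth-first ordering of operation neurons: each branch is finished
    before the next begins, which shortens per-branch storage periods."""
    order, placed = [], set()

    def ready(op):
        # a converging op waits until all ops it depends on are placed
        return all(d.producer is None or d.producer in placed
                   for d in op.inputs)

    def visit(op):
        if op in placed or not ready(op):
            return
        placed.add(op)
        order.append(op)
        for data in op.outputs:            # descend into this op's branch first
            for consumer in data.consumers:
                visit(consumer)

    for op in entry_ops:
        visit(op)
    return order
```

On the graph of fig. 2 this yields the order A, B, E, G, J, M, F, H, K, N, P given above.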
Optionally, the calculating the storage period of each data neuron based on the depth-first sequence of the operation neurons includes: based on the depth-first sequence of the operation neurons, acquiring the sequence numbers of all the operation neurons corresponding to each data neuron; and taking the interval from the minimum sequence number to the maximum sequence number among the sequence numbers of all the operation neurons corresponding to each data neuron as the storage period of that data neuron.
After ordering the operation neurons based on the depth-first traversal, a sequence number for each operation neuron can be obtained; for example, the sequence numbers of the operation neurons A, B, E, F, G, H, J, K, M, N, P in fig. 2 are respectively: 0, 1, 2, 6, 3, 7, 4, 8, 5, 9, 10. Based on the sequence numbers of the operation neurons, the storage period of each data neuron can be calculated. The length of a data neuron's storage period is the difference between the maximum and minimum sequence numbers among the operation neurons that reference it; for example, the storage period of data neuron d1 is [1,6] with length 5, and the storage period of data neuron d3 is [6,7] with length 1.
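A sketch of that calculation, under the same assumed classes: each data neuron's storage period spans from the smallest to the largest sequence number among the operation neurons that reference it.

```python
def storage_periods(ordered_ops):
    """Map each data neuron to its storage period [lo, hi], where lo and hi
    are the min and max sequence numbers of its referencing op neurons."""
    seq = {op: i for i, op in enumerate(ordered_ops)}
    periods = {}
    for op in ordered_ops:
        for d in list(op.inputs) + list(op.outputs):
            lo, hi = periods.get(d, (seq[op], seq[op]))
            periods[d] = (min(lo, seq[op]), max(hi, seq[op]))
    return periods
```

The storage period length is then hi - lo, e.g. 5 for d1 ([1,6]) and 1 for d3 ([6,7]) in the example above.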
S430, based on the storage period of the data neurons, allocating an ith memory area to the first data neurons, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neurons, the first data neurons are any data neurons in the original calculation graph, and i is smaller than or equal to n.
After calculating the storage period of the first data neuron, the ith memory region may be allocated to the first data neuron. The ith memory region may store data of other data neurons that do not overlap with the storage period of the first data neuron, and a remaining memory size of the ith memory region is greater than or equal to a memory size occupied by the first data neuron, that is, the ith memory region may store all data of the first data neuron.
In a possible embodiment, the method further comprises: under the condition that the storage period of the first data neuron and the storage period of the second data neuron do not overlap, allocating the ith memory area to the second data neuron, wherein the size of the memory occupied by the second data neuron comprises the size of the memory occupied by the first data neuron, and the second data neuron is any one of the data neurons except the first data neuron in the original computation graph.
In the embodiment of the present application, a memory-sharing allocation policy may be used, which effectively reduces the memory usage of the data neurons. Memory sharing allows two unrelated variables to access the same block of memory. As shown in fig. 6, after data neuron d3 has been computed, the data in data neuron d1 may no longer be needed, and the memory occupied by data neuron d1 can be released to store the data of data neuron d3. The storage periods of the data neurons are obtained from the computation graph; by sharing memory only where the storage periods of the data neurons do not overlap, memory can be allocated correctly and memory optimization is achieved, as shown in the sketch below.
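The following is a minimal allocation sketch under the storage periods and sizes computed above: each data neuron is placed into the first memory region none of whose occupants has an overlapping storage period, and a region grows when a later occupant needs more memory, matching the expansion rule described further below. The greedy placement order is an assumption made for illustration; the application does not prescribe a particular search order.

```python
def assign_regions(periods, sizes):
    """Greedy memory sharing: data neurons whose storage periods do not
    overlap may occupy the same region; a region is expanded to fit its
    largest occupant. Returns (data neuron -> region index, region sizes)."""
    regions = []            # each region: {"size": bytes, "occupants": [(lo, hi), ...]}
    placement = {}
    for d, (lo, hi) in sorted(periods.items(), key=lambda kv: kv[1]):
        for i, region in enumerate(regions):
            # inclusive periods [lo, hi] and [a, b] are disjoint iff hi < a or b < lo
            if all(hi < a or b < lo for (a, b) in region["occupants"]):
                region["occupants"].append((lo, hi))
                region["size"] = max(region["size"], sizes[d])   # expand if needed
                placement[d] = i
                break
        else:
            regions.append({"size": sizes[d], "occupants": [(lo, hi)]})
            placement[d] = len(regions) - 1
    return placement, [r["size"] for r in regions]
```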
In some possible examples, a conflict table or conflict graph may be constructed; for example, a conflict graph can record which data neurons have overlapping storage periods with other data neurons, and a graph coloring algorithm can then be run to perform correct memory allocation for the data neurons with overlapping storage periods. In other possible examples, the traversal of the graph may be simulated: an operation counter is maintained for each data neuron, and once a data neuron has been used by all of its corresponding operation neurons, the memory it occupies can be released, collected, and allocated to other data neurons.
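A sketch of the second approach, the per-data-neuron operation counter: the traversal of the graph is simulated, and a data neuron's memory is recycled into a free list as soon as its last referencing operation has executed. The counter bookkeeping shown here is an assumed, simplified form.

```python
def simulate_and_release(ordered_ops):
    """Release a data neuron's memory once every operation neuron that
    references it has executed; freed sizes can serve later data neurons."""
    remaining = {}                               # data neuron -> pending reads
    for op in ordered_ops:
        for d in op.inputs:
            remaining[d] = remaining.get(d, 0) + 1

    free_list = []
    for op in ordered_ops:                       # simulated execution order
        for d in op.inputs:
            remaining[d] -= 1
            if remaining[d] == 0:                # last reference has executed
                free_list.append((d.name, d.size))
    return free_list
```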
In the embodiment of the application, the memory allocation method of the neural network stores the memory allocation information of the data neurons, and performs memory allocation after determining the final memory allocation scheme, so as to improve the memory utilization rate.
Optionally, the method further includes: and under the condition that the size of the memory occupied by the second data neuron is larger than that of the ith memory area, expanding the size of the memory of the ith memory area to the size of the memory occupied by the second data neuron.
When a plurality of data neurons with non-overlapping storage periods share the same memory region, there may be several situations: if the memory size occupied by the largest of these data neurons is smaller than or equal to the size of the memory region, the memory region keeps its original size; if the memory size occupied by the largest data neuron is larger than the memory region, the memory region needs to be expanded to the memory size occupied by that largest data neuron.
Specifically, the first data neuron is allocated to the ith memory region, and when the storage period of the first data neuron and the storage period of the second data neuron do not overlap, the second data neuron may also be allocated to the ith memory region. If the size of the memory occupied by the second data neuron is smaller than or equal to the size of the memory of the ith memory area, the ith memory area is still the original memory size; if the size of the memory occupied by the second data neuron is larger than that of the memory of the ith memory area, the size of the memory of the ith memory area needs to be expanded to the size of the memory occupied by the second data neuron, so that the ith memory area is used to the maximum extent, and the memory utilization rate is improved.
In a possible embodiment, the method further comprises: and under the condition that the storage period of the first data neuron and the storage period of the second data neuron are overlapped, allocating the ith memory area to the second data neuron, wherein the memory size occupied by the second data neuron comprises a first offset memory size, the first offset memory size is the memory size occupied by the first data neuron except for first collision data, and the first collision data is the data included by both the first data neuron and the second data neuron.
The first data neuron is allocated to the ith memory region, and a second data neuron whose storage period overlaps with that of the first data neuron may also be allocated to the ith memory region. Specifically, when the storage period of the first data neuron overlaps with that of the second data neuron, the memory occupied by the first data neuron excluding the first conflict data serves as the offset, and the data of the second data neuron is stored in the ith memory area from that offset onward; that is, the data in the second data neuron can multiplex the memory occupied by the first conflict data in the ith memory area, so that the first data neuron and the second data neuron share the ith memory area and the reuse rate of the memory block is improved.
Optionally, the method further includes: and under the condition that the size of a first memory is larger than that of the ith memory area, expanding the size of the ith memory area to be the size of the first memory, wherein the size of the first memory is the sum of the size of the memory occupied by the second data neuron and the size of the memory occupied by the first conflict data.
Specifically, the first data neuron is allocated to the ith memory region, and when the storage period of the first data neuron overlaps with the storage period of the second data neuron, the second data neuron may also be allocated to the ith memory region. If the sum of the size of the memory occupied by the second data neuron and the size of the memory occupied by the first conflict data is smaller than or equal to the size of the memory of the ith memory area, the ith memory area is still the original memory size; if the sum of the size of the memory occupied by the second data neuron and the size of the memory occupied by the first conflict data is larger than the size of the memory of the ith memory area, the size of the memory of the ith memory area needs to be expanded to the sum of the size of the memory occupied by the second data neuron and the size of the memory occupied by the first conflict data, so that the ith memory area is used to the maximum extent, and the memory utilization rate is improved.
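Transcribed directly from the rule stated above, as a sketch only; how the offset layout interacts with this bound is described in the surrounding prose, and the function name is illustrative.

```python
def expand_for_overlap(region_size, second_size, conflict_size):
    """Expansion rule for the overlapping case: the first memory size is
    the memory occupied by the second data neuron plus the memory
    occupied by the first conflict data; the region is expanded only if
    that sum exceeds its current size."""
    first_memory_size = second_size + conflict_size
    return max(region_size, first_memory_size)
```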
In a possible embodiment, the method further comprises: and when the corresponding operation neuron of the first data neuron and the corresponding operation neuron of the second data neuron execute the same operation, allocating the memory size occupied by the first data neuron to the second data neuron.
In practical applications, substitution optimization, i.e., memory sharing through substitution (in-place) operations, may be introduced into the computation graph optimization. For example, as shown in FIG. 7a, in a simple chain of activation function operations, the sigmoid function may be computed with a substitution operation; if the operation neurons B, C, D are all chained activation function operations, then the data neurons d0, d1, and d2 may be allocated to the same memory region. In the inference phase, for example, the memory occupancy of an activation operation's input and output is identical and there is no other data dependency, so its output data can directly overwrite the memory of the input data; the storage period of data neuron d0 in fig. 7a then changes from the original [0,1] to [0,3].
It should be noted that when the chained operation has other branches, the substitution optimization cannot be performed. As shown in fig. 7b, data neuron d1 is depended on by the operations that produce data neurons d2 and d4, so using a substitution operation there would introduce write contamination.
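A sketch of an eligibility test for the substitution optimization, reusing the graph classes assumed earlier: an operation may overwrite its input's memory only when the input has no other consumer (which avoids the write contamination of fig. 7b) and the input and output footprints match.

```python
def can_substitute(op):
    """True if op's output may safely reuse (overwrite) its input's memory:
    a single input and output, matching sizes, and no other operation
    neuron still needing the input data."""
    if len(op.inputs) != 1 or len(op.outputs) != 1:
        return False
    src, dst = op.inputs[0], op.outputs[0]
    return len(src.consumers) == 1 and src.size == dst.size
```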
It can be seen that, in the memory allocation method of the neural network according to the embodiment of the present application, an original computation graph corresponding to the neural network is obtained, where the original computation graph includes a plurality of operation neurons and the data neurons corresponding to the operation neurons; the storage periods of all data neurons in the original computation graph are calculated based on the depth-first sequence of the operation neurons; and an ith memory area is allocated to the first data neuron based on the storage period of the data neuron, where the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, and the first data neuron is any data neuron in the original computation graph. In the case where the neural network includes a single computing core, allocating memory according to the storage periods of the data neurons after depth-first ordering improves the memory reuse rate.
For example, when the memory allocation method for the neural network provided by the application is applied to image recognition processing, the processor acquires an original computation graph corresponding to the neural network from a memory, where the original computation graph includes a plurality of operation neurons and data neurons corresponding to the operation neurons; calculating storage periods of all data neurons in the original computational graph based on the depth-first sequence of the operational neurons; and allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, and the first data neuron is any one data neuron in the original computation graph. And memory allocation is carried out through the storage period of the data neurons after depth-first sequencing, so that the memory reuse rate can be improved.
Further, when the memory allocation method provided by the application is applied to deep learning processing, the single-core processor acquires a deep learning instruction sequence from the memory and then acquires a first optimization loop from the deep learning instruction sequence; when the instructions of the first optimization loop are distributed across the ith instruction block and the (i+1)th instruction block, the first optimization loop is moved into the (i+1)th instruction block so that its instructions reside in the same instruction block, thereby reducing the jumps caused by the natural loop, reducing the execution time of the deep learning program, and improving the operating efficiency of the deep learning system.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments provided herein can be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Referring to fig. 8a, fig. 8a is a block diagram of functional units of a memory allocation apparatus 800 of a neural network according to an embodiment of the present disclosure, and is applied to a single-core processor, where the single-core processor includes n memory regions, where n is a positive integer. As shown in fig. 8a, the apparatus 800 comprises an obtaining unit 810, a calculating unit 820 and an assigning unit 830, wherein,
an obtaining unit 810, configured to obtain an original computation graph corresponding to the neural network, where the original computation graph includes a plurality of operation neurons and data neurons corresponding to the operation neurons;
a calculating unit 820, configured to calculate storage periods of all data neurons in the original computation graph based on the depth-first sequence of the operation neurons;
an allocating unit 830, configured to allocate an ith memory area to a first data neuron based on a storage period of the data neuron, where a memory size of the ith memory area is greater than or equal to a memory size occupied by the first data neuron, the first data neuron is any data neuron in the raw computation graph, and i is less than or equal to n.
Optionally, the calculating unit 820 is specifically configured to: based on the depth-first sequence of the operation neurons, acquiring the serial numbers of all corresponding operation neurons of each data neuron; and respectively taking the interval range between the minimum serial number and the maximum serial number in the serial numbers of all corresponding operation neurons of each data neuron as the storage period of each data neuron.
Optionally, the allocating unit 830 may further be configured to: under the condition that the storage period of the first data neuron and the storage period of the second data neuron do not overlap, allocating the ith memory area to the second data neuron, wherein the size of the memory occupied by the second data neuron comprises the size of the memory occupied by the first data neuron, and the second data neuron is any one of the data neurons except the first data neuron in the original computation graph.
Optionally, as shown in fig. 8b, which is a block diagram of the functional units of another memory allocation apparatus 800 of a neural network provided in an embodiment of the present application, the apparatus 800 further includes an expansion unit 840, wherein
the expansion unit 840 is configured to, when the size of the memory occupied by the second data neuron is larger than the size of the memory in the ith memory area, expand the size of the memory in the ith memory area to the size of the memory occupied by the second data neuron.
Optionally, the allocating unit 830 is further configured to: and under the condition that the storage period of the first data neuron and the storage period of the second data neuron are overlapped, allocating the ith memory area to the second data neuron, wherein the memory size occupied by the second data neuron comprises a first offset memory size, the first offset memory size is the memory size occupied by the first data neuron except for first collision data, and the first collision data is the data included by both the first data neuron and the second data neuron.
Optionally, the expanding unit 840 is further configured to: and under the condition that the size of a first memory is larger than that of the ith memory area, expanding the size of the ith memory area to be the size of the first memory, wherein the size of the first memory is the sum of the size of the memory occupied by the second data neuron and the size of the memory occupied by the first conflict data.
Optionally, the allocating unit 830 is further configured to: and when the corresponding operation neuron of the first data neuron and the corresponding operation neuron of the second data neuron execute the same operation, allocating the memory size occupied by the first data neuron to the second data neuron.
It can be understood that the functions of each program module of the memory allocation device of the neural network according to the embodiment of the present application can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process of the method can refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a terminal device, or a network device) to execute all or part of the steps of the above-mentioned method of the embodiments of the present application. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A memory allocation method of a neural network is applied to a single-core processor, the single-core processor comprises n memory areas, n is a positive integer, and the method comprises the following steps:
acquiring an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons;
calculating storage periods of all data neurons in the original computational graph based on the depth-first sequence of the operational neurons;
and allocating an ith memory area to the first data neuron based on the storage period of the data neuron, wherein the memory size of the ith memory area is larger than or equal to the memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is smaller than or equal to n.
2. The method of claim 1, wherein said calculating a storage period for each of said data neurons based on the depth-first sequence of said operation neurons comprises:
based on the depth-first sequence of the operation neurons, acquiring the serial numbers of all corresponding operation neurons of each data neuron; and respectively taking the interval range between the minimum serial number and the maximum serial number in the serial numbers of all corresponding operation neurons of each data neuron as the storage period of each data neuron.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
under the condition that the storage period of the first data neuron and the storage period of the second data neuron do not overlap, allocating the ith memory area to the second data neuron, wherein the size of the memory occupied by the second data neuron comprises the size of the memory occupied by the first data neuron, and the second data neuron is any one of the data neurons except the first data neuron in the original computation graph.
4. The method of claim 3, further comprising:
and under the condition that the size of the memory occupied by the second data neuron is larger than that of the ith memory area, expanding the size of the memory of the ith memory area to the size of the memory occupied by the second data neuron.
5. The method according to claim 1 or 2, further comprising:
allocating, in the case that the storage period of the first data neuron and the storage period of a second data neuron overlap, the i-th memory area to the second data neuron, wherein the memory size occupied by the second data neuron comprises a first offset memory size, the first offset memory size is the memory size occupied by the first data neuron excluding first conflict data, and the first conflict data is data included in both the first data neuron and the second data neuron.
6. The method of claim 5, further comprising:
expanding, in the case that a first memory size is larger than the memory size of the i-th memory area, the memory size of the i-th memory area to the first memory size, wherein the first memory size is the sum of the memory size occupied by the second data neuron and the memory size occupied by the first conflict data.
7. The method according to any one of claims 3-6, further comprising:
allocating, when the operation neuron corresponding to the first data neuron and the operation neuron corresponding to the second data neuron execute the same operation, the memory size occupied by the first data neuron to the second data neuron.
8. A memory allocation device of a neural network, applied to a single-core processor, wherein the single-core processor comprises n memory areas and n is a positive integer, the device comprising:
an acquisition unit configured to acquire an original computation graph corresponding to the neural network, wherein the original computation graph comprises a plurality of operation neurons and data neurons corresponding to the operation neurons;
a calculation unit configured to calculate storage periods of all data neurons in the original computation graph based on a depth-first sequence of the operation neurons; and
an allocation unit configured to allocate an i-th memory area to a first data neuron based on the storage period of the data neuron, wherein a memory size of the i-th memory area is greater than or equal to a memory size occupied by the first data neuron, the first data neuron is any data neuron in the original computation graph, and i is less than or equal to n.
9. A computer device comprising a processor, a memory, a communication bus, and a communication interface, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus, the memory is configured to store a computer program, and the processor is configured to execute the program stored in the memory to implement the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for data exchange which, when executed by a processor, implements the method according to any one of claims 1-7.
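
Illustrative sketch (the published application contains no code): the following Python fragment shows one way the depth-first numbering of claim 1 and the storage-period computation of claim 2 could be realized. The graph encoding (successor lists in succ, per-data-neuron operation lists in uses) and all function names are assumptions of this sketch, not part of the claimed method.

from typing import Dict, List, Tuple

def dfs_numbering(roots: List[str], succ: Dict[str, List[str]]) -> Dict[str, int]:
    # Assign each operation neuron a serial number in depth-first order (claim 1).
    order: Dict[str, int] = {}
    def visit(op: str) -> None:
        if op in order:
            return
        order[op] = len(order)
        for nxt in succ.get(op, []):
            visit(nxt)
    for root in roots:
        visit(root)
    return order

def storage_periods(uses: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    # Claim 2: storage period = interval between the minimum and maximum serial
    # number among the operation neurons that touch the data neuron.
    return {d: (min(ids), max(ids)) for d, ids in uses.items()}

# Toy graph: op0 -> op1 -> op2; data neuron "a" is written by op0 and read by op2.
order = dfs_numbering(["op0"], {"op0": ["op1"], "op1": ["op2"]})
periods = storage_periods({
    "a": [order["op0"], order["op2"]],
    "b": [order["op1"], order["op2"]],
})
print(periods)  # {'a': (0, 2), 'b': (1, 2)}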
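A second sketch, under the same caveats, of the lifetime-based reuse of claims 3 and 4: a data neuron may share the i-th memory area with neurons whose storage periods do not overlap its own, and the area is expanded when a larger neuron is placed in it. The size-descending placement order is a heuristic chosen here for illustration, and the offset-based sharing of overlapping neurons (claims 5-7) is omitted.

from typing import Dict, List, Tuple

def allocate(periods: Dict[str, Tuple[int, int]],
             sizes: Dict[str, int],
             n: int) -> Tuple[Dict[str, int], List[int]]:
    # Each area records its current size and the lifetimes already placed in it.
    areas: List[dict] = []
    placement: Dict[str, int] = {}
    # Heuristic assumed by this sketch: place larger neurons first.
    for d in sorted(periods, key=lambda x: -sizes[x]):
        lo, hi = periods[d]
        for i, area in enumerate(areas):
            # Claim 3: reuse area i only if no lifetime in it overlaps [lo, hi].
            if all(hi < a or lo > b for (a, b) in area["intervals"]):
                area["intervals"].append((lo, hi))
                area["size"] = max(area["size"], sizes[d])  # claim 4: expand if needed
                placement[d] = i
                break
        else:
            if len(areas) >= n:
                raise RuntimeError("more concurrent lifetimes than the n memory areas")
            areas.append({"size": sizes[d], "intervals": [(lo, hi)]})
            placement[d] = len(areas) - 1
    return placement, [area["size"] for area in areas]

placement, area_sizes = allocate(
    periods={"a": (0, 2), "b": (1, 3), "c": (3, 5)},
    sizes={"a": 1024, "b": 512, "c": 2048},
    n=4,
)
print(placement, area_sizes)  # {'c': 0, 'a': 0, 'b': 1} [2048, 512]

In this toy run, "a" and "c" have disjoint storage periods and share one area, so the total footprint is 2560 rather than the 3584 a one-area-per-neuron assignment would need.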

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011010035.5A | 2020-09-23 | 2020-09-23 | Memory allocation method and device of neural network

Publications (1)

Publication Number | Publication Date
CN112084037A | 2020-12-15

Family

ID=73739666

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202011010035.5A | Memory allocation method and device of neural network | 2020-09-23 | 2020-09-23 | Pending

Country Status (1)

Country | Publication
CN | CN112084037A

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190187963A1 (en) * 2017-12-19 2019-06-20 Canon Kabushiki Kaisha Memory access optimisation using per-layer computational mapping and memory allocation for cnn application
CN111488116A (en) * 2019-01-29 2020-08-04 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111563584A (en) * 2019-02-14 2020-08-21 上海寒武纪信息科技有限公司 Splitting method of neural network model and related product
CN110032450A (en) * 2019-04-17 2019-07-19 中山大学 A kind of extensive deep learning method and system based on solid-state disk exented memory
CN110162338A (en) * 2019-05-31 2019-08-23 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN110490313A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 A kind of memory multiplexing method and its Related product
CN111581454A (en) * 2020-04-27 2020-08-25 清华大学 Depth map compression algorithm-based parallel query expression prediction system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHOU LIGONG: "C Programming for the AWorks Framework and Interfaces, Part I", 30 November 2018 *
喜欢打酱油的老鸟 [CSDN username]: "Memory optimization mechanisms of deep learning frameworks", CSDN *
ZHAO TIANLI: "Research on optimized computation methods for deep learning on mobile platforms", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346877A (en) * 2021-01-11 2021-02-09 瀚博半导体(上海)有限公司 Memory allocation method and system for effectively accelerating deep learning calculation
CN112965663A (en) * 2021-03-05 2021-06-15 上海寒武纪信息科技有限公司 Method for multiplexing storage space of data block and related product
CN113168349A (en) * 2021-03-26 2021-07-23 珠海全志科技股份有限公司 Memory allocation method of AI processor, computer device and computer readable storage medium
CN114936099A (en) * 2022-07-25 2022-08-23 之江实验室 Graph optimization method and device for neural network calculation
US11915135B2 (en) 2022-07-25 2024-02-27 Zhejiang Lab Graph optimization method and apparatus for neural network computation
CN115809699A (en) * 2023-02-03 2023-03-17 之江实验室 Method and device for estimating minimum memory occupation amount required by neural network model inference
CN115809699B (en) * 2023-02-03 2023-06-23 之江实验室 Method and device for estimating minimum memory occupation amount required by neural network model reasoning

Similar Documents

Publication Publication Date Title
CN112084038B (en) Memory allocation method and device of neural network
CN112084037A (en) Memory allocation method and device of neural network
CN107832843B (en) Information processing method and related product
CN110659728B (en) Neural network optimization method, device, computer equipment and storage medium
CN110689115B (en) Neural network model processing method and device, computer equipment and storage medium
CN107578098B (en) Neural network processor based on systolic array
WO2021190127A1 (en) Data processing method and data processing device
CN112840356B (en) Operation accelerator, processing method and related equipment
CN112199190B (en) Memory allocation method and device, storage medium and electronic equipment
CN104035751B (en) Data parallel processing method based on multi-graphics processor and device
WO2022068663A1 (en) Memory allocation method, related device, and computer readable storage medium
JP2020518042A (en) Processing device and processing method
CN110674936A (en) Neural network processing method and device, computer equipment and storage medium
CN111310904A (en) Apparatus and method for performing convolutional neural network training
CN104036451A (en) Parallel model processing method and device based on multiple graphics processing units
CN110717584A (en) Neural network compiling method, compiler, computer device, and readable storage medium
US20210133854A1 (en) Information processing method and terminal device
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
CN111860807B (en) Fractal calculation device, fractal calculation method, integrated circuit and board card
CN115269204B (en) Memory optimization method and device for neural network compiling
CN110503195A (en) The method and its Related product of task are executed using artificial intelligence process device
CN110503199A (en) Method for splitting and device, the electronic equipment and storage medium of operation node
CN111985597A (en) Model compression method and device
CN111222046A (en) Service configuration method, client for service configuration, equipment and electronic equipment
CN113837922A (en) Computing device, data processing method and related product

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 2020-12-15)