CN111666150B - Storage space allocation method and device, terminal and computer readable storage medium - Google Patents

Storage space allocation method and device, terminal and computer readable storage medium

Info

Publication number
CN111666150B
CN111666150B (application CN202010390297.2A)
Authority
CN
China
Prior art keywords
storage space
layer combination
target
combination
size
Prior art date
Legal status
Active
Application number
CN202010390297.2A
Other languages
Chinese (zh)
Other versions
CN111666150A (en)
Inventor
文博
曹庆新
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202010390297.2A priority Critical patent/CN111666150B/en
Publication of CN111666150A publication Critical patent/CN111666150A/en
Priority to PCT/CN2021/088444 priority patent/WO2021227789A1/en
Application granted granted Critical
Publication of CN111666150B publication Critical patent/CN111666150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources, the resource being the memory
    • G06F 9/5022 Mechanisms to release resources
    • G PHYSICS
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The present application belongs to the technical field of data storage, and in particular relates to a storage space allocation method and apparatus, a terminal, and a computer-readable storage medium. The method includes: traversing the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain a first target layer combination; determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination; and, if such a first target input layer combination exists, allocating the storage space recorded for the first target input layer combination in a storage space mapping table to the first target layer combination as well, thereby simplifying the software programming complexity of a convolutional neural network processor.

Description

Storage space allocation method and device, terminal and computer readable storage medium
Technical Field
The present application belongs to the technical field of data storage, and in particular, to a method, an apparatus, a terminal and a computer-readable storage medium for allocating storage space.
Background
A convolutional neural network (CNN) is composed of basic layers (Layers), each of which corresponds to an operation. The operation types may include convolution (Convolution), pooling (Pooling), element-wise operations (Element-Wise), concatenation (Concatenate), fully-connected operations (Fully-Connected), batch normalization (Batch-Normalization), and the like.
A Neural Network Processor (NNP) is a processor dedicated to performing convolutional neural network computational tasks. However, the software programming complexity of current convolutional neural network processors is generally high.
Disclosure of Invention
The embodiment of the application provides a storage space allocation method, a storage space allocation device, a terminal and a computer readable storage medium, which can simplify the software programming complexity of a convolutional neural network processor.
A first aspect of an embodiment of the present application provides a method for allocating a storage space, including:
traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
allocating storage space for each of the first target layer combinations;
wherein allocating storage space for each of the first target layer combinations comprises:
determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating the storage space recorded for the first target input layer combination in a storage space mapping table to the first target layer combination as well, recording the space size required when the storage space of the first target input layer combination is used once as size1 + size2, and recording the space size required when the storage space of the first target input layer combination is reused as the larger of size1 + size2 and size_max1; where size1 is the space size historically required when the storage space of the first target input layer combination is used once, size2 is the space size occupied by the calculation result of the first target layer combination, and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused.
A second aspect of the embodiments of the present application provides an apparatus for allocating storage space, including:
the traversal unit is used for traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
an allocation unit for allocating a storage space for each of the first target layer combinations;
the allocation unit, when allocating storage space for each of the first target layer combinations, is further configured to:
determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating the storage space recorded for the first target input layer combination in a storage space mapping table to the first target layer combination as well, recording the space size required when the storage space of the first target input layer combination is used once as size1 + size2, and recording the space size required when the storage space of the first target input layer combination is reused as the larger of size1 + size2 and size_max1; where size1 is the space size historically required when the storage space of the first target input layer combination is used once, size2 is the space size occupied by the calculation result of the first target layer combination, and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused.
A third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the above method.
In an embodiment of the present application, a first target layer combination whose output layer combination includes a plurality of input layer combinations is obtained by traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network. When storage space is allocated for each first target layer combination, it is determined whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination. If such a first target input layer combination exists, the storage space recorded for it in the storage space mapping table is also allocated to the first target layer combination. That is, the storage space of the first target input layer combination is shared with the first target layer combination, so that a single piece of storage space simultaneously holds the calculation results of the plurality of input layer combinations included in the output layer combination of the first target layer combination. The convolutional neural network processor can therefore read data from the same piece of storage space when executing an operation with multiple input layer combinations, instead of reading from several pieces of storage space, which simplifies the software programming complexity of the convolutional neural network processor.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from them without inventive effort.
FIG. 1 is a first structural schematic diagram of data input-output relationships between convolutional neural network layer combinations provided by an embodiment of the present application;
FIG. 2 is a diagram illustrating a first result of allocating storage space by using an existing storage space allocation method;
fig. 3 is a schematic implementation flow chart of a method for allocating a storage space according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a specific implementation of step 302 of a method for allocating a storage space according to an embodiment of the present application;
fig. 5 is a flowchart illustrating a specific implementation of step 403 of a method for allocating a storage space according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a specific implementation of freeing a storage space according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a first result of allocating a storage space according to the storage space allocation method of the present application;
FIG. 8 is a second structural diagram of data input output relationships between convolutional neural network layer combinations provided by embodiments of the present application;
FIG. 9 is a diagram illustrating a second result of allocating storage space using an existing storage space allocation method;
FIG. 10 is a diagram illustrating a second result of allocating storage space using the storage space allocation method of the present application;
FIG. 11 is a schematic structural diagram of an apparatus for allocating storage space according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
A convolutional neural network (CNN) is composed of basic layers (Layers), each of which corresponds to an operation. The operation types may include convolution (Convolution), pooling (Pooling), element-wise operations (Element-Wise), concatenation (Concatenate), fully-connected operations (Fully-Connected), batch normalization (Batch-Normalization), and the like.
A neural network processor (NNP) is a processor dedicated to executing convolutional neural network computation tasks. A compiler matched with the neural network processor compiles a convolutional neural network model into machine code that can execute the computation tasks on the neural network processor. To reduce the neural network processor's bandwidth requirements on memory other than its local memory, the compiler tries, when partitioning the convolutional neural network, to keep the result of each Layer in the local memory of the neural network processor. A plurality of consecutive Layers form a layer combination (Layer-Group); Layer-Groups exchange data through memory other than the local memory of the neural network processor, while Layers inside a Layer-Group exchange data through the local memory of the neural network processor. Allocating storage space in memory other than the local memory of the neural network processor to each Layer-Group is the work that the compiler's memory management needs to do.
Some operations in these convolutional neural networks have only one input, while others have multiple inputs. For example, Element-Wise typically has two inputs, and Concatenate has two or more. Therefore, when a plurality of consecutive Layers form one layer combination (Layer-Group) and the operation type of the first Layer of the Layer-Group is a multi-input type such as Element-Wise or Concatenate, the Layer-Group has a plurality of input layer combinations, and the Layer-Group is an output layer combination corresponding to each of those input layer combinations.
For example, as shown in FIG. 1, the operation of the first Layer of Layer-Group n+1 is Element-Wise, and the two inputs of the Element-Wise operation come from Layer-Group n-1 and Layer-Group n, respectively. Layer-Group n+1 therefore has the two input layer combinations Layer-Group n-1 and Layer-Group n, and Layer-Group n+1 is the output layer combination of both input layer combinations.
In practical applications, if the calculation results of Layer-Group n-1 and Layer-Group n are stored in different storage spaces, for example BUF0 and BUF1 as shown in FIG. 2, the data in both BUF0 and BUF1 must be read during the Element-Wise calculation, and the calculation result of Layer-Group n+1 must be stored in an unoccupied storage space BUF2. When the Element-Wise calculation runs for multiple rounds, this data storage mode requires alternately reading data from the two different storage spaces BUF0 and BUF1, so the software coding of the neural network processor is highly complex.
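As a rough illustration of this contrast, the two sketches below compare reading Element-Wise operands from two separate buffers with reading them from one shared buffer. The function names and the list-based buffer layout are illustrative assumptions, not the patent's implementation:

```python
def elementwise_two_buffers(buf0, buf1):
    """Operands in two separate storage spaces (BUF0, BUF1): every round
    must read from both buffers, i.e. manage two address streams."""
    return [x + y for x, y in zip(buf0, buf1)]

def elementwise_shared_buffer(buf, size1):
    """Operands in one shared storage space laid out as
    [result of Layer-Group n-1 | result of Layer-Group n]:
    a single base address plus fixed intra-buffer offsets suffices."""
    return [buf[i] + buf[size1 + i] for i in range(size1)]
```

Both compute the same element-wise sum; the shared-buffer variant needs only one storage space, which is the property the allocation method below establishes.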
Based on this, the embodiments of the present application provide a method, an apparatus, a terminal, and a computer-readable storage medium for allocating a storage space, which can simplify the software programming complexity of a convolutional neural network processor.
In order to explain the technical means of the present application, the following description will be given by way of specific examples.
Fig. 3 is a schematic flow chart illustrating an implementation of a storage space allocation method provided by an embodiment of the present application. The method is applied to a terminal, can be executed by a storage space allocation apparatus configured on the terminal, and is suitable for situations where the software programming complexity of a convolutional neural network processor needs to be simplified. The terminal may be an intelligent terminal such as a computer or a server. The storage space allocation method may include steps 301 to 302.
Step 301, traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination. The output layer combination of the first target layer combination comprises a plurality of input layer combinations.
In practical applications, the above process of traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network may include: traversing the convolutional neural network to determine from which layer combinations the input data of each layer combination comes, thereby obtaining the input layer combinations and output layer combinations of each layer combination of the convolutional neural network, a first target layer combination whose output layer combination includes a plurality of input layer combinations, and a second target layer combination whose output layer combination includes only one input layer combination.
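This traversal step can be sketched in Python as follows. The `LayerGroup` structure and the function name are assumptions made for illustration, not structures defined by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class LayerGroup:
    name: str
    inputs: list = field(default_factory=list)  # layer combinations feeding this one

def classify(groups):
    """Split layer combinations into first targets (some output layer
    combination consuming them has multiple input layer combinations,
    e.g. one whose first Layer is Element-Wise) and second targets
    (every consumer has a single input layer combination)."""
    first, second = [], []
    for g in groups:
        # output layer combinations of g = the groups that consume g's result
        consumers = [o for o in groups if g in o.inputs]
        if any(len(o.inputs) > 1 for o in consumers):
            first.append(g)
        else:
            second.append(g)
    return first, second
```

In the FIG. 1 example, Layer-Group n-1 and Layer-Group n would be first target layer combinations, because their shared output layer combination Layer-Group n+1 has two input layer combinations.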
By traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network, the present application obtains a first target layer combination whose output layer combination includes a plurality of input layer combinations and a second target layer combination whose output layer combination includes only one input layer combination. When storage space is allocated for the first target layer combination, the storage space of the other input layer combinations included in the output layer combination corresponding to the first target layer combination can be allocated to the first target layer combination, so that the calculation results of the multiple input layer combinations included in that output layer combination are stored in the same piece of storage space. The convolutional neural network processor can then read data from the same piece of storage space when executing an operation with multiple input layer combinations, which simplifies the software programming complexity of the convolutional neural network processor.
It should be noted that, in the embodiments of the present application, allocating storage space for the first target layer combination refers to allocating storage space in memory other than the local memory of the neural network processor. The type of that memory may include double data rate synchronous dynamic random access memory (DDR SDRAM), synchronous dynamic random access memory (SDRAM), or Rambus dynamic random access memory (RDRAM), which is not limited in this application.
Step 302, allocating a storage space for each of the first target layer combinations.
In the embodiment of the present application, when storage space is allocated for each first target layer combination, the calculation results of the plurality of input layer combinations included in the output layer combination corresponding to that first target layer combination need to be stored in the same piece of storage space. Specifically, as shown in fig. 4, when allocating storage space for each first target layer combination, steps 401 to 402 may be performed.
Step 401, determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination.
In an embodiment of the present application, by determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations, it is determined whether storage space allocation has already been completed for some of the input layer combinations included in the output layer combination of the first target layer combination.
Step 402, if there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating the storage space recorded for the first target input layer combination in a storage space mapping table to the first target layer combination as well, recording the space size required when the storage space of the first target input layer combination is used once as size1 + size2, and recording the space size required when the storage space of the first target input layer combination is reused as the larger of size1 + size2 and size_max1.
Here, size1 is the space size historically required when the storage space of the first target input layer combination is used once, size2 is the space size occupied by the calculation result of the first target layer combination, and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused.
The presence of a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination indicates that storage space allocation has already been completed for some of those input layer combinations. To ensure that the calculation results of the input layer combinations included in the same output layer combination can be stored in the same piece of storage space, the storage space recorded for the first target input layer combination in the storage space mapping table must also be allocated to the first target layer combination. In addition, the space size required when the storage space of the first target input layer combination is used once is recorded as size1 + size2, and the space size required when it is reused is recorded as the larger of size1 + size2 and size_max1, so that each piece of storage space can be divided according to the reuse space size recorded in the storage space mapping table, and the data storage locations inside each piece of storage space can be divided according to the size2 of each newly added space.
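The bookkeeping rule above can be sketched as a small update function on the mapping table. The dictionary layout (`"once"`/`"reuse"` keys) is an assumption for illustration; only the arithmetic follows the text:

```python
def share_storage(table, buf, size2):
    """Update the mapping-table entry for storage space `buf` when a first
    target layer combination is added to the already-allocated space.

    Each entry records two sizes:
      "once"  - space needed when the storage space is used once (size1)
      "reuse" - space needed when the storage space is reused (size_max1)
    """
    entry = table[buf]
    size1, size_max1 = entry["once"], entry["reuse"]
    entry["once"] = size1 + size2                   # record size1 + size2
    entry["reuse"] = max(size1 + size2, size_max1)  # larger of the two values
    return entry
```

For example, starting from an entry with size1 = 100 and size_max1 = 150, adding a result of size2 = 30 yields a single-use size of 130 while the reuse size stays at 150; adding a further result of size2 = 40 raises both to 170.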
Optionally, in some embodiments of the present application, as shown in fig. 4, after step 401, a step 403 may be further included: allocating unoccupied storage space to the first target layer combination if no first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination.
Since the absence of a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination indicates that no storage space has been allocated to any of those input layer combinations, any unoccupied storage space can be allocated to the first target layer combination to complete its storage space allocation.
For example, as shown in FIG. 1, the output layer combination Layer-Group n+1 corresponding to Layer-Group n-1 includes the two input layer combinations Layer-Group n-1 and Layer-Group n. When the first target layer combination is Layer-Group n-1 and storage space is allocated for it, the corresponding output layer combination Layer-Group n+1 includes the two input layer combinations Layer-Group n-1 and Layer-Group n, and no first target input layer combination with allocated storage space exists, so any unoccupied storage space BUF m can be allocated to the first target layer combination Layer-Group n-1.
When the first target layer combination is Layer-Group n and storage space is allocated for it, the corresponding output layer combination Layer-Group n+1 includes Layer-Group n-1 and Layer-Group n, and the first target input layer combination Layer-Group n-1 with allocated storage space exists, so the storage space BUF m allocated to Layer-Group n-1 must also be allocated to Layer-Group n. That is, BUF m is recorded in the storage space mapping table as allocated to Layer-Group n as well, to ensure that the calculation results of the two input layer combinations (Layer-Group n-1 and Layer-Group n) included in the same output layer combination Layer-Group n+1 can be stored in the same piece of storage space BUF m. The convolutional neural network processor can then read the calculation results of Layer-Group n-1 and Layer-Group n from the same piece of storage space BUF m, rather than from two pieces of storage space, when executing the multi-input operation of Layer-Group n+1, which simplifies the software programming complexity of the convolutional neural network processor.
Specifically, when BUF m is also allocated to Layer-Group n in the storage space mapping table, the space size required when the storage space of the first target input layer combination is used once must be recorded as size1 + size2, and the space size required when it is reused must be recorded as the larger of size1 + size2 and size_max1.
Here, size1 is the space size historically required when the storage space of the first target input layer combination Layer-Group n-1 is used once, i.e., the space occupied by the calculation result of Layer-Group n-1; size2 is the space occupied by the calculation result of the first target layer combination Layer-Group n; and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused. Thus, each piece of storage space can be divided according to the space size required when it is reused, and the data storage locations inside each piece of storage space can be divided according to the size2 of each newly added space.
For example, from the storage space allocation information recorded in the storage space mapping table, the space size of BUF m is the larger of size1 + size2 and size_max1; the calculation result of Layer-Group n-1 may be stored at the address base address + relative address offset of BUF m; and the calculation result of Layer-Group n may be stored at base address + relative address offset of BUF m + size1. The base address refers to the starting hardware address at which layer combination calculation results are stored, and the relative address offset of BUF m refers to the offset of the address of BUF m relative to the base address.
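The address arithmetic in this example can be sketched as follows; the numeric values for the base address, the offset of BUF m, and size1 are hypothetical, chosen only to make the computation concrete:

```python
def data_address(base, buf_offset, preceding_sizes):
    """Hardware address of one calculation result inside a shared buffer:
    base address + relative address offset of the buffer + sizes of the
    results already stored ahead of it in the same buffer."""
    return base + buf_offset + sum(preceding_sizes)

base, buf_m_offset, size1 = 0x8000_0000, 0x1000, 0x400  # assumed values
addr_prev = data_address(base, buf_m_offset, [])        # Layer-Group n-1
addr_next = data_address(base, buf_m_offset, [size1])   # Layer-Group n
```

With these values, Layer-Group n-1 lands at 0x80001000 and Layer-Group n immediately after it at 0x80001400, so both operands of the Element-Wise operation are reachable from one base address.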
In the embodiments of the present application, when storage space is allocated to each first target layer combination, the calculation results of the multiple input layer combinations contained in the output layer combination corresponding to the first target layer combination are stored in the same piece of storage space. As a result, when the convolutional neural network processor executes an operation involving multiple input layer combinations, it can read the data from a single piece of storage space instead of from several, which simplifies the software programming of the convolutional neural network processor.
In some embodiments of the present application, in order to reduce memory fragmentation of the storage space, as shown in fig. 5, step 403 of allocating the unoccupied storage space to the first target layer combination may include steps 501 to 502.
Step 501, searching the storage space mapping table, and judging whether the storage space mapping table is empty.
Step 502, if the storage space mapping table is empty, recording in the storage space mapping table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination; recording the space required when the storage space with a relative address offset of 0 is used once as size2; and recording the space required when it is reused as the larger of size2 and size_max2, where size_max2 is the storage space historically required when the storage space with a relative address offset of 0 is reused.
In this embodiment, to reduce memory fragmentation of the storage space, new storage spaces are allocated sequentially when storage space is allocated for each first target layer combination and each second target layer combination. That is, allocation starts from a first piece of storage space with a relative address offset of 0; a second piece with a relative address offset of 0 + A is then allocated to the corresponding layer combination; a third piece with a relative address offset of 0 + A + B follows, where A is the size of the first piece and B is the size of the second piece; and so on. No storage space is therefore wasted between adjacent pieces, which reduces memory fragmentation. A storage space with a relative address offset of 0 is one whose address offset relative to the base address is 0.
Specifically, the present application determines whether the storage space with a relative address offset of 0 has already been allocated to one or more layer combinations by checking whether the storage space mapping table is empty. If the table is empty, the storage space with a relative address offset of 0 is allocated to the first target layer combination: the table records that this storage space has been allocated to the first target layer combination, the space required when it is used once is recorded as size2, and the space required when it is reused is recorded as the larger of size2 and size_max2.
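The sequential, back-to-back allocation order above can be sketched as follows. This is an illustrative sketch under assumed names (`allocate_sequential`, list-of-dicts mapping table) and made-up sizes, not the patented implementation.

```python
def next_relative_offset(mapping_table):
    """Offset for a new buffer: 0 if the table is empty,
    otherwise the end of the most recently allocated buffer."""
    if not mapping_table:
        return 0
    last = mapping_table[-1]
    return last["offset"] + last["size_reuse"]

def allocate_sequential(mapping_table, layer_group, size2, size_max2=0):
    """Append a new buffer record so consecutive buffers leave no gap."""
    offset = next_relative_offset(mapping_table)
    mapping_table.append({
        "layer_group": layer_group,
        "offset": offset,
        "size_once": size2,
        "size_reuse": max(size2, size_max2),
    })
    return offset

table = []
allocate_sequential(table, "Layer-Group 0", size2=100)  # offset 0
allocate_sequential(table, "Layer-Group 1", size2=50)   # offset 0 + A = 100
allocate_sequential(table, "Layer-Group 2", size2=30)   # offset 0 + A + B = 150
```

Because each new offset is exactly the end of the previous buffer, no unused gap exists between adjacent pieces, which is the fragmentation-reduction property the text describes.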
In some embodiments of the present application, as shown in fig. 5, after step 501, steps 503 to 504 may be further included.
Step 503, if the storage space mapping table is not empty, determining from the storage space mapping table whether a released storage space exists.
Step 504, if a released storage space exists, recording in the storage space mapping table that the released storage space has been allocated to the first target layer combination; recording the space required when the released storage space is used once as size2; and recording the space required when it is reused as the larger of size2 and size_max3, where size_max3 is the storage space historically required when the released storage space is reused.
Since the storage space mapping table is not empty, the storage space with a relative address offset of 0 has already been allocated to one or more layer combinations, so other unoccupied storage space must be found and allocated to the first target layer combination.
To further reduce memory fragmentation, when searching for other unoccupied storage space, whether a released storage space exists may first be determined from the storage space mapping table, so that any released storage space is preferentially allocated to the first target layer combination. That is, the storage space mapping table records that the released storage space has been allocated to the first target layer combination, the space required when the released storage space is used once is recorded as size2, and the space required when it is reused is recorded as the larger of size2 and size_max3, where size_max3 is the storage space historically required when the released storage space is reused.
It should be noted that the storage space historically required when a released storage space is reused (size_max3) may be larger or smaller than size2. If size_max3 is larger than size2 and the reuse requirement were recorded simply as size2 rather than as the larger of size2 and size_max3, then, when each piece of storage space is sized according to the reuse requirement recorded in the storage space mapping table, the released storage space would be too small for the layer combination that occupied it before it was released. Therefore, when the released storage space is allocated to the first target layer combination, the space required when it is reused must be recorded as the larger of size2 and size_max3. Similarly, in step 402 above, when the storage space mapping table records that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, the space required when that storage space is reused must be recorded as the larger of size1 + size2 and size_max1.
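The max rule above can be sketched concisely. The function name `reuse_freed_space` and the record layout are illustrative assumptions; the point is only that the reuse size never shrinks below the buffer's historical requirement.

```python
def reuse_freed_space(entry, size2):
    """Reallocate a released buffer record to a new layer combination.

    entry["size_reuse"] plays the role of size_max3 (historical reuse
    requirement); it must never shrink below history, or earlier users'
    data layout would no longer fit.
    """
    size_max3 = entry["size_reuse"]
    entry["size_once"] = size2
    entry["size_reuse"] = max(size2, size_max3)  # the larger value wins
    entry["freed"] = False
    return entry

small = {"size_reuse": 200, "freed": True}
reuse_freed_space(small, size2=80)    # history (200) dominates
large = {"size_reuse": 200, "freed": True}
reuse_freed_space(large, size2=300)   # new requirement (300) dominates
```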
Alternatively, so that released storage space is available when unoccupied storage space is allocated to a first target layer combination, as shown in fig. 6, when storage space is allocated for the first target layer combinations by the allocation methods described above, the following steps 601 to 602 may be performed after each allocation of storage space for a first target layer combination is completed.
Step 601, determining whether there is a first target output layer combination with unallocated storage space in the output layer combinations of the input layer combinations of the first target layer combination.
In an embodiment of the present application, whether the calculation result of an input layer combination of the first target layer combination has been read by its output layer combinations is determined by determining whether a first target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination.
Step 602, if no first target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination, marking in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
When no first target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination, the calculation results of those input layer combinations have been read by their output layer combinations, and those output layer combinations have completed their calculations using the results. In other words, the data stored in the storage space occupied by the input layer combinations of the first target layer combination has been fully consumed, so that storage space can be marked as released and reallocated to other layer combinations of the convolutional neural network that have not yet been allocated storage space.
In some embodiments of the present application, when a first target output layer combination with unallocated storage space does exist among the output layer combinations of the input layer combinations of the first target layer combination, the calculation results of those input layer combinations still need to be read by that first target output layer combination, and the storage space they occupy therefore cannot yet be marked as released.
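The release check in steps 601 to 602 can be sketched as a predicate over the layer-combination graph. The graph representation (a dict mapping each layer combination to its output layer combinations) and the names are assumptions for illustration, not the patented data structure.

```python
def can_free(input_layer_group, output_groups_of, allocated):
    """A buffer may be marked released only when every output layer
    combination of its producer has itself been allocated storage space,
    i.e. no consumer of the data is still pending."""
    return all(out in allocated
               for out in output_groups_of[input_layer_group])

# Toy graph: Layer-Group n feeds both Layer-Group n+1 and Layer-Group n+2,
# but only Layer-Group n+1 has been allocated storage so far.
outputs = {
    "Layer-Group n-1": ["Layer-Group n+1"],
    "Layer-Group n": ["Layer-Group n+1", "Layer-Group n+2"],
}
allocated = {"Layer-Group n+1"}
```

Under these toy inputs, Layer-Group n-1's buffer may be released (its sole consumer is allocated), while Layer-Group n's buffer must stay: Layer-Group n+2 has not yet been allocated and still needs to read it.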
It should be noted that, in some embodiments of the present application, when the storage space mapping table is determined not to be empty, the unoccupied storage space with the smallest relative address offset may also be allocated to the first target layer combination. Likewise, when it is determined that no released storage space exists, the unoccupied storage space with the smallest relative address offset may also be allocated to the first target layer combination.
In each of the above-described embodiments, the storage space allocation method may further include: traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain second target layer combinations, where the output layer combination of a second target layer combination contains only one input layer combination. Correspondingly, the method may further include: when allocating storage space for each second target layer combination, allocating unoccupied storage space to the second target layer combination.
Specifically, in order to reduce memory fragmentation of the storage space, allocating the unoccupied storage space to the second target layer combination may include: searching the storage space mapping table and determining whether it is empty; if the storage space mapping table is empty, recording in the table that the storage space with a relative address offset of 0 has been allocated to the second target layer combination; recording the space required when that storage space is used once as size3; and recording the space required when it is reused as the larger of size3 and size_max2, where size3 is the storage space occupied by the calculation result of the second target layer combination.
Similarly, in some embodiments of the present application, when storage space is allocated for each second target layer combination and the storage space mapping table is not empty, whether a released storage space exists may be determined from the table. If a released storage space exists, the table records that the released storage space has been allocated to the second target layer combination, the space required when the released storage space is used once is recorded as size3, and the space required when it is reused is recorded as the larger of size3 and size_max3, where size_max3 is the storage space historically required when the released storage space is reused. After the unoccupied storage space is allocated to the second target layer combination, the method may further include: determining whether a second target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the second target layer combination; and, if not, marking in the storage space mapping table that the storage space occupied by the input layer combinations of the second target layer combination has been released.
In the embodiments of the present application, after storage space has been allocated to each layer combination of the convolutional neural network by the allocation methods described in the above embodiments, each piece of storage space and the data storage positions inside it can be laid out according to the space required when that storage space is used once and the space required when it is reused, both recorded in the storage space mapping table. When the convolutional neural network processor executes an operation involving multiple input layer combinations, the calculation results of those input layer combinations can be read from the same piece of storage space, which simplifies the software programming of the processor; when the operation is Concatenate, it can be skipped entirely, improving data access efficiency and saving storage space.
For example, as shown in fig. 7, the two storage spaces recorded in the storage space mapping table are BUF0 and BUF1. BUF0 is simultaneously allocated to Layer-Group n-1 and Layer-Group n as shown in fig. 1; the space required when BUF0 is used once is recorded as size4 + size5, and the space required when it is reused is recorded as size_max4. BUF1 is allocated to Layer-Group n+1 as shown in fig. 1; the space required when BUF1 is used once is recorded as size6, and the space required when it is reused is recorded as size_max5. Therefore, when each piece of storage space and the data storage positions inside it are laid out according to the recorded sizes, BUF0 is sized to size_max4 and BUF1 to size_max5; the calculation result of Layer-Group n-1 is stored at base address + relative address offset of BUF0; the calculation result of Layer-Group n is stored at base address + relative address offset of BUF0 + size4; and the calculation result of Layer-Group n+1 is stored at base address + relative address offset of BUF1. When the convolutional neural network processor executes the Element-Wise operation of the first layer of Layer-Group n+1, it only needs to read the data stored in BUF0, rather than reading from two different storage spaces, which simplifies the software programming of the processor.
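The address arithmetic of the fig. 7 example can be worked through with made-up numbers. All sizes, the base address, and the variable names below are assumptions for illustration; only the layout rules (buffers sized by their reuse requirement, results placed back to back) come from the text.

```python
base = 0x8000                     # assumed starting hardware address
size4, size5, size6 = 64, 64, 96  # assumed result sizes
size_max4, size_max5 = 160, 96    # assumed historical reuse requirements

# Buffers are laid out sequentially; each is sized by its reuse requirement.
buf0_offset = 0
buf1_offset = buf0_offset + size_max4   # BUF1 starts where BUF0 ends

# Results inside BUF0 sit back to back: Layer-Group n-1 first, then
# Layer-Group n at offset size4.
addr_lg_n_minus_1 = base + buf0_offset
addr_lg_n = base + buf0_offset + size4
addr_lg_n_plus_1 = base + buf1_offset
```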
As another example, as shown in fig. 8, the operation of the first layer of Layer-Group n+1 is Concatenate, whose two inputs come from Layer-Group n-1 and Layer-Group n, and whose calculation result is output to Layer-Group n+2. If the calculation results of Layer-Group n-1 and Layer-Group n are stored in different storage spaces, for example storage spaces BUF0 and BUF1 as shown in fig. 9, then when performing the Concatenate calculation the convolutional neural network processor needs to read the data of BUF0 and BUF1 and place the calculation result of Layer-Group n+1 into the unoccupied storage space BUF2; when performing the operation of Layer-Group n+2, it then needs to read the Concatenate result from BUF2 and store its own result in the released storage space BUF0.
When storage space is allocated by the allocation method provided in the embodiments of the present application, the output layer combination of Layer-Group n-1 and Layer-Group n is Layer-Group n+1, which contains these two input layer combinations, so the same piece of storage space is allocated to Layer-Group n-1 and Layer-Group n. For example, as shown in fig. 10, the calculation results of Layer-Group n-1 and Layer-Group n are stored in the same piece of storage space BUF0; when the convolutional neural network processor performs the operation of Layer-Group n+2, it can read the calculation results in BUF0 directly and contiguously and store its result in the unoccupied storage space BUF1. This eliminates the Concatenate transport step and the storage space it would occupy, improves data access efficiency, and simplifies the software programming of the convolutional neural network processor.
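The contrast between fig. 9 and fig. 10 can be shown with toy buffer contents (the lists below stand in for real tensor data and are purely illustrative): when the two Concatenate inputs already sit back to back in one buffer, the Concatenate copy step disappears.

```python
# fig. 9 layout: inputs in separate buffers, so Concatenate must
# physically copy both into a third buffer before Layer-Group n+2 runs.
buf0_separate = [1, 2]                 # result of Layer-Group n-1
buf1_separate = [3, 4]                 # result of Layer-Group n
buf2 = buf0_separate + buf1_separate   # explicit Concatenate transport step

# fig. 10 layout: the two results were written back to back into BUF0
# at allocation time, so Layer-Group n+2 reads them contiguously and the
# transport step (and BUF2 itself) is never needed.
buf0_shared = [1, 2, 3, 4]
```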
It should be noted that, for simplicity of description, the foregoing method embodiments are each presented as a series of acts; however, those skilled in the art will appreciate that the present invention is not limited by the described order of acts, as some steps may, in accordance with the present invention, be performed in other orders.
Fig. 11 shows a schematic structural diagram of an allocation apparatus 1100 for a storage space according to an embodiment of the present application, which includes a traversal unit 1101 and an allocation unit 1102.
A traversal unit 1101, configured to traverse the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
an allocation unit 1102 for allocating a storage space for each of the first target layer combinations;
the allocation unit, when allocating storage space for each of the first target layer combinations, is further configured to:
determining whether a first target input layer combination of allocated storage space exists among a plurality of input layer combinations included in an output layer combination of the first target layer combination;
if a first target input layer combination with allocated storage space exists among the multiple input layer combinations contained in the output layer combination of the first target layer combination, simultaneously allocate the storage space recorded for the first target input layer combination in the storage space mapping table to the first target layer combination; record the space required when the storage space of the first target input layer combination is used once as size1 + size2; and record the space required when it is reused as the larger of size1 + size2 and size_max1, where size1 is the storage space historically required when the storage space of the first target input layer combination is used once, size2 is the storage space occupied by the calculation result of the first target layer combination, and size_max1 is the storage space historically required when the storage space of the first target input layer combination is reused.
In some embodiments of the present application, the allocating unit 1102 is further configured to, after the determining whether there is a first target input layer combination of allocated storage space in the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocate an unoccupied storage space to the first target layer combination if there is no first target input layer combination of allocated storage space in the plurality of input layer combinations included in the output layer combination of the first target layer combination.
In some embodiments of the present application, the allocating unit 1102 is further configured to search the storage space mapping table and determine whether it is empty; if the storage space mapping table is empty, record in the table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination; record the space required when that storage space is used once as size2; and record the space required when it is reused as the larger of size2 and size_max2, where size_max2 is the storage space historically required when the storage space with a relative address offset of 0 is reused.
In some embodiments of the present application, the allocating unit 1102 is further configured to, after determining whether the storage space mapping table is empty, determine from the table whether a released storage space exists if the table is not empty; if a released storage space exists, record in the storage space mapping table that the released storage space has been allocated to the first target layer combination; record the space required when the released storage space is used once as size2; and record the space required when it is reused as the larger of size2 and size_max3, where size_max3 is the storage space historically required when the released storage space is reused.
In some embodiments of the present application, the allocating unit 1102 is further configured to determine, after each allocation of storage space for a first target layer combination is completed, whether a first target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination; and, if not, mark in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
In some embodiments of the present application, the allocation unit 1102 is further configured to allocate an unoccupied storage space to the second target layer combination.
In some embodiments of the present application, the allocating unit 1102 is further configured to determine, after the unoccupied storage space is allocated to the second target layer combination, whether a second target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the second target layer combination; and, if not, mark in the storage space mapping table that the storage space occupied by the input layer combinations of the second target layer combination has been released.
It should be noted that, for convenience and brevity of description, the specific working process of the above-described allocation apparatus 1100 for storage space may refer to the corresponding process of the method described in fig. 1 to fig. 10, and is not described herein again.
As shown in fig. 12, the present application provides a terminal for implementing the above-mentioned allocation method of storage space, where the terminal 12 may include: a processor 120, a memory 121, and a computer program 122, such as a memory allocation program, stored in the memory 121 and operable on the processor 120. The processor 120 executes the computer program 122 to implement the steps in the above-mentioned embodiments of the allocation method of the storage space, such as the steps 301 to 302 shown in fig. 3. Alternatively, the processor 120, when executing the computer program 122, implements the functions of each module/unit in each device embodiment, for example, the functions of the units 1101 to 1102 shown in fig. 11.
The computer program may be divided into one or more modules/units, which are stored in the memory 121 and executed by the processor 120 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the terminal. For example, the computer program may be partitioned into a traversal unit and an allocation unit, each unit having the following specific functions:
the traversal unit is used for traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
an allocation unit for allocating a storage space for each of the first target layer combinations;
the allocation unit, when allocating storage space for each of the first target layer combinations, is further configured to:
determining whether a first target input layer combination of allocated storage space exists among a plurality of input layer combinations included in an output layer combination of the first target layer combination;
if a first target input layer combination with allocated storage space exists among the multiple input layer combinations contained in the output layer combination of the first target layer combination, simultaneously allocate the storage space recorded for the first target input layer combination in the storage space mapping table to the first target layer combination; record the space required when the storage space of the first target input layer combination is used once as size1 + size2; and record the space required when it is reused as the larger of size1 + size2 and size_max1, where size1 is the storage space historically required when the storage space of the first target input layer combination is used once, size2 is the storage space occupied by the calculation result of the first target layer combination, and size_max1 is the storage space historically required when the storage space of the first target input layer combination is reused.
The terminal can be a computer, a server and other computing equipment. The terminal may include, but is not limited to, a processor 120, a memory 121. Those skilled in the art will appreciate that fig. 12 is only an example of a terminal and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components, e.g., the terminal may also include input-output devices, network access devices, buses, etc.
The Processor 120 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 121 may be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 121 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal. Further, the memory 121 may also include both an internal storage unit and an external storage device of the terminal. The memory 121 is used to store the computer program and other programs and data required by the terminal. The memory 121 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for allocating storage space, comprising:
traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination in the layer combinations contained in the convolutional neural network; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
allocating storage space for each of the first target layer combinations;
wherein allocating storage space for each of the first target layer combinations comprises:
determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating the storage space of the first target input layer combination to the first target layer combination and recording the allocation in the storage space mapping table, recording the space size required when the storage space of the first target input layer combination is used once as size1 + size2, and recording the space size required when the storage space of the first target input layer combination is reused as the larger of size1 + size2 and size_max1; wherein size1 is the space size historically required when the storage space of the first target input layer combination is used once, size2 is the storage space required for the calculation result of the first target layer combination, and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused.
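The bookkeeping rule of claim 1 can be illustrated with a minimal sketch. The dict-based mapping table and the field names `size_once` and `size_reuse` are assumptions for illustration only; the claim does not fix a concrete data structure.

```python
def share_storage(space_map, target, input_combo, size1, size2, size_max1):
    """Allocate the storage already recorded for `input_combo` to `target`
    as well, updating the two bookkeeping sizes as claim 1 specifies."""
    entry = space_map[input_combo]
    # Space required when the storage is used once: size1 + size2.
    entry["size_once"] = size1 + size2
    # Space required when the storage is reused: the larger of
    # size1 + size2 and the historical reuse size size_max1.
    entry["size_reuse"] = max(size1 + size2, size_max1)
    # `target` now shares the same storage record as its input combination.
    space_map[target] = entry
    return entry

# Hypothetical usage: "convA" already holds storage; "concat" shares it.
space_map = {"convA": {"size_once": 100, "size_reuse": 120}}
entry = share_storage(space_map, "concat", "convA",
                      size1=100, size2=50, size_max1=120)
```

After the call, `size_once` is 150 (size1 + size2) and `size_reuse` is max(150, 120) = 150, mirroring the two recorded quantities in the claim.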
2. The method of allocating storage space of claim 1, wherein after the determining whether there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, the method further comprises:
if there is no first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating unoccupied storage space to the first target layer combination.
3. The method of allocating storage space of claim 2, wherein said allocating unoccupied storage space to the first target layer combination comprises:
searching the storage space mapping table, and judging whether the storage space mapping table is empty or not;
if the storage space mapping table is empty, recording in the storage space mapping table that the storage space with a relative address offset of 0 is allocated to the first target layer combination; and recording the space size required when the storage space with the relative address offset of 0 is used once as size2, and the space size required when that storage space is reused as the larger of size2 and size_max2; wherein size_max2 is the space size historically required when the storage space with the relative address offset of 0 is reused.
4. The method for allocating storage space according to claim 3, wherein after said determining whether said storage space mapping table is empty, further comprising:
if the storage space mapping table is not empty, judging whether the released storage space exists according to the storage space mapping table;
if released storage space exists, allocating the released storage space to the first target layer combination and recording in the storage space mapping table the space size required when the released storage space is used once as size2, and the space size required when the released storage space is reused as the larger of size2 and size_max3; wherein size_max3 is the space size historically required when the released storage space is reused.
5. The method of allocating storage space of any one of claims 1-4, wherein after each allocation of storage space for the first target layer combination is completed, the method comprises:
determining whether a first target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination;
if there is no first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination, marking, in the storage space mapping table, the storage space occupied by the input layer combinations of the first target layer combination as released.
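The release step of claim 5 amounts to a liveness check over the layer-combination graph. In this sketch, `inputs_of` and `outputs_of` are assumed helper mappings describing the graph, and a combination counts as "allocated" once it appears in `space_map`.

```python
def release_inputs(space_map, inputs_of, outputs_of, target):
    """After `target` is allocated, release the storage of each of its
    input layer combinations whose output layer combinations all have
    storage allocated (claim 5)."""
    for inp in inputs_of[target]:
        # Release only when no output combination of this input is still
        # waiting for a storage allocation.
        if all(out in space_map for out in outputs_of[inp]):
            space_map[inp]["released"] = True

# Hypothetical usage: "concat" is the only consumer of "convA"'s output,
# so once "concat" has storage, "convA"'s storage can be released.
space_map = {"convA": {"released": False}, "concat": {}}
inputs_of = {"concat": ["convA"]}
outputs_of = {"convA": ["concat"]}
release_inputs(space_map, inputs_of, outputs_of, "concat")
```

After the call, `space_map["convA"]["released"]` is True, making that region a candidate for reuse under claim 4.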
6. The method of allocating storage space of claim 1, wherein said method of allocating storage space further comprises: traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a second target layer combination; the output layer combination of the second target layer combination comprises only one input layer combination;
allocating unoccupied storage space to each of the second target layer combinations when allocating storage space for the second target layer combination.
7. The method of allocating storage space of claim 6, wherein after the allocating unoccupied storage space to the second target layer combination, the method comprises:
determining whether a second target output layer combination with unallocated storage space exists among the output layer combinations of the input layer combinations of the second target layer combination;
if there is no second target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the second target layer combination, marking, in the storage space mapping table, the storage space occupied by the input layer combinations of the second target layer combination as released.
8. An apparatus for allocating storage space, comprising:
the traversal unit is used for traversing the input layer combination and the output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination in the layer combinations contained in the convolutional neural network; an output layer combination of the first target layer combination comprises a plurality of input layer combinations;
an allocation unit for allocating a storage space for each of the first target layer combinations;
the allocation unit, when allocating storage space for each of the first target layer combinations, is further configured to:
determining whether a first target input layer combination with allocated storage space exists among the plurality of input layer combinations included in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the plurality of input layer combinations included in the output layer combination of the first target layer combination, allocating the storage space of the first target input layer combination to the first target layer combination and recording the allocation in the storage space mapping table, recording the space size required when the storage space of the first target input layer combination is used once as size1 + size2, and recording the space size required when the storage space of the first target input layer combination is reused as the larger of size1 + size2 and size_max1; wherein size1 is the space size historically required when the storage space of the first target input layer combination is used once, size2 is the storage space required for the calculation result of the first target layer combination, and size_max1 is the space size historically required when the storage space of the first target input layer combination is reused.
9. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202010390297.2A 2020-05-09 2020-05-09 Storage space allocation method and device, terminal and computer readable storage medium Active CN111666150B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010390297.2A CN111666150B (en) 2020-05-09 2020-05-09 Storage space allocation method and device, terminal and computer readable storage medium
PCT/CN2021/088444 WO2021227789A1 (en) 2020-05-09 2021-04-20 Storage space allocation method and device, terminal, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010390297.2A CN111666150B (en) 2020-05-09 2020-05-09 Storage space allocation method and device, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111666150A CN111666150A (en) 2020-09-15
CN111666150B true CN111666150B (en) 2022-01-11

Family

ID=72383508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010390297.2A Active CN111666150B (en) 2020-05-09 2020-05-09 Storage space allocation method and device, terminal and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111666150B (en)
WO (1) WO2021227789A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666150B (en) * 2020-05-09 2022-01-11 深圳云天励飞技术股份有限公司 Storage space allocation method and device, terminal and computer readable storage medium
CN112256440B (en) * 2020-12-23 2021-03-09 上海齐感电子信息科技有限公司 Memory management method and device for neural network inference

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2880961B2 (en) * 1996-08-16 1999-04-12 日本電気アイシーマイコンシステム株式会社 Data buffering device and control method thereof
US8826438B2 (en) * 2010-01-19 2014-09-02 Damballa, Inc. Method and system for network-based detecting of malware from behavioral clustering
CN108615077B (en) * 2016-12-09 2021-08-24 杭州海康威视数字技术股份有限公司 Cache optimization method and device applied to deep learning network
CN106874219B (en) * 2016-12-23 2018-11-02 深圳云天励飞技术有限公司 A kind of data dispatching method of convolutional neural networks, system and computer equipment
CN106919918B (en) * 2017-02-27 2022-11-29 腾讯科技(上海)有限公司 Face tracking method and device
CN110245748B (en) * 2018-03-09 2021-07-13 赛灵思电子科技(北京)有限公司 Convolutional neural network implementation method, device, hardware accelerator and storage medium
CN110597616B (en) * 2018-06-13 2022-07-29 华为技术有限公司 Memory allocation method and device for neural network
CN110866589B (en) * 2018-08-10 2023-06-30 阿里巴巴(中国)有限公司 Operation method, device and framework of deep neural network model
CN109886390B (en) * 2019-01-10 2023-11-24 平安科技(深圳)有限公司 Convolutional neural network model optimization method, device, computer equipment and storage medium
CN109976903B (en) * 2019-02-22 2021-06-29 华中科技大学 Deep learning heterogeneous computing method and system based on layer width memory allocation
CN110750351B (en) * 2019-12-20 2020-12-22 安徽寒武纪信息科技有限公司 Multi-core task scheduler, multi-core task scheduling method, multi-core task scheduling device and related products
CN111666150B (en) * 2020-05-09 2022-01-11 深圳云天励飞技术股份有限公司 Storage space allocation method and device, terminal and computer readable storage medium

Also Published As

Publication number Publication date
WO2021227789A1 (en) 2021-11-18
CN111666150A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN110149803B (en) Data storage method, system and terminal equipment
CN106407207B (en) Real-time newly-added data updating method and device
CN111666150B (en) Storage space allocation method and device, terminal and computer readable storage medium
CN111813805A (en) Data processing method and device
CN112667405B (en) Information processing method, device, equipment and storage medium
CN112269661B (en) Partition migration method and device based on Kafka cluster
CN111324427A (en) Task scheduling method and device based on DSP
CN111897493B (en) Storage space management method and device, electronic equipment and storage medium
CN109977074B (en) HDFS-based LOB data processing method and device
CN111143240A (en) Image storage method, system and terminal equipment
US9189382B2 (en) Noncontiguous representation of an array
CN115374232A (en) Tensor allocation method, medium, electronic device, and program product
CN113077344B (en) Block chain-based transaction method, device, electronic equipment and storage medium
CN111708715B (en) Memory allocation method, memory allocation device and terminal equipment
CN112269665B (en) Memory processing method and device, electronic equipment and storage medium
CN111352868B (en) Serial port access method, device, terminal equipment and storage medium
CN111679909A (en) Data processing method and device and terminal equipment
CN113051105A (en) Data processing method, device, equipment and storage medium
CN117033002B (en) Memory management method, device, equipment and storage medium
CN115037799B (en) Current limiting method, device, equipment and medium
CN111158605B (en) Method and device for optimizing disk storage policy of operating system and intelligent equipment
JP4668562B2 (en) Memory management program and memory management method
CN116991595B (en) Memory allocation method, device, equipment and medium based on Bitmap
WO2019041826A1 (en) Breakpoint list cleaning method and apparatus, storage medium, and server
CN117648128A (en) Instruction distribution method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 1/F, 17 Building, Shenzhen Dayun Software Town, 8288 Longgang Avenue, Henggang Street, Longgang District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Yuntian Lifei Technology Co., Ltd.

Address before: 518000 1/F, 17 Building, Shenzhen Dayun Software Town, 8288 Longgang Avenue, Henggang Street, Longgang District, Shenzhen City, Guangdong Province

Applicant before: SHENZHEN INTELLIFUSION TECHNOLOGIES Co.,Ltd.

GR01 Patent grant