WO2021227789A1 - Storage space allocation method, apparatus, terminal, and computer-readable storage medium - Google Patents

Storage space allocation method, apparatus, terminal, and computer-readable storage medium

Info

Publication number
WO2021227789A1
WO2021227789A1 · PCT/CN2021/088444 · CN2021088444W
Authority
WO
WIPO (PCT)
Prior art keywords
storage space
layer combination
target
combination
input layer
Prior art date
Application number
PCT/CN2021/088444
Other languages
English (en)
French (fr)
Inventor
文博
曹庆新
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2021227789A1

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources to service a request
    • G06F9/5011 - Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5016 - Allocation of resources to service a request, the resource being the memory
    • G06F9/5022 - Mechanisms to release resources
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Definitions

  • This application belongs to the field of data storage technology, and in particular relates to a storage space allocation method, device, terminal, and computer-readable storage medium.
  • A convolutional neural network consists of basic layers (Layer), each of which corresponds to an operation; the operation type can include the convolution operation (Convolution), pooling operation (Pooling), element-wise operation (Element-Wise), concatenation operation (Concatenate), fully connected operation (Fully-Connected), batch normalization operation (Batch-Normalization), and so on.
  • the Neural Network Processor is a processor specifically used to perform convolutional neural network computing tasks.
  • the software programming complexity of current convolutional neural network processors is generally high.
  • the embodiments of the present application provide a storage space allocation method, device, terminal, and computer-readable storage medium, which can simplify the software programming complexity of the convolutional neural network processor.
  • the first aspect of the embodiments of the present application provides a storage space allocation method, including:
  • traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain first target layer combinations, where the output layer combination of a first target layer combination includes multiple input layer combinations; and
  • allocating storage space for each first target layer combination, wherein, if there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, it is recorded in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, the space required when the storage space of the first target input layer combination is used once is recorded as size1+size2, and the space required when that storage space is reused is recorded as the larger value between size1+size2 and size_max1;
  • where size1 is the historically recorded space required when the storage space of the first target input layer combination is used once, size2 is the storage space required by the calculation result of the first target layer combination, and size_max1 is the historically recorded space required when the storage space of the first target input layer combination is reused.
  • a second aspect of the embodiments of the present application provides a storage space allocation device, including:
  • an allocation unit, configured to allocate storage space for each first target layer combination;
  • wherein, when allocating storage space for each first target layer combination, the allocation unit is further configured to:
  • if there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, record in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when that storage space is reused as the larger value between size1+size2 and size_max1;
  • where size1 is the historically recorded space required when the storage space of the first target input layer combination is used once, size2 is the storage space required by the calculation result of the first target layer combination, and size_max1 is the historically recorded space required when the storage space of the first target input layer combination is reused.
  • a third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the steps of the foregoing method when the computer program is executed.
  • the fourth aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored; when the computer program is executed by a processor, the steps of the foregoing method are implemented.
  • In the embodiments of the present application, the input layer combinations and output layer combinations of each layer combination of the convolutional neural network are traversed to obtain the first target layer combinations whose output layer combination contains multiple input layer combinations.
  • Then, for each first target layer combination, it is determined whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of that first target layer combination.
  • FIG. 1 is a first structural schematic diagram of the data input and output relationship between convolutional neural network layer combinations provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a first result of allocating storage space using an existing storage space allocation method;
  • FIG. 3 is a schematic diagram of the implementation process of a storage space allocation method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a specific implementation flow of step 302 of a storage space allocation method provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a specific implementation flow of step 403 of a storage space allocation method provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a specific implementation process of releasing storage space provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a first result of allocating storage space using the storage space allocation method of the present application;
  • FIG. 8 is a second structural schematic diagram of the data input and output relationship between convolutional neural network layer combinations provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a second result of allocating storage space using an existing storage space allocation method;
  • FIG. 10 is a schematic diagram of a second result of allocating storage space using the storage space allocation method of the present application;
  • FIG. 11 is a schematic structural diagram of a storage space allocation apparatus provided by an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the Neural Network Processor is a processor specifically used to perform convolutional neural network computing tasks.
  • the compiler matched with the neural network processor is used to compile the convolutional neural network model to generate machine code that can perform calculation tasks on the neural network processor.
  • When partitioning the convolutional neural network, the compiler tries to store the results of each layer in the local memory of the neural network processor. Multiple consecutive layers form a layer group (Layer-Group). Layer-Groups exchange data through memories other than the local memory of the neural network processor, while the layers inside a Layer-Group exchange data through the local memory of the neural network processor. How to allocate storage space in the memory other than the local memory of the neural network processor to each Layer-Group is the work that the compiler needs to do in memory management.
  • In an actual Layer-Group, when the operation type of the first layer of the Layer-Group is a multi-input operation type such as Element-Wise or Concatenate, the Layer-Group has multiple input layer combinations, and the Layer-Group is the output layer combination corresponding to those multiple input layer combinations.
  • For example, the operation of the first layer of Layer-Group n+1 in Figure 1 is Element-Wise, and the two inputs of the Element-Wise operation come from Layer-Group n-1 and Layer-Group n.
  • That is, Layer-Group n+1 has the two input layer combinations Layer-Group n-1 and Layer-Group n, and Layer-Group n+1 is the output layer combination of the two input layer combinations Layer-Group n-1 and Layer-Group n.
  • In the existing storage space allocation method, the calculation results of Layer-Group n-1 and Layer-Group n are stored in different storage spaces, for example, the storage space BUF0 and storage space BUF1 shown in Figure 2; the Element-Wise calculation then needs to read the data of BUF0 and BUF1 respectively, and the calculation result of Layer-Group n+1 is placed in the unoccupied storage space BUF2.
  • This data storage method makes it necessary to alternately read data from the two different storage spaces BUF0 and BUF1 when performing the Element-Wise calculation, resulting in high software programming complexity for the neural network processor.
  • the embodiments of the present application provide a storage space allocation method, device, terminal, and computer-readable storage medium, which can simplify the software programming complexity of the convolutional neural network processor.
  • Figure 3 shows a schematic diagram of the implementation process of a storage space allocation method provided by an embodiment of the present application.
  • The method is applied to a terminal and can be executed by a storage space allocation device configured on the terminal, and is suitable for scenarios where the software programming complexity of the convolutional neural network processor needs to be simplified. The terminal may be a computer, a server, or another intelligent terminal.
  • The foregoing storage space allocation method may include step 301 to step 302.
  • Step 301: Traverse the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain the first target layer combinations.
  • the output layer combination of the first target layer combination includes a plurality of input layer combinations.
  • The above process of traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network may include: traversing which layer combination the input data of each layer in each layer combination comes from, thereby obtaining the input layer combinations and output layer combinations of each layer combination, the first target layer combinations whose output layer combination contains multiple input layer combinations, and the second target layer combinations whose output layer combination contains only one input layer combination.
  • By traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network, this application obtains the first target layer combinations whose output layer combination contains multiple input layer combinations and the second target layer combinations whose output layer combination contains only one input layer combination. In this way, when allocating storage space for a first target layer combination, the storage space of the other input layer combinations included in the output layer combination corresponding to the first target layer combination can be allocated to the first target layer combination, so that the calculation results of the multiple input layer combinations included in that output layer combination are stored in the same piece of storage space. The convolutional neural network processor can then read data from a single piece of storage space when performing an operation involving multiple input layer combinations, which simplifies its software programming complexity.
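The step-301 traversal described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; representing the network as a mapping from each output layer combination to the input layer combinations feeding it, and all names used, are assumptions.

```python
def find_target_layer_combinations(inputs_of):
    """Classify layer combinations by their output layer combination.

    inputs_of: dict mapping an output layer combination to the list of
    input layer combinations whose results it consumes.
    Returns (first_targets, second_targets): first targets are layer
    combinations whose output layer combination has multiple inputs;
    second targets are those whose output has exactly one input.
    """
    first_targets, second_targets = set(), set()
    for out_combo, in_combos in inputs_of.items():
        if len(in_combos) > 1:
            first_targets.update(in_combos)   # multi-input consumer
        elif len(in_combos) == 1:
            second_targets.update(in_combos)  # single-input consumer
    return first_targets, second_targets
```

For the Figure 1 topology, where the Element-Wise first layer of Layer-Group n+1 consumes Layer-Group n-1 and Layer-Group n, both producers come out as first targets.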
  • It should be noted that allocating storage space for the first target layer combination refers to allocating, for the first target layer combination, storage space in the memory other than the local memory of the neural network processor. The types of memory other than the local memory of the neural network processor can include Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Synchronous Dynamic Random Access Memory (SDRAM), or Rambus Dynamic Random Access Memory (RDRAM); this application does not restrict this.
  • Step 302: Allocate storage space for each of the first target layer combinations.
  • When allocating storage space for each first target layer combination, the calculation results of the multiple input layer combinations included in the output layer combination corresponding to each first target layer combination need to be stored in the same piece of storage space. Specifically, as shown in FIG. 4, when allocating storage space for each first target layer combination, step 401 to step 402 may be performed respectively.
  • Step 401: Determine whether there is a first target input layer combination that has allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination.
  • Step 402: If there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, record in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination; record the space required when the storage space of the first target input layer combination is used once as size1+size2; and record the space required when that storage space is reused as the larger value between size1+size2 and size_max1.
  • Here, size1 is the historically recorded space required when the storage space of the first target input layer combination is used once, size2 is the storage space required by the calculation result of the first target layer combination, and size_max1 is the historically recorded space required when the storage space of the first target input layer combination is reused.
  • If there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, some of those input layer combinations have already been allocated storage space. Therefore, to ensure that the calculation results of the multiple input layer combinations included in the same output layer combination can be stored in the same piece of storage space, it must be recorded in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination.
  • Recording the space required when the storage space of the first target input layer combination is used once as size1+size2, and the space required when it is reused as the larger value between size1+size2 and size_max1, allows each piece of storage space to be divided according to the reuse size recorded in the storage space mapping table, while the data storage locations inside each piece of storage space can be divided according to the newly added size2 each time.
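The step-402 bookkeeping can be sketched as follows. This is a minimal illustration under assumed data structures: the mapping-table record fields (`owners`, `size_once`, `size_reuse`) and the function name are hypothetical, not from the patent.

```python
def share_buffer(mapping_table, buf_id, new_layer_combo, size2):
    """Step 402: the buffer already owned by a first target input layer
    combination is simultaneously allocated to the new first target layer
    combination, and its size records are updated."""
    rec = mapping_table[buf_id]
    size1 = rec["size_once"]       # space needed when the buffer is used once
    size_max1 = rec["size_reuse"]  # historical space needed when reused
    rec["owners"].append(new_layer_combo)          # buffer now shared
    rec["size_once"] = size1 + size2               # recorded as size1 + size2
    rec["size_reuse"] = max(size1 + size2, size_max1)  # larger of the two
    return rec
```

With size1 = 100, size2 = 80 and a historical reuse size of 150, the record becomes size_once = 180 and size_reuse = 180, matching the "larger value between size1+size2 and size_max1" rule.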
  • In some embodiments of the present application, as shown in FIG. 4, step 403 may further be included: if there is no first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, allocate unoccupied storage space to the first target layer combination.
  • Taking Figure 1 as an example: the output layer combination Layer-Group n+1 corresponding to Layer-Group n-1 contains the two input layer combinations Layer-Group n-1 and Layer-Group n. When the first target layer combination is Layer-Group n-1 and storage space is being allocated for it, neither of the two input layer combinations Layer-Group n-1 and Layer-Group n contained in its output layer combination Layer-Group n+1 has been allocated storage space, so any piece of unoccupied storage space BUF m can be allocated to the first target layer combination Layer-Group n-1.
  • When the first target layer combination is Layer-Group n, the two input layer combinations Layer-Group n-1 and Layer-Group n contained in the corresponding output layer combination Layer-Group n+1 include the first target input layer combination Layer-Group n-1, which has already been allocated storage space; therefore, the storage space BUF m allocated to Layer-Group n-1 needs to be simultaneously allocated to Layer-Group n.
  • Allocating BUF m to Layer-Group n at the same time ensures that the calculation results of the two input layer combinations (Layer-Group n-1 and Layer-Group n) included in the same output layer combination Layer-Group n+1 are stored in the same storage space BUF m, so that when the convolutional neural network processor executes the multi-input operation of Layer-Group n+1, it can read the calculation results of Layer-Group n-1 and Layer-Group n from the same piece of storage space BUF m without reading data from two pieces of storage space, thereby simplifying the software programming complexity of the convolutional neural network processor.
  • As above, size_max1 is the historically recorded space required when the storage space of the first target input layer combination is reused, so that each piece of storage space can be divided according to the space required when it is reused, and the data storage locations inside each piece of storage space can be divided according to the newly added size2 each time.
  • At this time, the space size of BUF m is the larger value between size1+size2 and size_max1;
  • the storage address of the calculation result of Layer-Group n-1 may be: base address + relative address offset of BUF m;
  • the storage address of the calculation result of Layer-Group n may be: base address + relative address offset of BUF m + size1.
  • The base address refers to the starting hardware address at which the calculation results of the layer combinations are stored.
  • The relative address offset of BUF m refers to the address offset of the address of BUF m relative to the base address.
  • In the embodiments of the present application, the calculation results of the multiple input layer combinations included in the output layer combination corresponding to the first target layer combination are stored in the same piece of storage space, which enables the convolutional neural network processor to read data from a single piece of storage space, instead of multiple pieces, when performing an operation that involves multiple input layer combinations, thus simplifying the software programming complexity of the convolutional neural network processor.
  • Specifically, in some embodiments of the present application, as shown in FIG. 5, allocating unoccupied storage space to the first target layer combination may include step 501 to step 502.
  • Step 501: Look up the storage space mapping table, and determine whether the storage space mapping table is empty.
  • Step 502: If the storage space mapping table is empty, record in the storage space mapping table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination; record the space required when the storage space with a relative address offset of 0 is used once as size2; and record the space required when that storage space is reused as the larger value between size2 and size_max2, where size_max2 is the historically recorded space required when the storage space with a relative address offset of 0 is reused.
  • In the embodiments of the present application, new storage space is allocated sequentially: allocation starts from the first piece of storage space, with a relative address offset of 0; the second piece of storage space, with a relative address offset of 0+A, is then allocated to the corresponding layer combination; the third piece of storage space, with a relative address offset of 0+A+B, is allocated next; and so on, where A is the size of the first piece of storage space and B is the size of the second piece. In this way, no storage space is wasted between one piece of storage space and the next, which reduces memory fragmentation.
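The sequential offset rule (0, 0+A, 0+A+B, ...) can be sketched as follows; a minimal illustration assuming each mapping-table record carries a `size_reuse` field as in the earlier sketches (field names are hypothetical):

```python
def next_relative_offset(mapping_table):
    """Relative address offset for a newly allocated piece of storage:
    the end of all previously allocated pieces, so no gap is left
    between consecutive pieces and fragmentation is avoided."""
    return sum(rec["size_reuse"] for rec in mapping_table.values())
```

An empty table yields offset 0; after pieces of reuse size A = 64 and B = 32 have been allocated, the next piece starts at 0 + 64 + 32 = 96.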
  • the storage space with a relative address offset of 0 refers to a storage space with an address offset of 0 relative to the base address.
  • This application determines whether the storage space with a relative address offset of 0 has been allocated to one or more layer combinations by judging whether the storage space mapping table is empty. When the storage space mapping table is empty, the storage space with a relative address offset of 0 is allocated to the first target layer combination: it is recorded in the storage space mapping table that this storage space has been allocated to the first target layer combination, the space required when it is used once is recorded as size2, and the space required when it is reused is recorded as the larger value between size2 and size_max2.
  • steps 503 to 504 may be further included.
  • Step 503: If the storage space mapping table is not empty, judge, according to the storage space mapping table, whether there is released storage space.
  • Step 504: If there is released storage space, record in the storage space mapping table that the released storage space has been allocated to the first target layer combination; record the space required when the released storage space is used once as size2; and record the space required when it is reused as the larger value between size2 and size_max3, where size_max3 is the historically recorded space required when the released storage space is reused.
  • When the storage space mapping table is not empty, the storage space with a relative address offset of 0 has already been allocated to one or more layer combinations. Therefore, other unoccupied storage space needs to be sought, and the unoccupied storage space found is allocated to the first target layer combination.
  • It should be noted that the historically recorded space required when the released storage space is reused may be greater than size2, or it may be less than size2. If it is less than size2 and only that historical value were kept, the reuse size recorded in the storage space mapping table would be too small for the new occupant. Therefore, the space required when the released storage space is reused must be recorded as the larger value between size2 and size_max3. For the same reason, in step 402 above, when it is recorded in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, the space required when that storage space is reused must be recorded as the larger value between size1+size2 and size_max1.
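The step-504 reuse of a released piece, with the "larger value" rule just discussed, can be sketched as follows; a minimal illustration with hypothetical field and function names, not the patent's implementation:

```python
def reuse_released(mapping_table, layer_combo, size_new):
    """Step 504: hand a previously released piece of storage to
    `layer_combo`. The reuse size must stay the larger of the new
    requirement and the historical reuse size, otherwise the piece
    could later be divided too small. Returns the buffer id, or None
    if no released piece exists."""
    for buf_id, rec in mapping_table.items():
        if rec.get("released"):
            rec["released"] = False
            rec["owners"] = [layer_combo]
            rec["size_once"] = size_new                       # size2 / size3
            rec["size_reuse"] = max(size_new, rec["size_reuse"])
            return buf_id
    return None
```

A caller would first check `next_relative_offset`-style fresh allocation only when this returns None, mirroring the step 503/504 ordering.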
  • Step 601: Determine whether there is a first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination.
  • That is, whether the calculation result of an input layer combination of the first target layer combination has been read by all of its output layer combinations is determined by checking whether any of those output layer combinations has not yet been allocated storage space.
  • Step 602: If there is no first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination, mark in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
  • Otherwise, the storage space occupied by the input layer combinations of the first target layer combination cannot be marked as released.
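The step 601/602 release check can be sketched as follows; a minimal illustration with hypothetical names, assuming a set of layer combinations that already have storage allocated:

```python
def try_release(mapping_table, buf_id, output_combos, allocated):
    """Steps 601/602: mark `buf_id` released only once every output layer
    combination that consumes the stored result has itself been allocated
    storage - a consumer still lacking storage has not yet read the data,
    so the piece must stay occupied."""
    if all(combo in allocated for combo in output_combos):
        mapping_table[buf_id]["released"] = True
        return True
    return False
```

In the Figure 1 example, BUF m holding the Layer-Group n-1 and Layer-Group n results can be released only after Layer-Group n+1 has been allocated its own storage.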
  • the unoccupied storage space with the smallest relative address offset may also be allocated to the first target layer combination.
  • In some embodiments of the present application, the storage space allocation method may further include: traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain second target layer combinations, where the output layer combination of a second target layer combination contains only one input layer combination. Correspondingly, the storage space allocation method may further include: when allocating storage space for each second target layer combination, allocating unoccupied storage space to the second target layer combination.
  • Specifically, when allocating unoccupied storage space to the second target layer combination, the method may include: looking up the storage space mapping table and determining whether it is empty; if the storage space mapping table is empty, recording in it that the storage space with a relative address offset of 0 has been allocated to the second target layer combination; recording the space required when that storage space is used once as size3; and recording the space required when it is reused as the larger value between size3 and size_max2, where size3 is the storage space that the calculation result of the second target layer combination needs to occupy.
  • In some embodiments of the present application, when allocating storage space for each second target layer combination, if the storage space mapping table is not empty, whether there is released storage space may also be judged according to the storage space mapping table. If there is released storage space, it is recorded in the storage space mapping table that the released storage space has been allocated to the second target layer combination, the space required when the released storage space is used once is recorded as size3, and the space required when it is reused is recorded as the larger value between size3 and size_max3, where size_max3 is the historically recorded space required when the released storage space is reused.
  • In some embodiments of the present application, the method may further include: determining whether there is a second target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the second target layer combination; and, if there is none, marking in the storage space mapping table that the storage space occupied by the input layer combinations of the second target layer combination has been released.
  • Each piece of storage space, and the data storage locations inside it, can then be divided according to the once-used size and the reuse size recorded for each storage space in the storage space mapping table.
  • When the convolutional neural network processor performs an operation that includes multiple input layer combinations, it can read the calculation results of all of those input layer combinations from the same piece of storage space, which reduces the processor's software programming complexity; and when the operation is Concatenate, the operation can be skipped entirely, improving data access efficiency and saving the storage space the copy would occupy.
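The once-used/reuse bookkeeping described above can be sketched as a small, hypothetical helper; the table layout, field names, and concrete sizes below are illustrative assumptions, not taken from the patent:

```python
def record_allocation(table, buf_id, group, size_once):
    """Record that buffer `buf_id` is (additionally) allocated to `group`.

    `size_once` is the space this single use of the buffer needs; the reuse
    size is kept as the maximum over all recorded uses, mirroring the
    max(sizeN, size_maxN) rule in the text.
    """
    entry = table.setdefault(buf_id, {"groups": [], "size_once": 0, "size_reuse": 0})
    entry["groups"].append(group)
    entry["size_once"] = size_once
    entry["size_reuse"] = max(size_once, entry["size_reuse"])
    return entry

# Mirroring the BUF 0 example: Layer-Group n-1 needs size4, then
# Layer-Group n is co-allocated, raising the once-used size to size4+size5.
size4, size5 = 100, 50
table = {}
record_allocation(table, "BUF0", "Layer-Group n-1", size4)
entry = record_allocation(table, "BUF0", "Layer-Group n", size4 + size5)
```

Here `size_reuse` ends at 150; if a larger historical reuse size had already been recorded, the `max` would preserve it instead.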
  • For example, the two storage spaces recorded in the storage space mapping table are BUF 0 and BUF 1. BUF 0 is allocated simultaneously to Layer-Group n-1 and Layer-Group n shown in Figure 1; the table records that BUF 0 requires size4+size5 when used once and size_max4 when reused. BUF 1 is allocated to Layer-Group n+1 shown in Figure 1; the table records that BUF 1 requires size6 when used once and size_max5 when reused.
  • Accordingly, BUF 0 is divided with size size_max4 and BUF 1 with size size_max5; the storage address of the calculation result of Layer-Group n-1 is the base address + the relative address offset of BUF 0, that of Layer-Group n is the base address + the relative address offset of BUF 0 + size4, and that of Layer-Group n+1 is the base address + the relative address offset of BUF 1.
  • When the convolutional neural network processor executes the Element-Wise operation of the first layer of Layer-Group n+1, it only needs to read the data stored in BUF 0 rather than reading from two different storage spaces, which reduces the processor's software programming complexity.
  • As another example, the operation of the first layer of Layer-Group n+1 is Concatenate, whose two inputs come from Layer-Group n-1 and Layer-Group n respectively, and whose calculation result is output to Layer-Group n+2.
  • If the calculation results of Layer-Group n-1 and Layer-Group n are stored in separate storage spaces BUF0 and BUF1, as shown in Figure 9, the convolutional neural network processor must read the data of BUF0 and BUF1 separately when performing the Concatenate calculation and place the result of Layer-Group n+1 into the unoccupied storage space BUF2; when executing Layer-Group n+2, it must then read the Concatenate result from BUF2 and store it into the released storage space BUF0.
  • When the storage space allocation method provided in the embodiments of the present application is used for memory allocation, the output layer combination of Layer-Group n-1 and Layer-Group n is Layer-Group n+1, which contains two input layer combinations; therefore, when allocating storage space for Layer-Group n-1 and Layer-Group n, the same piece of storage space is allocated to both. For example, as shown in Figure 10, the calculation results of Layer-Group n-1 and Layer-Group n are stored in the same storage space BUF0.
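The buffer-sharing decision just illustrated can be sketched as a hypothetical routine; the group names and data structures are assumed for illustration only:

```python
def choose_buffer(group, sibling_inputs, alloc, counter):
    """Pick a buffer for `group`.

    `sibling_inputs` lists all input layer combinations of `group`'s
    multi-input output layer combination (i.e. `group` and its siblings).
    If a sibling already holds a buffer, that buffer is shared so the
    multi-input operation reads one contiguous region; otherwise a fresh,
    unoccupied buffer is opened.
    """
    for sibling in sibling_inputs:
        if sibling != group and sibling in alloc:
            alloc[group] = alloc[sibling]
            return alloc[group]
    buf = f"BUF{counter[0]}"
    counter[0] += 1
    alloc[group] = buf
    return buf

alloc, counter = {}, [0]
siblings = ["Layer-Group n-1", "Layer-Group n"]
first = choose_buffer("Layer-Group n-1", siblings, alloc, counter)
second = choose_buffer("Layer-Group n", siblings, alloc, counter)
```

Both groups end up in the same buffer, as in the Figure 10 arrangement.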
  • FIG. 11 shows a schematic structural diagram of an apparatus 1100 for allocating storage space provided by an embodiment of the present application, which includes a traversal unit 1101 and an allocation unit 1102.
  • a traversal unit 1101 configured to traverse the input layer combination and output layer combination of each layer combination of the convolutional neural network to obtain a first target layer combination; the output layer combination of the first target layer combination includes multiple input layer combinations;
  • An allocating unit 1102 configured to allocate storage space for each of the first target layer combinations
  • When allocating storage space for each of the first target layer combinations, the allocation unit is further configured to:
  • judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination; and, if there is, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
  • The above-mentioned allocating unit 1102 is further configured to, after judging whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination, allocate unoccupied storage space to the first target layer combination if no such allocated first target input layer combination exists.
  • The above-mentioned allocating unit 1102 is further configured to look up the storage space mapping table and determine whether it is empty; if it is empty, to record in the table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination, record the space required when that storage space is used once as size2, and record the space required when it is reused as the larger of size2 and size_max2; where size_max2 is the historically recorded storage space required when the storage space with a relative address offset of 0 is reused.
  • The above-mentioned allocating unit 1102 is further configured to, after determining whether the storage space mapping table is empty, judge from the table whether released storage space exists if the table is not empty; and, if released storage space exists, to record in the table that the released storage space has been allocated to the first target layer combination, record the space required when the released storage space is used once as size2, and record the space required when it is reused as the larger of size2 and size_max3; where size_max3 is the historically recorded storage space required when the released storage space is reused.
  • The aforementioned allocation unit 1102 is further configured to, after each allocation of storage space for a first target layer combination is completed, determine whether there is a first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination; if there is none, to mark in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
  • the aforementioned allocation unit 1102 is further configured to allocate unoccupied storage space to the second target layer combination.
  • The aforementioned allocating unit 1102 is further configured to, after allocating the unoccupied storage space to the second target layer combination, determine whether there is a second target output layer combination with unallocated storage space among the output layer combinations of the input layer combination of the second target layer combination; if there is none, to mark in the storage space mapping table that the storage space occupied by the input layer combination of the second target layer combination has been released.
  • As shown in FIG. 12, the present application provides a terminal for implementing the foregoing storage space allocation method. The terminal 12 may include a processor 120, a memory 121, and a computer program 122, such as a memory allocation program, stored in the memory 121 and runnable on the processor 120.
  • When the processor 120 executes the computer program 122, the steps in the foregoing embodiments of the storage space allocation method are implemented, for example steps 301 to 302 shown in FIG. 3; alternatively, when the processor 120 executes the computer program 122, the functions of the modules/units in the above device embodiments, such as the functions of units 1101 to 1102 shown in FIG. 11, are realized.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 121 and executed by the processor 120 to complete the application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal.
  • The computer program can be divided into a traversal unit and an allocation unit, and the specific functions of each unit are as follows:
  • a traversal unit, configured to traverse the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain a first target layer combination, where the output layer combination of the first target layer combination contains multiple input layer combinations;
  • an allocation unit, configured to allocate storage space for each of the first target layer combinations;
  • when allocating storage space for each of the first target layer combinations, the allocation unit is further configured to: judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations included in the output layer combination of the first target layer combination; and, if there is, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
  • the terminal may be a computing device such as a computer or a server.
  • the terminal may include, but is not limited to, a processor 120 and a memory 121.
  • Those skilled in the art will understand that FIG. 12 is only an example of a terminal and does not constitute a limitation on the terminal; it may include more or fewer components than shown, or combine certain components, or use different components. For example, the terminal may also include input and output devices, network access devices, buses, and so on.
  • The so-called processor 120 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 121 may be an internal storage unit of the terminal, such as a hard disk or memory of the terminal.
  • The memory 121 may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal.
  • the memory 121 may also include both an internal storage unit of the terminal and an external storage device.
  • the memory 121 is used to store the computer program and other programs and data required by the terminal.
  • the memory 121 may also be used to temporarily store data that has been output or will be output.
  • the disclosed device/terminal and method may be implemented in other ways.
  • the device/terminal embodiments described above are only illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Memory System (AREA)
  • Image Analysis (AREA)

Abstract

A storage space allocation method, apparatus, terminal, and computer-readable storage medium, relating to the technical field of data storage. The method includes: traversing the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain a first target layer combination (301); judging whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination (401); and, if there is, recording in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, thereby reducing the software programming complexity of a convolutional neural network processor.

Description

Method, Apparatus, Terminal, and Computer-Readable Storage Medium for Allocating Storage Space — Technical Field
This application belongs to the technical field of data storage, and in particular relates to a storage space allocation method, apparatus, terminal, and computer-readable storage medium.
This application claims priority to the Chinese patent application filed with the China Patent Office on May 9, 2020 under application No. 202010390297.2 and entitled "Method, Apparatus, Terminal, and Computer-Readable Storage Medium for Allocating Storage Space", the entire contents of which are incorporated herein by reference.
Background
A convolutional neural network (CNN) is composed of basic layers (Layer), each Layer corresponding to one operation, and the operation type may include a convolution operation (Convolution), a pooling operation (Pooling), an element-wise operation (Element-Wise), a concatenation operation (Concatenate), a fully-connected operation (Fully-Connected), a batch normalization operation (Batch-Normalization), and so on.
A neural network processor (NNP) is a processor dedicated to executing convolutional neural network computation tasks. However, the software programming complexity of current convolutional neural network processors is generally high.
Technical Solution
Embodiments of the present application provide a storage space allocation method, apparatus, terminal, and computer-readable storage medium, which can reduce the software programming complexity of a convolutional neural network processor.
A first aspect of the embodiments of the present application provides a storage space allocation method, including:
traversing the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain a first target layer combination, the output layer combination of the first target layer combination containing multiple input layer combinations;
allocating storage space for each first target layer combination;
wherein allocating storage space for each first target layer combination includes:
judging whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, recording in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, recording the space required when the storage space of the first target input layer combination is used once as size1+size2, and recording the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
A second aspect of the embodiments of the present application provides a storage space allocation apparatus, including:
a traversal unit, configured to traverse the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain a first target layer combination, the output layer combination of the first target layer combination containing multiple input layer combinations;
an allocation unit, configured to allocate storage space for each first target layer combination;
when allocating storage space for each first target layer combination, the allocation unit is further configured to:
judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
A third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, the processor implementing the steps of the above method when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.
In the embodiments of the present application, the input layer combinations and output layer combinations of each layer combination of the convolutional neural network are traversed to obtain first target layer combinations whose output layer combination contains multiple input layer combinations. When allocating storage space for each first target layer combination, it is judged for each one whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in its output layer combination; if there is, the storage space mapping table records that the storage space of that first target input layer combination is simultaneously allocated to the first target layer combination. That is, the storage space of the first target input layer combination is also allocated to the first target layer combination, so that this storage space can simultaneously store the calculation results of the multiple input layer combinations contained in the output layer combination of the first target layer combination. As a result, when the convolutional neural network processor executes an operation with multiple input layer combinations, it can read data from a single piece of storage space instead of multiple pieces, which reduces the processor's software programming complexity.
Brief Description of the Drawings
FIG. 1 is a first schematic structural diagram of the data input/output relationships between layer combinations of a convolutional neural network provided by an embodiment of the present application;
FIG. 2 is a first schematic diagram of the result of allocating storage space with an existing storage space allocation method;
FIG. 3 is a schematic flowchart of the implementation of a storage space allocation method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of the specific implementation of step 302 of the storage space allocation method provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of the specific implementation of step 403 of the storage space allocation method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of the specific implementation of releasing storage space provided by an embodiment of the present application;
FIG. 7 is a first schematic diagram of the result of allocating storage space with the storage space allocation method of the present application;
FIG. 8 is a second schematic structural diagram of the data input/output relationships between layer combinations of a convolutional neural network provided by an embodiment of the present application;
FIG. 9 is a second schematic diagram of the result of allocating storage space with an existing storage space allocation method;
FIG. 10 is a second schematic diagram of the result of allocating storage space with the storage space allocation method of the present application;
FIG. 11 is a schematic structural diagram of a storage space allocation apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Embodiments of the Invention
A neural network processor (NNP) is a processor dedicated to executing convolutional neural network computation tasks. The compiler accompanying the neural network processor compiles a convolutional neural network model to produce machine code that can execute computation tasks on the processor. To reduce the processor's bandwidth demand on memory other than its local memory, the compiler, when partitioning the convolutional neural network, tries to keep each Layer's result in the processor's local memory. Multiple consecutive Layers form a layer combination (Layer-Group); Layer-Groups exchange data through memory other than the processor's local memory, while the Layers inside a Layer-Group exchange data through the processor's local memory. Allocating, for each Layer-Group, storage space in memory other than the processor's local memory is the work of the compiler's memory management.
Among these convolutional neural network operations, some have only one input while others have multiple inputs. For example, Element-Wise usually has two inputs, and Concatenate has two or more inputs. Therefore, when multiple consecutive Layers are grouped into a layer combination (Layer-Group), if the operation type of the first layer of the Layer-Group is a multi-input type such as Element-Wise or Concatenate, that Layer-Group has multiple input layer combinations, and it is an output layer combination corresponding to each of those input layer combinations.
For example, as shown in FIG. 1, the operation of the first layer of Layer-Group n+1 is Element-Wise, and its two inputs come from Layer-Group n-1 and Layer-Group n respectively; thus Layer-Group n+1 contains the two input layer combinations Layer-Group n-1 and Layer-Group n, and Layer-Group n+1 is the output layer combination of those two input layer combinations.
In practical applications, if the calculation results of Layer-Group n-1 and Layer-Group n are stored in different storage spaces, for example BUF0 and BUF1 as shown in FIG. 2, the Element-Wise calculation must read the data of BUF0 and BUF1 separately and place the result of Layer-Group n+1 into the unoccupied storage space BUF2. When Element-Wise requires multiple rounds of calculation, this storage arrangement forces data to be read alternately from the two different storage spaces BUF0 and BUF1, resulting in high software-coding complexity for the neural network processor.
On this basis, embodiments of the present application provide a storage space allocation method, apparatus, terminal, and computer-readable storage medium that can reduce the software programming complexity of a convolutional neural network processor.
The technical solution of the present application is described below through specific embodiments.
FIG. 3 shows a schematic flowchart of the implementation of a storage space allocation method provided by an embodiment of the present application. The method is applied to a terminal, can be executed by a storage space allocation apparatus configured on the terminal, and is suitable for cases where the software programming complexity of a convolutional neural network processor needs to be reduced. The terminal may be an intelligent terminal such as a computer or a server. The storage space allocation method may include steps 301 to 302.
Step 301: traverse the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain a first target layer combination. The output layer combination of the first target layer combination contains multiple input layer combinations.
In practical applications, traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network may include: traversing, for each layer in each layer combination, which layer combinations its input data comes from, thereby obtaining each layer combination's input layer combinations and output layer combinations, the first target layer combinations whose output layer combination contains multiple input layer combinations, and the second target layer combinations whose output layer combination contains only one input layer combination.
By traversing the input layer combinations and output layer combinations of each layer combination, the present application obtains the first target layer combinations, whose output layer combination contains multiple input layer combinations, and the second target layer combinations, whose output layer combination contains only one, so that when storage space is allocated for a first target layer combination, the storage space of the other input layer combinations contained in its corresponding output layer combination can be allocated to it as well. The calculation results of the multiple input layer combinations contained in that output layer combination are thus stored in the same piece of storage space, and the convolutional neural network processor, when executing an operation with multiple input layer combinations, can read data from a single piece of storage space, which reduces its software programming complexity.
It should be noted that in the embodiments of the present application, allocating storage space for a first target layer combination means allocating storage space in memory other than the local memory of the neural network processor, and the type of such memory may include double data rate synchronous dynamic random-access memory (DDR SDRAM), synchronous dynamic random-access memory (SDRAM), or Rambus dynamic random-access memory (RDRAM), which is not limited by the present application.
Step 302: allocate storage space for each first target layer combination.
In the embodiments of the present application, when allocating storage space for each first target layer combination, the calculation results of the multiple input layer combinations contained in the output layer combination corresponding to each first target layer combination must be stored in the same piece of storage space. Specifically, as shown in FIG. 4, steps 401 to 402 may be executed when allocating storage space for each first target layer combination.
Step 401: judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination.
In the embodiments of the present application, this judgment determines whether some of the multiple input layer combinations have already completed the allocation of storage space.
Step 402: if there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1.
Here, size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
When there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, some of those input layer combinations have already completed storage space allocation. Therefore, to ensure that the calculation results of the multiple input layer combinations of the same output layer combination can be stored in the same piece of storage space, the storage space mapping table must record that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination. Moreover, the space required when that storage space is used once must be recorded as size1+size2, and the space required when it is reused as the larger of size1+size2 and size_max1, so that when each piece of storage space is divided, the division can follow the reuse sizes recorded in the storage space mapping table, while the data storage locations inside each piece can be divided according to each newly added size size2.
Optionally, in some embodiments of the present application, as shown in FIG. 4, step 401 may be followed by step 403: if there is no first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, allocate unoccupied storage space to the first target layer combination.
When no such allocated first target input layer combination exists, none of the multiple input layer combinations has been allocated storage space yet, so any unoccupied piece of storage space can be allocated to the first target layer combination to complete its allocation.
For example, as shown in FIG. 1, the output layer combination Layer-Group n+1 of Layer-Group n-1 contains the two input layer combinations Layer-Group n-1 and Layer-Group n. When the first target layer combination is Layer-Group n-1 and storage space is allocated for it, neither of the two input layer combinations contained in its output layer combination Layer-Group n+1 has been allocated storage space, so any unoccupied storage space BUF m can be allocated to the first target layer combination Layer-Group n-1.
When the first target layer combination is Layer-Group n and storage space is allocated for it, the two input layer combinations contained in its output layer combination Layer-Group n+1 include the first target input layer combination Layer-Group n-1, which has already been allocated storage space. The storage space BUF m already allocated to Layer-Group n-1 must therefore also be allocated to Layer-Group n; that is, the storage space mapping table records that BUF m is simultaneously allocated to Layer-Group n, ensuring that the calculation results of the two input layer combinations (Layer-Group n-1 and Layer-Group n) of the same output layer combination Layer-Group n+1 are stored in the same piece of storage space BUF m. When executing the operation of Layer-Group n+1, which contains multiple input layer combinations, the convolutional neural network processor can then read the calculation results of Layer-Group n-1 and Layer-Group n from the single storage space BUF m instead of reading from two storage spaces, which reduces its software programming complexity.
Specifically, when recording in the storage space mapping table that BUF m is simultaneously allocated to Layer-Group n, the space required when the storage space of the first target input layer combination is used once must also be recorded as size1+size2, and the space required when it is reused as the larger of size1+size2 and size_max1.
Here, size1 is the historically recorded storage space required when the storage space of the first target input layer combination Layer-Group n-1 is used once, i.e. the storage space to be occupied by the calculation result of Layer-Group n-1; size2 is the storage space to be occupied by the calculation result of the first target layer combination Layer-Group n; and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused, so that each piece of storage space can be divided according to the computed reuse size, while the data storage locations inside each piece can be divided according to each newly added size size2.
For example, from the allocation information recorded in the storage space mapping table, the size of BUF m is the larger of size1+size2 and size_max1; the calculation result of Layer-Group n-1 may be stored at the base address + the relative address offset of BUF m, and that of Layer-Group n at the base address + the relative address offset of BUF m + size1. The base address is the starting hardware address at which layer-combination calculation results are stored, and the relative address offset of BUF m is the offset of BUF m's address from the base address.
In the embodiments of the present application, when storage space is allocated for each first target layer combination, the calculation results of the multiple input layer combinations contained in its output layer combination are stored in the same piece of storage space, so that when executing an operation with multiple input layer combinations, the convolutional neural network processor can read data from a single piece of storage space instead of multiple pieces, which reduces its software programming complexity.
In some embodiments of the present application, to reduce memory fragmentation, as shown in FIG. 5, allocating unoccupied storage space to the first target layer combination in step 403 above may include steps 501 to 502.
Step 501: look up the storage space mapping table and judge whether the storage space mapping table is empty.
Step 502: if the storage space mapping table is empty, record in the storage space mapping table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination; record the space required when the storage space with a relative address offset of 0 is used once as size2, and record the space required when it is reused as the larger of size2 and size_max2; where size_max2 is the historically recorded storage space required when the storage space with a relative address offset of 0 is reused.
In the embodiments of the present application, to reduce memory fragmentation, new storage spaces are allocated sequentially when storage space is allocated for each first target layer combination and each second target layer combination: allocation starts from the first piece of storage space at relative address offset 0; the second piece, at relative address offset 0+A, is then allocated to the corresponding layer combination; the third piece, at relative address offset 0+A+B, is allocated next, where A is the size of the first piece and B is the size of the second, and so on, so that no wasted storage space is left between one piece and the next, reducing memory fragmentation. The storage space with a relative address offset of 0 is the storage space whose address offset from the base address is 0.
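The sequential layout just described (relative offsets 0, 0+A, 0+A+B, and so on) can be sketched as follows; the buffer names and sizes are illustrative assumptions:

```python
def assign_offsets(buffer_sizes):
    """Lay buffers out back to back starting at relative offset 0, so no
    gap (memory fragment) is left between consecutive buffers."""
    offsets, cursor = {}, 0
    for buf, size in buffer_sizes:
        offsets[buf] = cursor
        cursor += size
    return offsets

# Three buffers of sizes A=128, B=64, C=256 land at offsets 0, A, and A+B.
layout = assign_offsets([("BUF0", 128), ("BUF1", 64), ("BUF2", 256)])
```

Each buffer's absolute storage address would then be the base address plus its relative offset.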
Specifically, the present application determines whether the storage space with a relative address offset of 0 has already been allocated to one or more layer combinations by judging whether the storage space mapping table is empty, so that, when the table is empty, that storage space is allocated to the first target layer combination: the storage space mapping table records that the storage space with a relative address offset of 0 has been allocated to the first target layer combination, the space required when it is used once is recorded as size2, and the space required when it is reused as the larger of size2 and size_max2.
In some embodiments of the present application, as shown in FIG. 5, step 501 may be followed by steps 503 to 504.
Step 503: if the storage space mapping table is not empty, judge from the storage space mapping table whether released storage space exists.
Step 504: if released storage space exists, record in the storage space mapping table that the released storage space has been allocated to the first target layer combination, record the space required when the released storage space is used once as size2, and record the space required when the released storage space is reused as the larger of size2 and size_max3; size_max3 is the historically recorded storage space required when the released storage space is reused.
When the storage space mapping table is not empty, the storage space with a relative address offset of 0 has already been allocated to one or more layer combinations, so other unoccupied storage space must be found and allocated to the first target layer combination.
However, to further reduce memory fragmentation, when searching for other unoccupied storage space, one may first judge from the storage space mapping table whether released storage space exists, so that, if it does, the released storage space is allocated to the first target layer combination: the storage space mapping table records that the released storage space has been allocated to the first target layer combination, the space required when it is used once is recorded as size2, and the space required when it is reused as the larger of size2 and size_max3, size_max3 being the historically recorded storage space required when the released storage space is reused.
It should be noted that the historically recorded reuse size of released storage space may be larger or smaller than size2. If it is smaller than size2 and the reuse size were not recorded as the larger of size2 and size_max3, then when each piece of storage space in memory is divided according to the reuse sizes recorded in the storage space mapping table, the released piece would be too small to satisfy the space requirements of the layer combinations that used it before release. Therefore, when allocating the released storage space to the first target layer combination, its reuse size must be recorded as the larger of size2 and size_max3; likewise, in step 402 above, when recording that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, the reuse size of that storage space must be recorded as the larger of size1+size2 and size_max1.
Optionally, so that released storage space can be reallocated when unoccupied storage space is assigned to a first target layer combination, as shown in FIG. 6, in the course of allocating storage space by the methods described above, steps 601 to 602 may be executed each time the allocation of storage space for one first target layer combination is completed.
Step 601: judge whether there is a first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination.
In the embodiments of the present application, this judgment determines whether the calculation results of the input layer combinations of the first target layer combination have already been read by their output layer combinations.
Step 602: if there is no first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination, mark in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
When no such first target output layer combination exists, the calculation results of the input layer combinations of the first target layer combination have been read by their output layer combinations, and those output layer combinations have finished computing with them; that is, the data stored in the occupied storage space has already been used. The storage space occupied by the input layer combinations can therefore be marked as released, so that it can be reallocated to other layer combinations of the convolutional neural network that have not yet been allocated storage space.
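The release condition described above (a buffer is freed once every output layer combination consuming it has been allocated storage) can be sketched as a hypothetical check; names and data structures are assumed for illustration:

```python
def maybe_release(input_group, consumers, alloc, released):
    """Mark `input_group`'s buffer as released once every output layer
    combination that consumes its result has had storage allocated,
    i.e. once the data in the buffer can no longer be needed."""
    if all(consumer in alloc for consumer in consumers):
        released.add(alloc[input_group])
        return True
    return False

# Layer-Group n-1 feeds only Layer-Group n+1; once n+1 is allocated,
# n-1's buffer BUF0 can be released.
alloc = {"Layer-Group n-1": "BUF0", "Layer-Group n+1": "BUF1"}
released = set()
freed = maybe_release("Layer-Group n-1", ["Layer-Group n+1"], alloc, released)
```

If some consumer were still unallocated, the check would return False and the buffer would stay occupied.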
In some embodiments of the present application, when there is a first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination, the calculation results of those input layer combinations will still be read by that first target output layer combination, so the storage space they occupy cannot yet be marked as released.
It should be noted that in some embodiments, when the storage space mapping table is determined not to be empty, the unoccupied storage space with the smallest relative address offset may be allocated to the first target layer combination; similarly, when it is determined that no released storage space exists, the unoccupied storage space with the smallest relative address offset may also be allocated to the first target layer combination.
In the embodiments described above, the storage space allocation method may further include: traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain a second target layer combination, the output layer combination of the second target layer combination containing only one input layer combination; correspondingly, the storage space allocation method may further include: when allocating storage space for each second target layer combination, allocating unoccupied storage space to the second target layer combination.
Specifically, to reduce memory fragmentation, allocating unoccupied storage space to the second target layer combination may include: looking up the storage space mapping table and judging whether it is empty; if it is empty, recording in the storage space mapping table that the storage space with a relative address offset of 0 has been allocated to the second target layer combination, recording the space required when that storage space is used once as size3, and recording the space required when it is reused as the larger of size3 and size_max2; where size3 is the storage space to be occupied by the calculation result of the second target layer combination.
Similarly, in some embodiments, when allocating storage space for each second target layer combination, if the storage space mapping table is not empty, it may also be judged from the table whether released storage space exists; if so, the table records that the released storage space has been allocated to the second target layer combination, the space required when the released storage space is used once is recorded as size3, and the space required when it is reused as the larger of size3 and size_max3, size_max3 being the historically recorded storage space required when the released storage space is reused. Furthermore, after allocating the unoccupied storage space to the second target layer combination, the method may include: judging whether there is a second target output layer combination with unallocated storage space among the output layer combinations of the input layer combination of the second target layer combination; if there is none, marking in the storage space mapping table that the storage space occupied by the input layer combination of the second target layer combination has been released.
In the embodiments of the present application, after memory has been allocated for each layer combination of the convolutional neural network by the storage space allocation methods described in the above embodiments, each piece of storage space, and the data storage locations inside it, can be divided according to the once-used and reuse sizes recorded for each storage space in the storage space mapping table. When executing an operation with multiple input layer combinations, the convolutional neural network processor can read the calculation results of all of those input layer combinations from the same piece of storage space, which reduces its software programming complexity; and when the operation is Concatenate, the operation can be skipped entirely, improving data access efficiency and saving the storage space the copy would occupy.
For example, as shown in FIG. 7, the two storage spaces recorded in the storage space mapping table are BUF 0 and BUF 1. BUF 0 is allocated simultaneously to Layer-Group n-1 and Layer-Group n shown in FIG. 1, and the table records that BUF 0 requires size4+size5 when used once and size_max4 when reused; BUF 1 is allocated to Layer-Group n+1 shown in FIG. 1, and the table records that BUF 1 requires size6 when used once and size_max5 when reused. Therefore, when each piece of storage space and its internal data storage locations are divided according to the recorded once-used and reuse sizes, BUF 0 is divided with size size_max4 and BUF 1 with size size_max5; the storage address of the calculation result of Layer-Group n-1 is the base address + the relative address offset of BUF 0, that of Layer-Group n is the base address + the relative address offset of BUF 0 + size4, and that of Layer-Group n+1 is the base address + the relative address offset of BUF 1. When executing the Element-Wise operation of the first layer of Layer-Group n+1, the convolutional neural network processor only needs to read the data stored in BUF 0 rather than reading from two different storage spaces, which reduces its software programming complexity.
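The address arithmetic in this example can be checked with a few lines; the base address and the concrete sizes below are invented for illustration, and BUF 1 is assumed to be laid out immediately after BUF 0's once-used span of size4+size5:

```python
BASE = 0x8000_0000           # hypothetical base address of the shared memory
buf0_offset = 0              # relative address offset of BUF 0
size4 = 100                  # assumed size of Layer-Group n-1's result
size5 = 50                   # assumed size of Layer-Group n's result
buf1_offset = size4 + size5  # assumed relative address offset of BUF 1

addr_lg_n_minus_1 = BASE + buf0_offset           # base + offset of BUF 0
addr_lg_n = BASE + buf0_offset + size4           # base + offset of BUF 0 + size4
addr_lg_n_plus_1 = BASE + buf1_offset            # base + offset of BUF 1
```

The two co-allocated results occupy adjacent regions of BUF 0, which is what lets the Element-Wise operation read both inputs from one contiguous space.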
As another example, as shown in FIG. 8, the operation of the first layer of Layer-Group n+1 is Concatenate, whose two inputs come from Layer-Group n-1 and Layer-Group n respectively and whose calculation result is output to Layer-Group n+2. If the calculation results of Layer-Group n-1 and Layer-Group n are stored in separate storage spaces, for example BUF0 and BUF1 as shown in FIG. 9, the convolutional neural network processor must read the data of BUF0 and BUF1 separately when performing the Concatenate calculation and place the result of Layer-Group n+1 into the unoccupied storage space BUF2; when executing the operation of Layer-Group n+2, it must then read the Concatenate result from BUF2 and store it into the released storage space BUF0.
When memory is allocated with the storage space allocation method provided by the embodiments of the present application, the output layer combination of Layer-Group n-1 and Layer-Group n is Layer-Group n+1, which contains two input layer combinations; therefore, when allocating storage space for Layer-Group n-1 and Layer-Group n, the same piece of storage space is allocated to both. For example, as shown in FIG. 10, the calculation results of Layer-Group n-1 and Layer-Group n are stored in the same storage space BUF0; when executing the operation of Layer-Group n+2, the convolutional neural network processor can read the calculation results from BUF0 consecutively and store them into the unoccupied storage space BUF1, which eliminates the Concatenate copy step and the storage space it would occupy, improves data access efficiency, and reduces the processor's software programming complexity.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as series of action combinations; however, those skilled in the art will appreciate that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders.
FIG. 11 shows a schematic structural diagram of a storage space allocation apparatus 1100 provided by an embodiment of the present application, which includes a traversal unit 1101 and an allocation unit 1102.
The traversal unit 1101 is configured to traverse the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain a first target layer combination, the output layer combination of the first target layer combination containing multiple input layer combinations.
The allocation unit 1102 is configured to allocate storage space for each first target layer combination.
When allocating storage space for each first target layer combination, the allocation unit is further configured to:
judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
In some embodiments of the present application, the allocation unit 1102 is further configured to, after judging whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, allocate unoccupied storage space to the first target layer combination if no such allocated first target input layer combination exists.
In some embodiments, the allocation unit 1102 is further configured to look up the storage space mapping table and judge whether it is empty; if it is empty, to record in the table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination, record the space required when that storage space is used once as size2, and record the space required when it is reused as the larger of size2 and size_max2; where size_max2 is the historically recorded storage space required when the storage space with a relative address offset of 0 is reused.
In some embodiments, the allocation unit 1102 is further configured to, after judging whether the storage space mapping table is empty, judge from the table whether released storage space exists if the table is not empty; and, if released storage space exists, to record in the table that the released storage space has been allocated to the first target layer combination, record the space required when the released storage space is used once as size2, and record the space required when it is reused as the larger of size2 and size_max3; size_max3 being the historically recorded storage space required when the released storage space is reused.
In some embodiments, the allocation unit 1102 is further configured to, after the allocation of storage space for each first target layer combination is completed, judge whether there is a first target output layer combination with unallocated storage space among the output layer combinations of the input layer combinations of the first target layer combination; if there is none, to mark in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
In some embodiments, the allocation unit 1102 is further configured to allocate unoccupied storage space to the second target layer combination.
In some embodiments, the allocation unit 1102 is further configured to, after allocating the unoccupied storage space to the second target layer combination, judge whether there is a second target output layer combination with unallocated storage space among the output layer combinations of the input layer combination of the second target layer combination; if there is none, to mark in the storage space mapping table that the storage space occupied by the input layer combination of the second target layer combination has been released.
It should be noted that, for convenience and brevity of description, for the specific working process of the storage space allocation apparatus 1100 described above, reference may be made to the corresponding processes of the methods described in FIGS. 1 to 10, which are not repeated here.
As shown in FIG. 12, the present application provides a terminal for implementing the above storage space allocation method. The terminal 12 may include a processor 120, a memory 121, and a computer program 122, such as a memory allocation program, stored in the memory 121 and runnable on the processor 120. When the processor 120 executes the computer program 122, the steps in the above embodiments of the storage space allocation method are implemented, for example steps 301 to 302 shown in FIG. 3; alternatively, when the processor 120 executes the computer program 122, the functions of the modules/units in the above apparatus embodiments are realized, for example the functions of units 1101 to 1102 shown in FIG. 11.
The computer program may be divided into one or more modules/units, which are stored in the memory 121 and executed by the processor 120 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program in the terminal. For example, the computer program may be divided into a traversal unit and an allocation unit, whose specific functions are as follows:
a traversal unit, configured to traverse the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain a first target layer combination, the output layer combination of the first target layer combination containing multiple input layer combinations;
an allocation unit, configured to allocate storage space for each first target layer combination;
when allocating storage space for each first target layer combination, the allocation unit is further configured to:
judge whether there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination;
if there is a first target input layer combination with allocated storage space among the multiple input layer combinations contained in the output layer combination of the first target layer combination, record in the storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when it is reused as the larger of size1+size2 and size_max1; where size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space to be occupied by the calculation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
The terminal may be a computing device such as a computer or a server. The terminal may include, but is not limited to, the processor 120 and the memory 121. Those skilled in the art will understand that FIG. 12 is only an example of a terminal and does not constitute a limitation on the terminal; it may include more or fewer components than shown, or combine certain components, or use different components; for example, the terminal may also include input and output devices, network access devices, buses, and so on.
The so-called processor 120 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 121 may be an internal storage unit of the terminal, such as a hard disk or memory of the terminal. The memory 121 may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal. Further, the memory 121 may include both an internal storage unit of the terminal and an external storage device. The memory 121 is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example; in practical applications, the above functions may be assigned to different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
在本申请所提供的实施例中,应该理解到,所揭露的装置/终端和方法,可以通过其它的方式实现。例如,以上所描述的装置/终端实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口,装置或单元的间接耦合或通讯连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。

Claims (10)

  1. A storage space allocation method, comprising:
    traversing the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain first target layer combinations, wherein the output layer combination of a first target layer combination contains multiple input layer combinations; and
    allocating storage space for each first target layer combination;
    wherein allocating storage space for each first target layer combination comprises:
    determining whether a first target input layer combination that has already been allocated storage space exists among the multiple input layer combinations contained in the output layer combination of the first target layer combination; and
    if such a first target input layer combination exists, recording in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, recording the space required when the storage space of the first target input layer combination is used once as size1+size2, and recording the space required when that storage space is reused as the larger of size1+size2 and size_max1, wherein size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space occupied by the computation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
  2. The storage space allocation method according to claim 1, wherein after the determining whether a first target input layer combination that has already been allocated storage space exists among the multiple input layer combinations contained in the output layer combination of the first target layer combination, the method comprises:
    if no such first target input layer combination exists, allocating unoccupied storage space to the first target layer combination.
  3. The storage space allocation method according to claim 2, wherein the allocating unoccupied storage space to the first target layer combination comprises:
    looking up the storage space mapping table and determining whether the storage space mapping table is empty; and
    if the storage space mapping table is empty, recording in the storage space mapping table that the storage space with a relative address offset of 0 has been allocated to the first target layer combination, recording the space required when the storage space with a relative address offset of 0 is used once as size2, and recording the space required when that storage space is reused as the larger of size2 and size_max2, wherein size_max2 is the historically recorded storage space required when the storage space with a relative address offset of 0 is reused.
  4. The storage space allocation method according to claim 3, wherein after the determining whether the storage space mapping table is empty, the method further comprises:
    if the storage space mapping table is not empty, determining, according to the storage space mapping table, whether released storage space exists; and
    if released storage space exists, recording in the storage space mapping table that the released storage space has been allocated to the first target layer combination, recording the space required when the released storage space is used once as size2, and recording the space required when the released storage space is reused as the larger of size2 and size_max3, wherein size_max3 is the historically recorded storage space required when the released storage space is reused.
  5. The storage space allocation method according to any one of claims 1 to 4, wherein after each allocation of storage space for a first target layer combination is completed, the method comprises:
    determining whether a first target output layer combination that has not been allocated storage space exists among the output layer combinations of the input layer combinations of the first target layer combination; and
    if no such first target output layer combination exists, marking in the storage space mapping table that the storage space occupied by the input layer combinations of the first target layer combination has been released.
  6. The storage space allocation method according to claim 1, further comprising: traversing the input layer combinations and output layer combinations of each layer combination of the convolutional neural network to obtain second target layer combinations, wherein the output layer combination of a second target layer combination contains only one input layer combination; and
    when allocating storage space for each second target layer combination, allocating unoccupied storage space to the second target layer combination.
  7. The storage space allocation method according to claim 6, wherein after the allocating unoccupied storage space to the second target layer combination, the method comprises:
    determining whether a second target output layer combination that has not been allocated storage space exists among the output layer combinations of the input layer combination of the second target layer combination; and
    if no such second target output layer combination exists, marking in the storage space mapping table that the storage space occupied by the input layer combination of the second target layer combination has been released.
  8. A storage space allocation apparatus, comprising:
    a traversal unit, configured to traverse the input layer combinations and output layer combinations of each layer combination of a convolutional neural network to obtain first target layer combinations, wherein the output layer combination of a first target layer combination contains multiple input layer combinations; and
    an allocation unit, configured to allocate storage space for each first target layer combination;
    wherein, when allocating storage space for each first target layer combination, the allocation unit is further configured to:
    determine whether a first target input layer combination that has already been allocated storage space exists among the multiple input layer combinations contained in the output layer combination of the first target layer combination; and
    if such a first target input layer combination exists, record in a storage space mapping table that the storage space of the first target input layer combination is simultaneously allocated to the first target layer combination, record the space required when the storage space of the first target input layer combination is used once as size1+size2, and record the space required when that storage space is reused as the larger of size1+size2 and size_max1, wherein size1 is the historically recorded storage space required when the storage space of the first target input layer combination is used once, size2 is the storage space occupied by the computation result of the first target layer combination, and size_max1 is the historically recorded storage space required when the storage space of the first target input layer combination is reused.
  9. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
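The unoccupied-block selection in claims 3 and 4 can be sketched end to end. The mapping-table layout and field names below are assumptions for illustration only: an empty table yields the block at relative address offset 0; otherwise a released block is reused if one exists, and fresh space is taken only as a last resort.

```python
# Minimal sketch of claims 3-4 (field names are illustrative): choose an
# unoccupied block for a target layer combination, preferring a released
# block over fresh space, and update the "used once" / "reused" size records.

def take_free_block(space_map, size2, next_offset):
    """Return (offset, new_next_offset) for an unoccupied block of size size2."""
    if not space_map:
        # claim 3: empty table -> allocate at relative address offset 0
        space_map[0] = {"single_use_size": size2, "reuse_size": size2, "released": False}
        return 0, size2
    for off, entry in space_map.items():
        if entry["released"]:
            # claim 4: reuse a released block; reuse size is the larger of
            # size2 and the block's historical reuse size
            entry.update(released=False, single_use_size=size2,
                         reuse_size=max(size2, entry["reuse_size"]))
            return off, next_offset
    # no released block: take fresh space past the current high-water mark
    space_map[next_offset] = {"single_use_size": size2, "reuse_size": size2, "released": False}
    return next_offset, next_offset + size2

space_map = {}
off, nxt = take_free_block(space_map, 100, 0)   # empty table -> offset 0
space_map[0]["released"] = True                 # its consumers are all allocated
off2, _ = take_free_block(space_map, 30, nxt)   # reuses the released block
print(off, off2, space_map[0]["reuse_size"])  # 0 0 100
```

The second call reuses block 0 and keeps its reuse size at 100, the larger of the new size (30) and the historical reuse size (100), matching the max rule in claim 4.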
PCT/CN2021/088444 2020-05-09 2021-04-20 Storage space allocation method and apparatus, terminal, and computer-readable storage medium WO2021227789A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010390297.2 2020-05-09
CN202010390297.2A CN111666150B (zh) 2020-05-09 2020-05-09 Storage space allocation method and apparatus, terminal, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021227789A1 true WO2021227789A1 (zh) 2021-11-18

Family

ID=72383508

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088444 WO2021227789A1 (zh) 2020-05-09 2021-04-20 Storage space allocation method and apparatus, terminal, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111666150B (zh)
WO (1) WO2021227789A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666150B (zh) * 2020-05-09 2022-01-11 深圳云天励飞技术股份有限公司 Storage space allocation method and apparatus, terminal, and computer-readable storage medium
CN112256440B (zh) * 2020-12-23 2021-03-09 上海齐感电子信息科技有限公司 Memory management method and apparatus for neural network inference

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874219A * 2016-12-23 2017-06-20 深圳云天励飞技术有限公司 Data scheduling method and system for a convolutional neural network, and computer device
WO2018103472A1 * 2016-12-09 2018-06-14 杭州海康威视数字技术股份有限公司 Cache optimization method and apparatus applied to a deep learning network
CN110597616A * 2018-06-13 2019-12-20 华为技术有限公司 Memory allocation method and apparatus for a neural network
CN110866589A * 2018-08-10 2020-03-06 高德软件有限公司 Method, apparatus, and framework for running a deep neural network model
CN111666150A * 2020-05-09 2020-09-15 深圳云天励飞技术有限公司 Storage space allocation method and apparatus, terminal, and computer-readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2880961B2 * 1996-08-16 1999-04-12 日本電気アイシーマイコンシステム株式会社 Data buffering device and control method therefor
US8826438B2 * 2010-01-19 2014-09-02 Damballa, Inc. Method and system for network-based detecting of malware from behavioral clustering
CN106919918B * 2017-02-27 2022-11-29 腾讯科技(上海)有限公司 Face tracking method and apparatus
CN110245748B * 2018-03-09 2021-07-13 赛灵思电子科技(北京)有限公司 Convolutional neural network implementation method and apparatus, hardware accelerator, and storage medium
CN109886390B * 2019-01-10 2023-11-24 平安科技(深圳)有限公司 Convolutional neural network model optimization method and apparatus, computer device, and storage medium
CN109976903B * 2019-02-22 2021-06-29 华中科技大学 Deep learning heterogeneous computing method and system based on layer-wise memory allocation
CN110750351B * 2019-12-20 2020-12-22 安徽寒武纪信息科技有限公司 Multi-core task scheduler, multi-core task scheduling method and apparatus, and related products

Also Published As

Publication number Publication date
CN111666150B (zh) 2022-01-11
CN111666150A (zh) 2020-09-15

Similar Documents

Publication Publication Date Title
CN110149803B Data storage method and system, and terminal device
CN102362464B Memory access monitoring method and apparatus
CN111767143A Transaction data processing method, apparatus, device, and system
CN103970520A Resource management method and apparatus in a MapReduce architecture, and architecture system
WO2021227789A1 Storage space allocation method and apparatus, terminal, and computer-readable storage medium
CN112287182A Graph data storage and processing method and apparatus, and computer storage medium
WO2020119307A1 DSP-based task scheduling method and apparatus
CN109033365B Data processing method and related device
CN113326005A Read/write method and apparatus for a RAID storage system
CN114385089B Dynamic bank storage method and apparatus based on interleaved addressing, and electronic device
CN109213423A Lock-free processing of concurrent IO commands based on address barriers
CN110968538B Data buffering method and apparatus
CN108170380B Method for improving sequential read performance of a solid-state drive, and solid-state drive
CN115794417A Memory management method and apparatus
CN114356796A Flash memory card, and pre-allocation method and system for a flash memory card
CN111104435B Metadata organization method, apparatus, and device, and computer-readable storage medium
CN114610231A Control method, system, device, and medium for segmented storage on a wide data bus
CN110865901B Method and apparatus for forming EC stripes
CN208569620U Storage device with an NID pool
CN115712581A Data access method, storage system, and storage node
CN111008195A Database free-space management method, system, terminal, and storage medium
CN117785758B CXL module, controller, task processing method, medium, and system
WO2020000480A1 Data storage method and data storage apparatus
CN115599705B Apparatus and method for managing storage space, computing device, and chip
CN116991595B Bitmap-based memory allocation method, apparatus, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21805262; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21805262; Country of ref document: EP; Kind code of ref document: A1)