CN114153753A

CN114153753A - Storage resource allocation method and device and non-transitory storage medium

Info

Publication number: CN114153753A
Application number: CN202111481283.2A
Authority: CN
Inventors: 袁庆; 陈庆; 华芮; 潘于
Original assignee: Haiguang Information Technology Co Ltd
Current assignee: Haiguang Information Technology Co Ltd
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2022-03-08

Abstract

A storage resource allocation method, a storage resource allocation device, and a non-transitory storage medium are provided for allocating a shared memory in a computing unit, wherein the shared memory comprises N storage segments, the N storage segments are arranged sequentially by segment number, and N is a positive integer greater than 1. The resource allocation method comprises: receiving an allocation request occupying M consecutive storage segments; and, in response to the N storage segments comprising a first number of available storage segment groups, determining, among the available storage segment groups, the one available storage segment group that is closest to the boundaries at the two ends of the N storage segments and using it to respond to the allocation request, wherein each available storage segment group comprises M consecutive storage segments in an idle state so as to satisfy the allocation request, and M is a positive integer less than N. The method can balance the workload of the shared memory, prolong its service life, and mitigate fragmentation of the storage resources.

Description

Storage resource allocation method and device and non-transitory storage medium

Technical Field

Embodiments of the present disclosure relate to a storage resource allocation method, apparatus, and non-transitory storage medium.

Background

In currently designed parallel processors, such as general-purpose graphics processing units (GPGPUs), a workgroup is assigned to a computing unit (CU) for processing. Each computing unit includes a plurality of processing units (PEs), a shared memory unit (PMU), a register file, and a work item set scheduling module. Each processing unit includes an arithmetic logic unit (ALU), a floating-point calculation unit, and the like. For example, the workgroup may be divided into a plurality of work item sets corresponding to one sub-task; these work item sets are scheduled and dispatched by the work item set scheduling module in the computing unit and share data through the shared memory. Because a plurality of workgroups can be processed in parallel on the same computing unit, the cooperation of multiple work item sets through the shared memory effectively hides cache read latency and improves the efficiency of the parallel computing unit. The storage space in the shared memory may be divided into a plurality of independent storage segments. Storage resources need to be allocated dynamically according to the storage segments each workgroup requires, so that the workgroups can work independently.

Disclosure of Invention

At least some embodiments of the present disclosure provide a storage resource allocation method for allocating a shared memory in a computing unit, where the shared memory includes N memory segments, the N memory segments are sequentially arranged according to a segment number, N is a positive integer greater than 1, and the resource allocation method includes: receiving an allocation request occupying M continuous storage segments; in response to the N memory segments comprising a first number of available memory segment groups, determining one of the available memory segment groups closest to boundaries at both ends of the N memory segments for responding to the allocation request, wherein the available memory segment groups each comprise M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N.

For example, some embodiments of the present disclosure provide a storage resource allocation method further including: determining whether the N memory segments comprise the first number of available memory segment groups.

For example, in a resource allocation method provided in some embodiments of the present disclosure, determining whether the N memory segments include the available memory segment group includes: acquiring storage state data for the shared memory, wherein the storage state data has N bits, and the N bits of the storage state data are used for recording whether the N storage segments are idle or occupied in a one-to-one correspondence manner; determining whether the N memory segments comprise the set of available memory segments using the memory status data.

For example, in the storage resource allocation method provided by some embodiments of the present disclosure, determining whether the N storage segments include the available storage segment group using the storage status data includes: traversing the storage status data bit by bit in ascending or descending order and judging, for each currently examined bit (the current bit), whether there is an available storage segment group having the current bit as its start bit; and, in response to such a group existing, recording that an available storage segment group corresponding to the current bit exists.

For example, in the storage resource allocation method provided by some embodiments of the present disclosure, each bit in the storage status data is set to 0 when the corresponding storage segment is occupied and to 1 when it is idle, and judging bit by bit whether there is an available storage segment group with the current bit as a start bit includes: acquiring mask data, wherein the mask data comprises N bits in one-to-one correspondence with the N bits of the storage status data, and the mask data comprises a mask segment consisting of M consecutive bits, starting at the current bit, whose values are all 1; inverting the mask data and performing a bitwise OR operation between the inverted mask data and the storage status data; performing an AND operation across the N bits of the result of the bitwise OR operation; determining that an available storage segment group with the current bit as the start bit exists in response to the result of the AND operation being 1; and determining that no available storage segment group with the current bit as the start bit exists in response to the result of the AND operation being 0.
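The mask-based check described above can be sketched in software as follows. This is only an illustrative sketch, not the circuit of the disclosure: the function name `has_group_at`, the use of a Python integer as the N-bit word, and the choice that bit i corresponds to segment i in ascending order are all assumptions made for the example.

```python
def has_group_at(state: int, start: int, m: int, n: int) -> bool:
    """Return True if the m segments starting at bit `start` are all idle (bit value 1).

    `state` is the N-bit storage status data (1 = idle, 0 = occupied).
    All names here are hypothetical and chosen for illustration only.
    """
    if start + m > n:
        return False                      # the group would run past the last segment
    full = (1 << n) - 1                   # N-bit word with every bit set
    mask = ((1 << m) - 1) << start        # mask segment: M ones beginning at `start`
    merged = (~mask & full) | state       # invert the mask, then bitwise OR with the state
    return merged == full                 # AND across all N bits is 1 only if every bit is 1
```

With an 8-bit toy state `0b00111100` (segments 2 to 5 idle), `has_group_at(0b00111100, 2, 3, 8)` is true, while a start bit of 4 fails because segment 6 is occupied. Inverting the mask forces every bit outside the candidate group to 1, so only the M bits of interest can pull the AND result down to 0.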

For example, in the storage resource allocation method provided by some embodiments of the present disclosure, each bit in the storage status data is set to 0 when the corresponding storage segment is occupied and to 1 when it is idle, and judging bit by bit whether there is an available storage segment group with the current bit as a start bit includes: performing an AND operation across the M consecutive bits of the storage status data that start at the current bit; determining that an available storage segment group with the current bit as the start bit exists in response to the result of the AND operation being 1; and determining that no available storage segment group with the current bit as the start bit exists in response to the result of the AND operation being 0.
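A minimal sketch of this direct variant, under the same assumptions as before (bit i corresponds to segment i, ascending order, state held in a Python integer); the name `and_reduce_window` is hypothetical:

```python
def and_reduce_window(state: int, start: int, m: int) -> int:
    """AND together the m consecutive bits of `state` that begin at bit `start`.

    Returns 1 if all m segments are idle (1) and 0 otherwise.
    Bits beyond the stored width of `state` read as 0, i.e. as occupied.
    """
    result = 1
    for offset in range(m):
        result &= (state >> (start + offset)) & 1
    return result
```

Compared with the mask-based variant, this form touches only the M bits of the candidate group instead of forming a full N-bit intermediate word.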

For example, the storage resource allocation method provided by some embodiments of the present disclosure further includes: after judging bit by bit whether there is an available storage segment group with the current bit as a start bit, obtaining selectable position data by recording, for each current bit, whether an available storage segment group corresponding to that bit exists, wherein the selectable position data comprises N bits, and the N bits of the selectable position data record, in one-to-one correspondence with the N storage segments, whether an available storage segment group exists starting from each storage segment in the ascending or descending order.
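One way to picture the selectable position data is as an N-bit word whose bit i is 1 exactly when a group of M idle segments starts at segment i. The sketch below assumes ascending order and a software loop; the function name `selectable_positions` is hypothetical.

```python
def selectable_positions(state: int, m: int, n: int) -> int:
    """Build an N-bit word whose bit i is 1 if segments i .. i+m-1 are all idle.

    `state` is the N-bit storage status data (1 = idle, 0 = occupied).
    Ascending order is assumed; the descending case would mirror it.
    """
    window = (1 << m) - 1                 # M ones, used to test M consecutive bits
    positions = 0
    for start in range(n - m + 1):        # a group cannot start closer than M to the top
        if (state >> start) & window == window:
            positions |= 1 << start       # record that a group can start here
    return positions
```

For the toy state `0b00111100` and M = 3, groups can start at segments 2 and 3, so the result is `0b00001100`.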

For example, in the storage resource allocation method provided by some embodiments of the present disclosure, determining, among the available storage segment groups, the one available storage segment group closest to the boundaries at the two ends of the N storage segments includes: applying a bisection method (dichotomy) to the N bits of the selectable position data to determine the available storage segment group closest to the boundaries at the two ends of the N storage segments.
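The bisection idea can be sketched as two halving searches over the selectable position data, one for the lowest set bit and one for the highest, followed by a comparison of their distances to the respective ends. This is a software sketch under several assumptions that the text does not state: N is a power of two, bit i of the position data marks a group starting at segment i, and a tie is resolved in favor of the low end. All function names are hypothetical.

```python
from typing import Optional

def lowest_set_bit(bits: int, n: int) -> int:
    """Index of the lowest 1 bit of a nonzero n-bit word (n a power of two), by halving."""
    lo, width = 0, n
    while width > 1:
        half = width // 2
        if (bits >> lo) & ((1 << half) - 1) == 0:
            lo += half                    # lower half is empty: continue in the upper half
        width = half
    return lo

def highest_set_bit(bits: int, n: int) -> int:
    """Index of the highest 1 bit of a nonzero n-bit word (n a power of two), by halving."""
    lo, width = 0, n
    while width > 1:
        half = width // 2
        if (bits >> (lo + half)) & ((1 << half) - 1) != 0:
            lo += half                    # upper half holds a 1: continue there
        width = half
    return lo

def closest_to_boundary(positions: int, m: int, n: int) -> Optional[int]:
    """Start segment of the group nearest to either end, or None if no group exists."""
    if positions == 0:
        return None
    low_start = lowest_set_bit(positions, n)
    high_start = highest_set_bit(positions, n)
    dist_low = low_start                  # idle gap between segment 0 and the group
    dist_high = n - (high_start + m)      # idle gap between the group and segment n-1
    return low_start if dist_low <= dist_high else high_start  # tie goes low (assumption)
```

Each search takes log2(N) halving steps, which is seven steps for the 128-segment example.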

For example, the storage resource allocation method provided by some embodiments of the present disclosure further includes: outputting the start address of the determined available storage segment group and the length M of the available storage segment group.

For example, some embodiments of the present disclosure provide a storage resource allocation method further including: in response to the N memory segments not including the set of available memory segments, continuing to monitor the N memory segments until the N memory segments include the set of available memory segments.

For example, the storage resource allocation method provided by some embodiments of the present disclosure further includes: before or at the same time as receiving the allocation request occupying M consecutive storage segments, receiving a release request for at least one currently occupied storage segment of the N storage segments, and processing the release request before processing the allocation request.
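The ordering rule (release first, then allocate) can be illustrated with a small sketch. To keep the example short, the allocation step here simply takes the lowest suitable start bit rather than the boundary-nearest group; the function name `process_cycle` and its argument layout are hypothetical.

```python
def process_cycle(state, n, release=None, alloc_size=None):
    """Handle one release request and one allocation request arriving together.

    state      : N-bit storage status data (1 = idle, 0 = occupied)
    release    : optional (base, size) of segments to free
    alloc_size : optional number M of consecutive segments requested
    Returns (new_state, granted_base); granted_base is None if the request must wait.
    """
    if release is not None:               # the release is applied before the allocation
        base, size = release
        state |= ((1 << size) - 1) << base
    granted = None
    if alloc_size is not None:
        window = (1 << alloc_size) - 1
        for start in range(n - alloc_size + 1):
            if (state >> start) & window == window:
                granted = start
                state &= ~(window << start)   # mark the granted segments occupied
                break
    return state, granted
```

With the state `0b00100000` (only segment 5 idle), a request for 2 segments fails on its own, but succeeds at base 0 when segments 0 to 1 are released in the same cycle. A `None` result corresponds to the case where the request stays pending until a suitable group appears.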

At least some embodiments of the present disclosure further provide a storage resource allocation apparatus, configured to allocate a shared memory in a computing unit, where the shared memory includes N storage segments, the N storage segments are sequentially arranged according to a segment number, N is a positive integer greater than 1, and the resource allocation apparatus includes: an allocation request receiving module configured to receive an allocation request occupying M consecutive memory segments; an available memory segment determining module configured to determine, in response to the N memory segments including a first number of available memory segment groups, one of the available memory segment groups closest to boundaries at both ends of the N memory segments for responding to the allocation request, wherein the available memory segment groups each include M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N.

For example, in some embodiments of the present disclosure, the available memory segment determination module is further configured to determine whether the N memory segments include the available memory segment group.

For example, in a storage resource allocation apparatus provided in some embodiments of the present disclosure, the available storage segment determining module includes: a storage status data obtaining sub-module configured to obtain storage status data for the shared memory, wherein the storage status data has N bits, and the N bits of the storage status data are used for recording whether the N storage segments are idle or occupied in a one-to-one correspondence manner; wherein the available memory segment determination module is configured to determine whether the N memory segments comprise the set of available memory segments using the memory status data.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, the available storage segment determining module further includes: an available memory segment group presence judgment sub-module configured to judge bit by bit whether an available memory segment group having a currently judged current bit as a start bit exists in the storage status data in an ascending order or a descending order, and configured to record that there is an available memory segment corresponding to the current bit in response to the existence of an available memory segment group having the current bit as a start bit.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, each bit in the storage status data is set to 0 when the corresponding storage segment is occupied and to 1 when it is idle, and the available storage segment group existence judgment sub-module includes: a mask data acquisition unit configured to acquire mask data, wherein the mask data includes N bits in one-to-one correspondence with the N bits of the storage status data, and the mask data includes a mask segment consisting of M consecutive bits, starting at the current bit, whose values are all 1; and a mask data operation unit configured to invert the mask data, perform a bitwise OR operation between the inverted mask data and the storage status data, and then perform an AND operation across the N bits of the result of the bitwise OR operation, and configured to determine that an available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 1, and determine that no available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 0.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, each bit in the storage status data is set to 0 when occupied and set to 1 when idle, and the available storage segment group presence determining submodule includes: a storage status data operating unit configured to perform a bitwise and operation on consecutive M bits of the storage status data, the M bits having the current bit as a start bit; an operation result judgment unit configured to determine that there is an available memory segment group having the current bit as a start bit in response to a result of the AND operation being 1, and determine that there is no available memory segment group having the current bit as a start bit in response to a result of the AND operation being 0.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, the available storage segment group existence judgment sub-module further includes: a selectable position data determination unit configured to, after it is judged bit by bit whether there is an available memory segment group having the current bit as a start bit, obtain selectable position data by recording whether there is an available memory segment group corresponding to the current bit, wherein the selectable position data includes N bits that record, in one-to-one correspondence with the N memory segments, whether an available memory segment group exists starting from each memory segment in the ascending order or the descending order.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, the available storage segment determining module further includes: a dichotomy determination submodule configured to determine, by applying a bisection method (dichotomy) to the N bits of the selectable position data, the one available storage segment group closest to the boundaries at the two ends of the N storage segments.

For example, some embodiments of the present disclosure provide a storage resource allocation apparatus further including: an output module configured to output the determined start address of the one available memory segment group and the length M of the available memory segment group.

For example, in the storage resource allocation apparatus provided in some embodiments of the present disclosure, the available memory segment determination module is further configured to, in response to the N memory segments not including the available memory segment group, continue monitoring the N memory segments until the N memory segments include the available memory segment group.

For example, some embodiments of the present disclosure provide a storage resource allocation apparatus further including: a release request processing module configured to receive a release request for at least one currently occupied memory segment of the N memory segments before or while receiving an allocation request occupying M consecutive memory segments, and process the release request before processing the allocation request.

At least some embodiments of the present disclosure also provide a storage resource allocation apparatus, including: a memory for non-transitory storage of computer readable instructions; and a processor for executing the computer readable instructions, wherein the computer readable instructions, when executed by the processor, perform the storage resource allocation method provided by any embodiment of the present disclosure.

At least some embodiments of the present disclosure also provide a non-transitory storage medium that non-transitory stores computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, perform the storage resource allocation method provided by any embodiment of the present disclosure.

Drawings

To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.

FIG. 1 is a schematic diagram of a memory segment mask for managing memory resources of a shared memory in a compute unit of a parallel processor;

FIG. 2 is a schematic diagram of the memory segment mask of FIG. 1 after memory segments numbered 2-4 are allocated;

FIG. 3 is a schematic diagram of the memory segment mask of FIG. 2 after the memory segments with segment numbers of 0-1 are released;

FIG. 4 is a flowchart of a storage resource allocation method according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a storage resource allocation apparatus according to an embodiment of the present disclosure;

FIG. 6 is a flowchart illustrating an example of a storage resource allocation method according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of exemplary mask data provided by an embodiment of the present disclosure;

FIG. 8A is a schematic diagram of a mask data array according to an embodiment of the present disclosure;

FIG. 8B is a schematic diagram of selectable position data provided by an embodiment of the present disclosure;

FIG. 9 is a flowchart of a method for determining whether N memory segments comprise a set of available memory segments according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a circuit block for determining an available memory segment group with a nearest boundary distance between two ends of N memory segments by using a bisection method according to an embodiment of the present disclosure;

FIG. 11A is a diagram illustrating a dichotomy method for determining the set of available memory segments closest to the highest bit in the N memory segments according to one embodiment of the present disclosure;

FIG. 11B is a diagram illustrating a dichotomy method for determining the set of available memory segments closest to the lowest bit in the N memory segments according to one embodiment of the present disclosure;

FIG. 12 is a diagram illustrating a simulation of changes in the remaining workgroups in a workgroup queue allocated by a storage resource allocation method according to an embodiment of the present disclosure;

FIG. 13 is a diagram illustrating a simulation of changes in the amount of remaining storage resources allocated by the storage resource allocation method according to an embodiment of the present disclosure;

FIG. 14 is a time difference graph illustrating the storage resource allocation method and the prior art allocation method for performing the same task according to an embodiment of the present disclosure;

FIG. 15 is a schematic block diagram of a storage resource allocation apparatus according to an embodiment of the present disclosure;

FIG. 16 is a schematic block diagram of another storage resource allocation apparatus provided in an embodiment of the present disclosure; and

FIG. 17 is a schematic diagram of a non-transitory storage medium according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.

Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.

The present disclosure is illustrated by the following specific examples. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components have been omitted from the present disclosure. When any component of an embodiment of the present disclosure appears in more than one drawing, that component is represented by the same or similar reference numeral in each drawing.

Currently designed parallel processors, such as general-purpose graphics processing units (GPGPUs), include multiple computing units; each computing unit includes multiple processing units, a shared memory, and the like, and each processing unit includes an arithmetic logic unit, a floating-point calculation unit, and the like. Computing tasks are typically performed by a plurality of work items. Before being executed in the general-purpose graphics processor, the work items are divided into a plurality of workgroups in a workgroup scheduling module, and the workgroups are then distributed to the respective computing units (CUs) via a workgroup distribution module. All work items in a workgroup must be assigned to the same computing unit for execution. Meanwhile, a workgroup may be split into several work item sets (subgroups), each containing at most a fixed number of work items, e.g., 32 work items. Multiple workgroups may be executed in the same computing unit.

When a workgroup is allocated to a computing unit for processing, the allocable storage resources are determined from the current occupancy of the shared memory in the computing unit and the size of the storage resources that the workgroup needs to occupy. The work item sets in the workgroup may be executed simultaneously or in a time-shared manner. When the workgroup completes its computation in the allocated storage resources, the storage resources occupied by the workgroup are released.

For example, the storage resources of the shared memory of a computing unit of a parallel processor are configured as N sequentially consecutive memory segments (slots), the N memory segments being arranged sequentially by segment number, N being a positive integer greater than 1. For example, the shared memory has 64 KB of storage space in total and is divided into 128 memory segments, each representing 0.5 KB of storage space; that is, the minimum allocation granularity is 0.5 KB. For convenience of description, the storage resources of the shared memory are abstracted as a memory segment mask (slot_mask), and the storage resources in the shared memory are managed through the memory segment mask.

For example, as shown in FIG. 1, the memory segment mask (slot_mask), also referred to as the storage status data of the shared memory, is a one-dimensional array of 128 bits in one-to-one correspondence with the 128 memory segments, whose segment numbers (slot_id) are 0 to 127. If the memory segment with a given segment number is occupied, the corresponding bit in the memory segment mask is 0, indicating that the segment has been allocated to a running workgroup and cannot be allocated to another workgroup until it is released. Conversely, if the memory segment with a given segment number is not occupied, the corresponding bit in the memory segment mask is 1, indicating that the segment is idle and can be allocated to a workgroup.
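As a rough software picture of the memory segment mask, the 128 bits can be held in a single integer with bit slot_id describing segment slot_id. The constants follow the 64 KB / 128-segment example in the text; the helper names `is_idle` and `idle_bytes` are hypothetical.

```python
N_SEGMENTS = 128                  # number of memory segments in the text's example
SEGMENT_BYTES = 512               # 0.5 KB per segment (64 KB / 128)

ALL_IDLE = (1 << N_SEGMENTS) - 1  # every bit is 1: no segment is occupied

def is_idle(slot_mask: int, slot_id: int) -> bool:
    """True if the segment numbered `slot_id` is idle (its bit is 1)."""
    return (slot_mask >> slot_id) & 1 == 1

def idle_bytes(slot_mask: int) -> int:
    """Total idle capacity in bytes: one idle bit stands for one 0.5 KB segment."""
    return bin(slot_mask).count("1") * SEGMENT_BYTES
```

Clearing bits 0 and 1 of `ALL_IDLE` models workgroup wg0 occupying segments 0 to 1, which leaves 63 KB of idle capacity.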

For example, FIG. 1 also shows the case where the memory segment mask is occupied by workgroups 0 to n-1 (wg0 to wg(n-1), etc.). For example, as shown in FIG. 1, memory segments 0-1 (slot_mask[0:1]) are occupied by workgroup wg0, and a run of memory segments starting from memory segment 6 (slot_mask[6:m], m > 6) is occupied by workgroup wg(n-1); memory segments 2-5 (slot_mask[2:5]) are idle at this time and can be allocated to a workgroup.

At present, when a workgroup requests p storage segments (p is greater than or equal to 1), the corresponding storage resources must be allocated to it: according to the size of the request, the memory segment mask is searched from the low-address segments toward the high-address segments for p sequentially consecutive idle memory segments that satisfy the workgroup's storage resource requirement.

For example, starting from the state shown in FIG. 1, when a workgroup requiring 3 memory segments (size = 3) is to be allocated, the memory segment mask is searched in order from bit 0 to bit 127 for 3 consecutive memory segments that satisfy the requirement. The memory segments numbered 2 to 4 (slot_mask[2:4]) are found to satisfy the requirement and are allocated to the workgroup, so the bits corresponding to slot_mask[2:4] change from the idle state 1 to the occupied state 0, indicating that these memory segments have been allocated, as shown in FIG. 2. The whole process above is defined as allocating storage resources (allocate).
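The low-to-high search described above amounts to a first-fit scan. The sketch below uses an 8-segment toy mask instead of 128 bits so the values are easy to read; the function name `first_fit_allocate` and the toy state are assumptions for illustration.

```python
def first_fit_allocate(slot_mask: int, size: int, n: int):
    """Scan from segment 0 upward for `size` consecutive idle segments.

    Returns (base, new_mask); base is None and the mask is unchanged when
    no run of `size` idle segments exists.
    """
    window = (1 << size) - 1
    for base in range(n - size + 1):
        if (slot_mask >> base) & window == window:
            return base, slot_mask & ~(window << base)   # 1 -> 0: now occupied
    return None, slot_mask
```

Taking `0b00111100` as a toy version of FIG. 1 (segments 2 to 5 idle), a request with size = 3 is granted at base 2 and leaves `0b00100000`, which matches the FIG. 2 situation where only segment 5 of that run remains idle.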

For example, starting from the state shown in FIG. 2, if a workgroup (e.g., workgroup wg0) completes its computation, the resources used to run it must be reclaimed, including its memory segments. The memory segments are located by the start position of the memory segments occupied by the workgroup and the size of the storage resources it occupies (e.g., for slot_mask[0:1], the start position is wg_alloc_base = 0 and the size is wg_alloc_size = 2); the 2 memory segments used by the workgroup are found and released (e.g., the memory segments numbered 0 to 1 are released). For example, the bits corresponding to slot_mask[0:1] change from the occupied state 0 to the idle state 1, indicating that the corresponding storage resources have been reclaimed, as shown in FIG. 3. The whole process above is defined as releasing storage resources (deallocate).
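Releasing is the inverse bit operation: the bits of the released run are set back to 1. The sketch reuses the names wg_alloc_base and wg_alloc_size from the text as parameters; the function name `release` and the 8-segment toy state are assumptions.

```python
def release(slot_mask: int, wg_alloc_base: int, wg_alloc_size: int) -> int:
    """Return the mask after freeing `wg_alloc_size` segments from `wg_alloc_base`."""
    span = ((1 << wg_alloc_size) - 1) << wg_alloc_base
    return slot_mask | span               # 0 -> 1: the segments are idle again
```

Starting from the toy FIG. 2 state `0b00100000`, releasing base 0 with size 2 gives `0b00100011`, the toy equivalent of FIG. 3.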

This contiguous allocation scheme is logically simple and easy to implement, and it works well when the workgroups are relatively stable. However, workgroups with large storage demands and workgroups with relatively small storage demands are assigned to a single computing unit at random, and workgroups with varying storage demands release their storage in the shared memory as they complete. After running for a long time, the idle storage resources on the computing unit break up into many short runs of consecutive memory segments or single memory segments, forming small fragments (for example, the memory segment corresponding to slot_mask[5] in FIG. 2). Once there are too many such fragments, subsequent workgroups with relatively large storage demands cannot be allocated to run on the computing unit. Moreover, fragmentation becomes more severe as more and more workgroups run on the same computing unit. As a result, the shared memory resources of the computing unit cannot be used effectively in time, which lowers resource utilization and wastes hardware resources. Meanwhile, because the allocation and release of storage resources for multiple workgroups change dynamically, the storage resources released when one workgroup finishes its computation may be occupied by the next workgroup. The shared memory does not support allocating an address range that wraps around from the last segment to the first (i.e., wrap) and can only allocate in order from low addresses to high addresses. Consequently, low-address storage resources are always occupied preferentially: the probability that low-address storage is occupied is higher and the probability that high-address storage is occupied is lower, which ultimately unbalances the working fatigue of the shared memory and shortens its overall working life.
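The fragmentation problem can be made concrete by comparing the total number of idle segments with the longest run of consecutive idle segments: a request fails whenever it exceeds the longest run, no matter how many segments are idle in total. The helper names and the 8-segment toy masks below are assumptions for illustration.

```python
def idle_count(slot_mask: int) -> int:
    """Number of idle segments (bits equal to 1)."""
    return bin(slot_mask).count("1")

def longest_idle_run(slot_mask: int, n: int) -> int:
    """Length of the longest run of consecutive idle segments among n segments."""
    best = run = 0
    for i in range(n):
        if (slot_mask >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0                       # an occupied segment breaks the run
    return best
```

For the mask `0b01010101`, four of the eight segments are idle, yet the longest run is 1, so even a request for 2 consecutive segments cannot be served.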

At least some embodiments of the present disclosure provide a storage resource allocation method for allocating a shared memory in a computing unit, where the shared memory includes N storage segments, the N storage segments are sequentially arranged according to a segment number, and N is a positive integer greater than 1. The resource allocation method comprises the following steps: receiving an allocation request occupying M continuous storage segments; in response to the N memory segments comprising a first number of available memory segment groups, determining one of the available memory segment groups closest to boundaries at both ends of the N memory segments for responding to the allocation request, wherein the available memory segment groups each comprise M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N.

Some embodiments of the present disclosure further provide a storage resource allocation apparatus corresponding to the above storage resource allocation method, where the resource allocation apparatus includes: an allocation request receiving module configured to receive an allocation request occupying M consecutive memory segments; an available memory segment determination module configured to determine, in response to the N memory segments including a first number of available memory segment groups, one of the available memory segment groups that is closest to boundaries at both ends of the N memory segments for responding to the allocation request.

Some embodiments of the present disclosure also provide a non-transitory storage medium corresponding to the above storage resource allocation method. The storage medium non-transitorily stores computer-readable instructions, and when the computer-readable instructions are executed by a computer, the storage resource allocation method provided by the above embodiments of the present disclosure is performed.

According to the storage resource allocation method provided by the above embodiments of the present disclosure, in response to an allocation request occupying M consecutive storage segments, the available storage segment group closest to the boundaries at the two ends of the N storage segments is allocated. The high-address and low-address storage segments at the two ends of the N storage segments thus have the same allocation priority, which prevents allocations from concentrating on the low-address storage segments, balances the working fatigue of the shared memory, and prolongs its service life. At the same time, unoccupied storage resources are effectively gathered toward the middle of the N storage segments, which mitigates fragmentation of the storage resources.

Some embodiments of the present disclosure and examples thereof are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

Fig. 4 is a flowchart of a storage resource allocation method according to some embodiments of the present disclosure. Fig. 5 is a schematic structural diagram of a storage resource allocation apparatus according to some embodiments of the present disclosure.

For example, as shown in fig. 4, the storage resource allocation method includes the following steps S100 to S400. The specific flow of steps S100 to S500, as applied to the allocation apparatus, is described below in conjunction with the storage resource allocation apparatus shown in fig. 5.

Step S100: an allocation request occupying M consecutive memory segments is received.

The storage resource allocation apparatus 100 shown in fig. 5 is implemented, for example, by digital circuitry; it is connected to, or is part of, a computing unit (not shown in fig. 5) and controls the allocation of the shared memory in the computing unit. After one or more work groups to be processed are allocated to the computing unit, the computing unit generates an allocation request for the storage space of the shared memory corresponding to the current work group and sends the allocation request to a cache module (allocation_fifo) in the storage resource allocation apparatus 100. The cache module serves as the allocation request receiving module 110 for receiving allocation requests and may include, for example, a storage queue (e.g., a first-in first-out (FIFO) queue) for storing a plurality of allocation requests, where each allocation request waits until the shared memory has enough storage resources to satisfy it before being dispatched and processed.

The allocation request specifies the size of the storage resource occupied by the current work group, namely M consecutive storage segments. It should be noted that the size of the storage space occupied by each storage segment may be configured according to the size of the storage space of the shared memory and/or the size of the storage resource required by the work groups that the computing unit can process.

Step S200: in response to the N memory segments comprising the available memory segment groups, determining one of the available memory segment groups that is closest to the boundaries at the two ends of the N memory segments for responding to the allocation request.

Here, the set of available memory segments each includes M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N, i.e., the set of memory segments includes one or more memory segments.

For example, an allocation request is responded to when it is determined that there is a first number of available memory segment groups (i.e., M memory segments that are contiguous and in an idle state) in the shared memory that satisfy the allocation request. The first number is, for example, one or more, that is, in different cases, when there are available memory segment groups in the shared memory, the number of available memory segment groups is one or more, and then one available memory segment group for responding to the allocation is further selected among the one or more available memory segment groups. The available storage segment determining module 120 finds an available storage segment group satisfying the allocation request from among the N storage segments in the shared memory, determines an available storage segment group closest to both end boundaries of the N storage segments (i.e., both end boundaries of the shared memory) from among the one or more available storage segment groups, and outputs the determined available storage segment group through the output module 130.

For example, among the one or more candidate available memory segment groups, the start bit of the group having the lowest start bit is at a first distance from the low-end boundary of the shared memory (e.g., slot_id = 0 in fig. 1), and the end bit of the group having the highest end bit is at a second distance from the high-end boundary of the shared memory (e.g., slot_id = 127 in fig. 1). The group having the lowest start bit is selected if the first distance is less than the second distance, and the group having the highest end bit is selected if the first distance is greater than the second distance.

Step S300: the determined starting address of one of the available memory segment groups and the length M of the available memory segment group are output.

For example, FIG. 5 illustrates the output module 130 outputting the determined starting address of one of the available memory segments and the length M of the group of available memory segments. The length M of the available memory segment group is equal to the size of the memory resource occupied by the current working group in the allocation request, namely M continuous memory segments. The output module 130 feeds back the output result to the computing unit, and the computing unit sends the current workgroup to the determined available memory segment according to the output result for computing.

For example, in some embodiments, as shown in fig. 4, the storage resource allocation method provided by the embodiments of the present disclosure may further include step S400.

Step S400: in response to the N memory segments not including the available memory segment set, the N memory segments are continuously monitored until the N memory segments include the available memory segment set.

For example, when there is no available set of memory segments in the shared memory that satisfies the allocation request, the N memory segments in the shared memory may be continuously monitored, i.e., attempts may be continuously made to determine whether there is an available set of memory segments that satisfies the allocation request (described in detail below) until it is determined that there is an available set of memory segments that satisfies the allocation request.

For example, in some embodiments, the storage resource allocation method provided by the embodiments of the present disclosure further includes step S500.

Step S500: before or at the same time of receiving allocation requests occupying M continuous storage segments, receiving a release request for at least one currently occupied storage segment in the N storage segments, and processing the release request before processing the allocation requests.

In the case where the memory resource allocation means shown in fig. 5 is implemented by, for example, a digital circuit, the allocation operation and the release operation for the memory resource may be performed in parallel independently of each other, and both may be performed simultaneously. For example, if the release request processing module 150 receives the release request before the allocation request receiving module 110 receives the allocation request, the release request and the allocation request are processed in sequence, respectively. If both an allocation request and a release request are received, the release request may be processed first. And after the release request processing is finished, determining whether the work group can be issued to the released shared memory in the idle state for calculation according to the allocation request. This can improve the hit rate of allocation requests, and thus can more effectively improve the efficiency of allocation of the entire storage resource.
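The benefit of handling a release before an allocation received in the same cycle can be illustrated with a minimal software sketch. It assumes the storage state is a Python list in which index k is segment k, 1 means idle and 0 means occupied; the function names `has_free_run` and `process_cycle` are hypothetical and do not come from the disclosure, which describes a digital circuit.

```python
def has_free_run(free, m):
    """Return True if `free` (1 = idle, 0 = occupied) holds m consecutive idle segments."""
    run = 0
    for bit in free:
        run = run + 1 if bit else 0  # length of the idle run ending at this segment
        if run >= m:
            return True
    return False


def process_cycle(free, release=None, alloc_m=None):
    """Handle one cycle: apply the release (start, count) first, then test the allocation.

    Returns the updated state and whether an allocation of alloc_m segments can be served.
    """
    state = list(free)  # work on a copy so the caller's state is not modified
    if release is not None:
        start, count = release
        for k in range(start, start + count):
            state[k] = 1  # released segments become idle
    can_allocate = alloc_m is not None and has_free_run(state, alloc_m)
    return state, can_allocate
```

In this sketch, a request for 4 segments on the state `[1, 1, 0, 0, 1, 1, 0, 0]` fails on its own but succeeds when a release of segments 2 and 3 is processed first in the same cycle.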

For example, a release request is processed as follows. Before or simultaneously with receiving an allocation request occupying M consecutive memory segments, the release request processing module 150 receives a release request, which triggers a release operation on the memory resources. The release request includes the starting address of the memory segments to be released and the number of those memory segments (for example, P consecutive memory segments, where P is a positive integer). The memory segments to be released are marked on the N memory segments according to this starting address and number, and the storage space of the marked memory segments is then released.

An exemplary flowchart for determining whether there is an available storage segment group in the storage resource allocation method according to the embodiment of the present disclosure is further described below with reference to fig. 5 and fig. 6.

In the process of allocating storage resources, as shown in fig. 6, the allocation request may be managed, for example, by the storage resource allocation apparatus (or a software thread) shown in fig. 5. First, it is determined whether there is an allocation request. If there is, the N (e.g., N = 128) memory segments provided by the shared memory are searched for an available memory segment group; if there is not, the flow returns and continues to wait for an allocation request.

At least one example of the process of determining whether there is a group of available memory segments when there is an allocation request is as follows:

1) The mask data of the memory segments to be allocated (allocate_mask) is acquired by the mask data acquisition unit (size_mask) shown in fig. 5 according to the size of the memory resource (e.g., M memory segments) that needs to be allocated in the allocation request.

For example, in the case where the shared memory has 128 memory segments (refer to fig. 1 to 3), as shown in fig. 7, one piece of mask data is obtained. It is a one-dimensional array of 128 bits in which the M bits corresponding to the size of the requested memory resource (request_size = M) are 1 and the rest are 0. In fig. 7, the mask data corresponds to the case where slot_id = 0, that is, bits 0 to M-1 have a value of 1 and bits M to 127 have a value of 0.

2) According to the size of the memory resource to be allocated (for example, M memory segments), the M memory segments are marked in turn on the N memory segments, in ascending or descending order, by a shift marking operation (shift_array) in the mask data operation unit.

For example, as shown in fig. 8A, in the case where the shared memory has 128 memory segments (refer to fig. 1 to 3), the first mask data is obtained from the M memory segments with segment numbers 0 to M-1, the second from the M memory segments with segment numbers 1 to M, the third from the M memory segments with segment numbers 2 to M+1, and so on, and finally the 128th mask data is obtained from segment numbers 128-M to 127. The 128 mask data (arrays) are combined to generate a 128 × 128 two-dimensional matrix, i.e., a mask data array (allocate_array). The mask data array may be stored in a memory device in advance after being generated, or may be generated dynamically from the mask data during the process without storing the entire mask data array in advance.
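The shift-based construction of the mask data array can be sketched in software as follows. This is only an illustration under stated assumptions: rows are Python lists with index k standing for segment k, the function names are hypothetical, and the sketch keeps only the N-M+1 placements that fit entirely inside the N segments rather than reproducing the 128 × 128 layout of fig. 8A.

```python
def size_mask(n, m):
    """Mask for slot_id = 0: bits 0..m-1 are 1, the remaining n-m bits are 0."""
    return [1] * m + [0] * (n - m)


def build_allocate_array(n, m):
    """Shift the size mask one segment at a time, in ascending order of start bit.

    Assumption of this sketch: only start positions 0..n-m are generated,
    because a later start would not leave room for m segments.
    """
    base = size_mask(n, m)
    rows = []
    for start in range(n - m + 1):
        # Prepend `start` zeros and drop the same number of trailing zeros.
        rows.append([0] * start + base[: n - start])
    return rows
```

For n = 16 and m = 4 this yields 13 rows, from `1111` at segments 0 to 3 up to `1111` at segments 12 to 15.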

Each row in the mask data array corresponds to one allocation case, so there are 128 different allocation cases in total, which are matched against each bit of the storage state data (slot_resource). The storage state data is the current storage segment mask (slot_mask); as mentioned above, it is a one-dimensional array of 128 bits in which, for each storage segment of the shared memory, 0 represents occupied and 1 represents idle, which also simplifies the subsequent AND and OR operations.

A mask data matching operation (match_array) is then performed in the mask data operation unit. The current storage state data (slot_resource) of the shared memory is acquired from the storage state data acquisition submodule to obtain the storage state data used for pre-judgment (slot_resource_pre). Each row of the mask data array is matched against the acquired storage state data to judge, bit by bit, whether there is an available storage segment group having the current bit as its start bit, and the optional position data (available_mask) is obtained by recording, for each bit, whether such an available storage segment group exists.

The optional position data is a one-dimensional array, has 128 bits (bit), and is in one-to-one correspondence with the 128 bits of the storage state data, which indicates whether the storage segment group with the current bit as the start bit allows resource allocation of M storage segments. If a bit of the optional position data is 1, the allocation request can be responded, and the memory segment group taking the bit as the starting bit can be used for allocation, and the memory resource can be allocated to the shared memory. If all the bits in the optional position data are 0, the current shared memory is not enough to allocate M memory segment resources, and therefore the shared memory cannot be allocated.

For example, for the case shown in FIG. 1 (i.e., for the storage status data shown in FIG. 1), if the size of the storage resource currently requested is 3, the resulting optional position data is as shown in FIG. 8B. For each bit having a value of 1, the storage segment group whose start bit has the same sequence number as that bit can be used to satisfy the allocation request. For example, bit 2 has a value of 1, indicating that the segment group of length 3 starting at bit 2 of the segment mask (i.e., the group consisting of segments 2-4) can be allocated in response to the allocation request; similarly, bit 3 has a value of 1, indicating that the group consisting of segments 3-5 can be allocated; bit 124 has a value of 1, indicating that the group consisting of segments 124-126 can be allocated; and bit 125 has a value of 1, indicating that the group consisting of segments 125-127 can be allocated. When there is no available memory segment group and allocation is therefore not possible, the flow returns to the storage resource allocation apparatus, for example, to try the above procedure again in the next operation cycle (time cycle).

When there is an available memory segment group so that allocation is possible, the position of the first bit with a value of 1 is searched for from the lowest bit and from the highest bit respectively. A first distance, between the lowest bit and the first 1 found from the low end, and a second distance, between the highest bit and the first 1 found from the high end, are determined. The bit with the shorter distance is then identified by comparing the first distance and the second distance, which determines the available memory segment group (allocate_mask_sel) closest to the boundaries at the two ends of the N memory segments. This determination may be performed, for example, by inputting the optional position data (avail_mask) to the binary determination submodule.

If it is determined that there is such an available memory segment group, an allocable result signal (avail_en) is generated, the corresponding allocation request is served using the available memory segment group (allocate_mask_sel), and the current memory state data is updated.

When the allocable result signal (avail_en) and an operation signal sent by the cache module for the corresponding allocation request are both present, a signal (alloc_en) indicating that allocation is currently possible is generated, and the output module is enabled to output the starting address of the available memory segment group and its length (for example, M consecutive memory segments) according to the allocation request.

In the process of releasing storage resources, the release request may be managed by the storage resource release management module. If a release request is received, the mask data of the memory segments to be released (deallocate_mask) is acquired from the mask data acquisition subunit (size_mask) according to the length of the memory segments to be released (for example, M memory segments) included in the release request. The mask data is shifted by a shift operation (mask_shift), the shift amount being determined by the number of memory segments that need to be released. The starting position of the memory segments to be released is marked on the N memory segments according to the length included in the release request, and the current storage state data is updated according to that starting position.
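A minimal sketch of a mask-based release is given below. It assumes that the release mask is positioned at the starting address carried in the release request and that released segments are marked idle by a bitwise OR with the storage state data (1 = idle, 0 = occupied); both points, and the function name `release_segments`, are assumptions made for illustration rather than details confirmed by the disclosure.

```python
def release_segments(slot_resource, start, length):
    """Return a new storage state in which segments start..start+length-1 are idle.

    slot_resource is a list with index k standing for segment k (1 = idle, 0 = occupied).
    """
    n = len(slot_resource)
    if start < 0 or length <= 0 or start + length > n:
        raise ValueError("release range lies outside the shared memory")
    # Release mask: `length` ones placed at the starting position (assumed shift).
    deallocate_mask = [0] * start + [1] * length + [0] * (n - start - length)
    # OR-ing the mask into the state marks the released segments as idle.
    return [s | d for s, d in zip(slot_resource, deallocate_mask)]
```

For example, releasing 2 segments starting at segment 5 turns the state `[0, 0, 0, 1, 1, 0, 0, 0]` into `[0, 0, 0, 1, 1, 1, 1, 0]`.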

As described above, the storage resource allocation method shown in fig. 4 may further include step S210 before step S200.

Step S210: it is determined whether the N memory segments comprise a group of available memory segments.

After the comparison and determination, it may be determined whether the N memory segments include an available memory segment group, and the number of the available memory segment groups is one or more (i.e., at least one), so that step S200 may be performed.

The operation of how to determine whether N memory segments comprise a set of available memory segments is further described below by various examples.

For example, in at least one example, as shown in fig. 9, step S210 of the above storage resource allocation method may further include step S211 and step S212.

Step S211: and acquiring storage state data for the shared memory, wherein the storage state data has N bits, and the N bits of the storage state data are used for recording whether the N storage sections are idle or occupied in a one-to-one correspondence mode.

Step S212: the storage status data is used to determine whether the N storage segments comprise a group of available storage segments.

For example, fig. 5 shows a storage status data obtaining sub-module (slot_resource_update) configured to obtain storage status data (slot_mask or slot_resource). The storage status data is a one-dimensional array of N bits, which record in one-to-one correspondence whether each of the N storage segments is idle or occupied. For example, each bit in the storage status data is set to 0 when occupied and set to 1 when idle. The available memory segment determination module 120 uses the memory status data to determine whether the N memory segments comprise an available memory segment group.

For example, the storage segment group presence determination submodule may perform step S212, and in at least one example, the step S212 further includes steps S2121 and S2122 as follows:

step S2121: in the storage state data, it is judged bit by bit whether there is an available memory segment group having the currently judged current bit as the start bit in an ascending order or a descending order.

Step S2122: in response to there being an available memory segment group with the current bit as the start bit, recording that there is an available memory segment group corresponding to the current bit.

For example, in the storage status data, it is determined bit by bit, in order from the low address to the high address or from the high address to the low address, whether there are M consecutive memory segments that start at the current bit and are all in the idle state, that is, whose status data is 1. If so, the M consecutive memory segments with the current bit as the start bit are recorded as an available memory segment group.

For example, for the case shown in fig. 1 with M = 3, it may be determined bit by bit, in ascending order from bit 0 to bit 127, whether there is an available memory segment group that can satisfy the allocation request: for bit 0 none exists, for bit 1 none exists, for bit 2 one exists, for bit 3 one exists, ..., for bit 124 one exists, for bit 125 one exists, for bit 126 none exists, and for bit 127 none exists. Alternatively, the determination may proceed bit by bit in descending order from bit 127 to bit 0: for bit 127 none exists, for bit 126 none exists, for bit 125 one exists, for bit 124 one exists, ..., for bit 3 one exists, for bit 2 one exists, for bit 1 none exists, and for bit 0 none exists.

The determination of whether there is a group of available memory segments with the current bit as the start bit on a bit-by-bit basis can be performed in a variety of ways, at least two examples of which are described below.

For example, in one example, each bit in the storage status data is set to 0 when occupied and set to 1 when idle, and step S2121 further includes steps S21211 to S21213 as follows:

step S21211: acquiring mask data, wherein the mask data comprises N bits, the N bits of the mask data correspond to the N bits of the storage state data one by one, and the mask data comprises mask segments which correspond to continuous M bits taking the current bit as a start bit and have values of 1;

step S21212: inverting the mask data, performing a bitwise OR operation on the inverted mask data and the storage state data, and performing an AND operation across the N bits obtained by the bitwise OR operation;

step S21213: in response to the result of the AND operation being 1, determining that there is an available memory segment group having the current bit as the start bit, and in response to the result of the AND operation being 0, determining that there is no available memory segment group having the current bit as the start bit.

For example, when it is determined bit by bit whether there is an available memory segment group having the current bit as the start bit, the mask data obtaining unit obtains mask data, such as the two-dimensional matrix data shown in fig. 8A, and the available memory segment groups are calculated according to the arithmetic logic formula:

avail_mask[i] = &(~allocate_array[i] | slot_resource),

where i is the row number of the two-dimensional matrix data, e.g., 0 ≤ i ≤ 128-M, and the leading & denotes the AND operation across all bits of the result.

For example, the following step-by-step description takes M = 4 as an example. Suppose the current storage state data is slot_resource[0:15] = 0111_1001_1111_1100. For slot_id[0], the first row of the mask data matrix (refer to fig. 8A) is slot_mask[0][0:15] = 1111_0000_0000_0000. Inverting it gives ~slot_mask[0][0:15] = 0000_1111_1111_1111, and a bitwise OR of the current storage state data with the inverted mask data gives 0111_1111_1111_1111. An AND operation across the bits of this result gives 0, so there is no available memory segment group with M = 4 for slot_id[0].

Similarly, for slot_id[1], the second row of the mask data matrix is slot_mask[1][0:15] = 0111_1000_0000_0000. Inverting it in the same way gives ~slot_mask[1][0:15] = 1000_0111_1111_1111, and a bitwise OR with the current storage state data slot_resource[0:15] = 0111_1001_1111_1100 gives 1111_1111_1111_1111. An AND operation across the bits of this result gives 1, so there is an available memory segment group with M = 4 for slot_id[1].

In the same way, slot_id[2:15] are judged in order to determine whether there is an available memory segment group with M = 4 for each of them. As a result, among the 14 bits of slot_id[2:15], there is an available memory segment group with M = 4 for each of slot_id[7] to slot_id[10].
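The invert, OR, then AND-reduce test of steps S21211 to S21213 can be checked against the 16-bit example with a short sketch. Bit strings are read left to right as bits 0 to 15, matching the notation above; the helper names `parse_bits`, `match_row` and `compute_avail_mask` are hypothetical, and the sketch evaluates only start positions that leave room for M segments, which is an assumption of this illustration.

```python
def parse_bits(text):
    """Turn a string such as '0111_1001' into a list of ints, bit 0 first."""
    return [int(c) for c in text if c in "01"]


def match_row(mask_row, slot_resource):
    """Return 1 if every segment covered by mask_row is idle, else 0.

    Implements &(~mask_row | slot_resource): invert the mask, OR it with the
    storage state bit by bit, then AND all resulting bits together.
    """
    inverted_or = [(1 - m) | s for m, s in zip(mask_row, slot_resource)]
    return int(all(inverted_or))


def compute_avail_mask(slot_resource, m):
    """Record, for each start bit, whether m idle segments begin there."""
    n = len(slot_resource)
    avail = [0] * n
    for i in range(n - m + 1):
        row = [0] * i + [1] * m + [0] * (n - i - m)  # mask row for start bit i
        avail[i] = match_row(row, slot_resource)
    return avail
```

With `slot_resource = parse_bits("0111_1001_1111_1100")` and m = 4, the resulting list has ones exactly at start bits 1, 7, 8, 9 and 10.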

For example, in another example, each bit in the storage status data is set to 0 when occupied and set to 1 when idle, and step S2121 further includes steps S21214 and S21215 as follows:

step S21214: performing an AND operation across the M consecutive bits of the storage state data that start at the current bit;

step S21215: in response to the result of the AND operation being 1, determining that there is an available memory segment group having the current bit as the start bit, and in response to the result of the AND operation being 0, determining that there is no available memory segment group having the current bit as the start bit.

For example, still taking M = 4 and the current storage state data slot_resource[0:15] = 0111_1001_1111_1100 as an example, an AND operation is first performed across the first four bits (i.e., the leftmost 0111). The result is 0, so the storage segments corresponding to the four bits starting at slot_id[0] are not an available storage segment group. Next, an AND operation is performed across the four bits starting at the second bit from the left (i.e., 1111). The result is 1, so the storage segments corresponding to the four bits starting at slot_id[1] are an available storage segment group. Similarly, among the 14 bits of slot_id[2:15], there is an available storage segment group with M = 4 for each of slot_id[7] to slot_id[10].
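The direct variant of steps S21214 and S21215, which needs no mask row, can be sketched as a sliding window over the storage state data. The function name `window_avail_mask` is hypothetical, and the list layout (index k is segment k, 1 = idle) is an assumption of this sketch.

```python
def window_avail_mask(slot_resource, m):
    """AND the m consecutive bits starting at each position of the storage state.

    Positions whose window would run past the last segment are left at 0.
    """
    n = len(slot_resource)
    avail = [0] * n
    for start in range(n - m + 1):
        window = slot_resource[start:start + m]
        avail[start] = int(all(window))  # 1 only if all m segments are idle
    return avail
```

Applied to the state 0111_1001_1111_1100 with m = 4, the window starting at bit 0 gives 0, the window starting at bit 1 gives 1, and the only other start bits that give 1 are 7 to 10.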

In an embodiment of the present disclosure, for example, in at least one example, for step S212, after step S2121, step S212 further includes step S2123:

step S2123: after determining bit by bit whether there is an available memory segment group with the current bit as the start bit, the optional position data is obtained by recording that there is an available memory segment corresponding to the current bit.

The optional position data may also be a one-dimensional array of N bits, which record in one-to-one correspondence, in ascending or descending order, whether an available memory segment group starts at each of the N memory segments.

For example, still taking M = 4 and the current storage status data slot_resource[0:15] = 0111_1001_1111_1100 as an example, the optional position data obtained in the above manner is avail_mask[0:15] = 0100_0001_1110_0000, so the storage segments with segment numbers 1 and 7 to 10 are each the starting position of a candidate storage segment group. Here, the segment group starting at segment number 1 is at a distance of 1 from the lowest bit (low-end boundary) of the shared memory, and the segment group starting at segment number 10 is at a distance of 2 from the highest bit (high-end boundary) of the shared memory, so the segment group starting at segment number 1 is selected to respond to the allocation request.

For example, in step S200, among the available storage segment groups, determining an available storage segment group closest to the boundaries at the two ends of the N storage segments may be implemented in a plurality of ways, and in one example, may be implemented by step S220 as follows:

step S220: for the N bits of the optional position data, determining, by a dichotomy method, the available storage segment group closest to the boundaries at the two ends of the N storage segments.

For example, based on the optional position data (avail_mask), the position of the lowest bit with a value of 1 and the position of the highest bit with a value of 1 are found, their distances to the left and right boundaries (the low-end boundary and the high-end boundary) of the shared memory are compared, and the available memory segment group closer to a boundary is selected to respond to the allocation request. Because each marked position in avail_mask is determined by the position of the lowest bit of the mask data of the allocated memory segments (allocate_mask), the distances from the region of slot_resource covered by allocate_array[i] to the left and right boundaries differ from the marked position itself. Assume that the lowest leading 1 of avail_mask is at bit i and the highest leading 1 is at bit j. For the M consecutive memory segments required by an allocation request, the distance between the allocation region corresponding to the lowest leading 1 and the lower boundary (slot_resource[0]) is i, whereas the distance between the allocation region corresponding to the highest leading 1 and the upper boundary (slot_resource[127]) must take M into account and is 128-j-M. Therefore, the starting address (mask_det) of the finally output available memory segment group satisfies the following relationship:

$$
\text{mask\_det} =
\begin{cases}
i, & i + j \le 128 - M \\
j, & i + j > 128 - M
\end{cases}
$$
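The comparison of the two boundary distances, i versus N-j-M, reduces to comparing i + j with N-M. A minimal sketch of this selection rule follows; the function name `select_start` is hypothetical, the optional position data is assumed to be a list with index k standing for start bit k, and a plain scan stands in for the leading-1 search.

```python
def select_start(avail_mask, m):
    """Pick the start bit of the group closest to either end, or None if none exists.

    i is the lowest set bit and j the highest set bit of avail_mask.
    Distance to the low end is i; distance to the high end is n - j - m.
    """
    n = len(avail_mask)
    ones = [k for k, bit in enumerate(avail_mask) if bit]
    if not ones:
        return None  # no available group: the request has to wait
    i, j = ones[0], ones[-1]
    # i <= n - j - m is equivalent to i + j <= n - m; ties go to the low end.
    return i if i + j <= n - m else j
```

For the 16-bit example above (start bits 1 and 7 to 10 set, m = 4), i = 1 and j = 10, so i + j = 11 is not greater than 12 and the group starting at segment 1 is chosen.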

A specific exemplary circuit block diagram is shown in fig. 10. The signal carrying the current storage state data (e.g., data0[127:0]) is input to an MLOP (most leading 1 prediction) module and an LLOP (least leading 1 prediction) module. In the LLOP module, the dichotomy quickly finds that the lowest leading 1 of avail_mask is at bit i, and an "i position" signal is output. In the MLOP module, the dichotomy quickly finds that the highest leading 1 of avail_mask is at bit j, and a "j position" signal is output. A signal representing 128 bits and a signal indicating the size of the requested memory resource (for example, request_size = M) are input to a subtractor to obtain a signal representing "128-M", and the "i position" and "j position" signals are input to an adder to obtain the signal "i + j". A comparator compares "i + j" with "128-M". When "i + j" is less than or equal to "128-M", the selector outputs the start address of the available memory segment group (allocate_mask_sel) as mask_det = i; when "i + j" is greater than "128-M", the selector outputs the start address as mask_det = j.

For example, the current storage status data of the shared memory has N bits, for example N = 128 bits. To express a bit position among 128 bits in binary, 2^N' - 1 = 127 gives N' = 7, so 7 bits are needed to represent the "i position" and the "j position". The LLOP module includes 5 sub-modules: LLOP_fetch (N' = 7), LLOP_fetch (N' = 6), LLOP_fetch (N' = 5), LLOP_fetch (N' = 4), and LLOP_last8. LLOP_fetch (N' = 7) determines whether the "i position" lies in [63:0] or [127:64]: when the "i position" lies in [127:64], it outputs llop[6] = 1 and passes [127:64] as data1[63:0] to LLOP_fetch (N' = 6); when the "i position" lies in [63:0], it outputs llop[6] = 0 and passes [63:0] as data1[63:0] to LLOP_fetch (N' = 6). Similarly, LLOP_fetch (N' = 6) determines whether the "i position" lies in [63:32] or [31:0]: when the "i position" lies in [63:32], it outputs llop[5] = 1 and passes [63:32] as data2[31:0] to LLOP_fetch (N' = 5); when the "i position" lies in [31:0], it outputs llop[5] = 0 and passes [31:0] as data2[31:0] to LLOP_fetch (N' = 5). Similarly, LLOP_fetch (N' = 5) determines whether the "i position" lies in [31:16] or [15:0]. LLOP_fetch (N' = 4) determines whether the "i position" lies in [15:8] or [7:0], and outputs data4[7:0] to LLOP_last8. Since data4[7:0] is only 8 bits wide, the position can be retrieved directly by a simple table lookup. Through these steps, the "i position", namely llop[6:0], is finally obtained. Similarly, the MLOP module obtains the "j position", namely mlop[6:0]; the specific process is not described again, and reference may be made to the processing of the LLOP module. Then, by comparing the distance between llop[6:0] and the low-end boundary with the distance between mlop[6:0] and the high-end boundary, the available memory segment group (allocate_mask_sel) where the 1 bit with the smaller distance is located is selected as the final output.
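The LLOP cascade described above can be modeled in software as a minimal sketch. The function names `llop_fetch` and `llop`, the table `LLOP_LAST8`, and the representation of the 128-bit word as a Python integer are assumptions for illustration; the disclosure describes a hardware circuit, not this code.

```python
from typing import Tuple


def llop_fetch(data: int, width: int) -> Tuple[int, int]:
    """One halving stage: return (llop bit, selected half) for a width-bit input."""
    half = width // 2
    low = data & ((1 << half) - 1)   # lower half of the input
    high = data >> half              # upper half of the input
    if low != 0:                     # the lowest 1 lies in the lower half
        return 0, low
    return 1, high                   # otherwise it lies in the upper half


# Lookup table for the final 8-bit stage: index of the lowest 1 (entry 0 is unused).
LLOP_LAST8 = [0] + [(v & -v).bit_length() - 1 for v in range(1, 256)]


def llop(data: int) -> int:
    """Return llop[6:0], the position of the lowest 1 in a non-zero 128-bit word."""
    if not 0 < data < (1 << 128):
        raise ValueError("data must be a non-zero 128-bit value")
    position = 0
    width = 128
    for bit_index in (6, 5, 4, 3):            # stages N' = 7, 6, 5, 4
        bit, data = llop_fetch(data, width)
        position |= bit << bit_index
        width //= 2
    return position | LLOP_LAST8[data]        # llop[2:0] from the table lookup
```

Each stage halves the search range, so four stages reduce 128 bits to the 8 bits handled by the table lookup.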

A hardware implementation that uses dichotomy to determine the available memory segment group closest to the boundaries at both ends of the N memory segments is shown in fig. 11A and 11B. The LLOP module may adopt the structure shown in fig. 11A. The MLOP module may adopt the structure shown in fig. 11B.

As shown in FIG. 11A, the current storage state data data0[2^N'-1:0] (data0[127:0], N' = 7) is divided into two parts. The first part (the low N/2 bits) runs from the lowest bit, bit 0, to the middle bit, namely data0[2^(N'-1)-1:0] (data0[63:0]); the second part (the high N/2 bits) runs from the middle bit to the highest bit, namely data0[2^N'-1:2^(N'-1)] (data0[127:64]). Whether the "i position" lies in data0[2^(N'-1)-1:0] is determined by judging whether the first part contains one or more 1 bits, which at the same time determines the half of the data selected by the dichotomy. For example, when the first part contains a 1 bit, the "i position" is in the first part and the first part is selected: the logical reduction OR |data0[2^(N'-1)-1:0] is 1, this result is inverted to output llop_1[N'-1] = 0 (e.g., llop[6] = 0 in fig. 10), and the selected data is output as data1[2^(N'-1)-1:0] (e.g., data1[63:0] in fig. 10). When the first part is all 0, the "i position" is in the second part and the second part is selected: the logical reduction OR |data0[2^(N'-1)-1:0] is 0, this result is inverted to output llop_1[N'-1] = 1 (e.g., llop[6] = 1 in fig. 10), and the selected data is output as data1[2^(N'-1)-1:0] (e.g., data1[63:0] in fig. 10).

As shown in FIG. 11B, with the data divided into two parts as in FIG. 11A, whether the "j position" lies in data0[2^N'-1:2^(N'-1)] is determined by judging whether the second part contains one or more 1 bits, which at the same time determines the half of the data selected by the dichotomy. For example, when the second part contains a 1 bit, the "j position" is in the second part and the second part is selected: the logical reduction OR |data0[2^N'-1:2^(N'-1)] is 1, this result is inverted to output mlop_1[N'-1] = 0 (e.g., mlop[6] = 0 in fig. 10), and the selected data is output as data1[2^(N'-1)-1:0] (e.g., data1[63:0] in fig. 10). When the second part is all 0, the "j position" is in the first part and the first part is selected: the logical reduction OR |data0[2^N'-1:2^(N'-1)] is 0, this result is inverted to output mlop_1[N'-1] = 1 (e.g., mlop[6] = 1 in fig. 10), and the selected data is output as data1[2^(N'-1)-1:0] (e.g., data1[63:0] in fig. 10).
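A single MLOP halving stage can be modeled as a minimal sketch, following the output polarity as worded in the paragraph above (the flag is the inverted reduction OR of the upper half). The function name `mlop_stage` and the integer representation of the data word are assumptions for illustration only.

```python
from typing import Tuple


def mlop_stage(data: int, n_prime: int) -> Tuple[int, int]:
    """One MLOP halving stage on a 2**n_prime-bit word.

    Returns (flag, selected_half), where flag is the inverted
    reduction OR of the upper half, as worded in the text.
    """
    half = 1 << (n_prime - 1)            # width of each half in bits
    low = data & ((1 << half) - 1)       # first part (lower half)
    high = data >> half                  # second part (upper half)
    high_has_one = int(high != 0)        # reduction OR of the upper half
    flag = 1 - high_has_one              # inverted output, e.g. mlop[6]
    selected = high if high_has_one else low
    return flag, selected
```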

For example, the storage resource allocation is simulated by using the storage resource allocation method proposed by the present disclosure and the existing storage resource allocation method, respectively. Fig. 12 and fig. 13 show that, by software simulation, in the case that the size of the storage resource required by each work group is random and the calculation time of each work group is random, the same work group queue is processed, and the storage resource allocation method proposed by the present disclosure can complete the storage resource allocation more quickly.

For example, as shown in fig. 12, the number of remaining work groups (workgroups) in the work group queue drops to 0 faster under the storage resource allocation method proposed by the present disclosure (the dotted curve indicated by reference sign A1 in fig. 12 shows how the remaining work groups in the queue change over time under the proposed method, and the solid curve indicated by reference sign A2 shows the same for the existing storage resource allocation method). As shown in fig. 13, the size of the remaining storage resources (resource remaining) under the proposed method stays at a lower value for a period of time, which indicates that the proposed method allocates storage resources to the work group sequence faster and the calculation is completed faster (the solid curve indicated by reference sign B1 in fig. 13 shows how the remaining storage resources change under the proposed method, and the dotted curve indicated by reference sign B2 shows the same for the existing storage resource allocation method). That is to say, during the dynamic process of allocation and release, with the storage resource allocation method proposed by the embodiment of the present disclosure, contiguous memory segments occur with higher probability and fragmented memory segments occur with lower probability. For example, as shown in fig. 14, the allocation process is simulated multiple times (about 200 times), the time (workgroup process time) taken to complete the same task by the existing storage resource allocation method and by the method proposed by the embodiment of the present disclosure is counted, and the time difference (delta time) is calculated. Most of the time differences obtained from the roughly 200 simulations are positive, which indicates that the storage resource allocation method of the embodiment of the present disclosure has a shorter allocation time and allocates more efficiently.

At least some embodiments of the present disclosure also provide a storage resource allocation apparatus, for example, for a parallel processor such as a General Purpose Graphics Processor (GPGPU), to which embodiments of the present disclosure are not limited. For example, the parallel processor comprises one or more computing units, and each computing unit comprises a plurality of processing units, a shared memory, a register file, a work item set scheduling module, and a storage resource allocation apparatus according to the embodiment of the disclosure. That is, the storage resource allocation apparatus is integrated inside the computing unit, for example, as a module in the work item set scheduling module. In other examples, the storage resource allocation apparatus may also be provided outside the computing unit.

Fig. 15 is a schematic block diagram of a storage resource allocation apparatus according to some embodiments of the present disclosure. The storage resource allocation apparatus may be implemented by hardware, firmware, and the like; for example, each of the modules mentioned below may be implemented using a digital circuit or any combination of a digital circuit and an analog circuit (see, for example, the specific example shown in fig. 5), and the embodiments of the present disclosure are not limited thereto. For example, as shown in fig. 15, the storage resource allocation apparatus 100 includes an allocation request receiving module 110 and an available memory segment determining module 120.

The allocation request receiving module 110 is configured to receive an allocation request occupying M consecutive memory segments.

The available memory segment determining module 120 is configured to determine, in response to the N memory segments including a first number of available memory segment groups, one available memory segment group that is closest to boundaries at both ends of the N memory segments for responding to the allocation request, wherein the available memory segment groups each include M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N.

For example, the available memory segment determination module 120 is further configured to determine whether the N memory segments include the first number of available memory segment groups.

For example, the available memory segment determination module 120 may include a storage status data acquisition submodule configured to acquire storage status data for the shared memory. The storage status data has N bits, and the N bits of the storage status data are used for recording whether the N storage segments are idle or occupied in a one-to-one correspondence mode. At this time, the available memory segment determination module is further configured to determine whether the N memory segments include an available memory segment group using the memory status data.

For example, the available memory segment determination module 120 may further include an available memory segment group presence judgment sub-module configured to judge, bit by bit in ascending or descending order of the storage status data, whether there is an available memory segment group whose start bit is the bit currently being judged (the current bit), and, in response to such a group existing, to record that an available memory segment group corresponding to the current bit exists.

For example, each bit in the storage status data is set to 0 when occupied and set to 1 when idle. For example, the available memory segment group presence judgment sub-module includes a mask data acquisition unit and a mask data operation unit.

The mask data acquisition unit is configured to acquire mask data, wherein the mask data includes N bits, the N bits of the mask data correspond one-to-one to the N bits of the storage status data, and the mask data includes a mask segment of M consecutive bits, starting at the bit corresponding to the current bit, whose values are all 1.

The mask data operation unit is configured to invert the mask data, perform a bitwise OR operation on the inverted mask data and the storage status data, and then perform an AND operation across the N bits of the result of the bitwise OR operation; it is further configured to determine that an available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 1, and to determine that no available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 0.
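The mask-based check can be illustrated with a minimal sketch. The function name `group_available_by_mask`, the integer bit-vector representation, and the guard that rejects a mask segment extending past bit N-1 are assumptions added for the example and are not stated in the disclosure.

```python
def group_available_by_mask(state: int, start: int, m: int, n: int = 128) -> bool:
    """Check whether m idle segments start at bit `start` (1 = idle, 0 = occupied)."""
    if start + m > n:
        return False                        # assumed guard: group must fit in N bits
    all_ones = (1 << n) - 1
    mask = ((1 << m) - 1) << start          # M consecutive 1s from the current bit
    ored = (~mask & all_ones) | state       # invert the mask, then bitwise OR
    return ored == all_ones                 # AND across the N result bits equals 1
```

Bits outside the mask segment become 1 after the inversion, so only the M bits under the mask segment can make the final AND result 0.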

For example, each bit in the storage status data is set to 0 when occupied and set to 1 when idle. For example, the available memory segment group presence judgment sub-module includes a storage status data operation unit and an operation result judgment unit.

The storage status data operation unit is configured to perform an AND operation across M consecutive bits of the storage status data, the M bits having the current bit as a start bit.

The operation result judgment unit is configured to determine that there is an available memory segment group having the current bit as the start bit in response to a result of the AND operation being 1, and to determine that there is no available memory segment group having the current bit as the start bit in response to a result of the AND operation being 0.
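The direct AND over M consecutive bits can be sketched as follows. The function name `group_available_by_window` and the range guard are assumptions for illustration; the storage status data is modeled as a Python integer with 1 meaning idle.

```python
def group_available_by_window(state: int, start: int, m: int, n: int = 128) -> bool:
    """AND together the m bits of `state` beginning at bit `start` (1 = idle)."""
    if start + m > n:
        return False                          # assumed guard: window must fit in N bits
    run = (1 << m) - 1
    window = (state >> start) & run           # extract the M consecutive bits
    return window == run                      # all M bits are 1, so the AND result is 1
```

Compared with the mask-based variant, this form touches only the M bits of interest and needs no N-bit mask.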

For example, the available memory segment group presence judgment sub-module further includes an optional position data determination unit configured to obtain optional position data by recording, after judging bit by bit whether there is an available memory segment group having the current bit as a start bit, whether an available memory segment group corresponding to each current bit exists, wherein the optional position data includes N bits, and the N bits of the optional position data record, in one-to-one correspondence and in ascending or descending order over the N memory segments, whether an available memory segment group starts at the corresponding memory segment.
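Building the optional position data can be illustrated with a short sketch that scans the storage status data in ascending order. The function name `optional_position_data` and the integer representation are assumptions for the example; a hardware implementation would typically evaluate all positions in parallel.

```python
def optional_position_data(state: int, m: int, n: int = 128) -> int:
    """Return an n-bit word whose bit k is 1 when segments k .. k+m-1 are all idle."""
    avail = 0
    run = (1 << m) - 1
    for k in range(n - m + 1):                 # ascending order over start bits
        if ((state >> k) & run) == run:        # M consecutive idle segments at k
            avail |= 1 << k                    # record an available group at bit k
    return avail
```

For example, with N = 8, M = 3, and only segment 4 occupied (state 0b11101111), groups can start at bits 0, 1, and 5, so the result is 0b00100011.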

For example, the available memory segment determining module 120 may further include a dichotomy determining submodule configured to determine, for the N bits of the optional position data, by dichotomy, the one available memory segment group that is closest to the boundaries at both ends of the N memory segments. For example, the storage resource allocation apparatus 100 further includes an output module 130, and the output module 130 is configured to output the start address of the determined available memory segment group and the length M of the available memory segment group.

For example, the available memory segment determination module 120 is further configured to, in response to the N memory segments not comprising an available memory segment group, continue monitoring the N memory segments until the N memory segments comprise an available memory segment group. After the available memory segment determination module 120 fails to perform an allocation operation for an allocation request, it may wait a predetermined time (e.g., until the next operation cycle) and then perform the allocation operation again, until the allocation operation succeeds.

For example, the storage resource allocation apparatus 100 may further include a release request processing module 140, where the release request processing module 140 is configured to receive a release request for at least one currently occupied memory segment of the N memory segments before, or at the same time as, receiving an allocation request occupying M consecutive memory segments, and to process the release request before processing the allocation request.
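The ordering of release and allocation, together with the retry behavior on failure, can be sketched as one cycle of a simple software model. The function name `process_cycle`, its return convention, and the linear scan are assumptions for illustration, not the apparatus of the disclosure.

```python
from typing import Optional, Tuple


def process_cycle(state: int, release_mask: int, m: int,
                  n: int = 128) -> Tuple[int, Optional[int]]:
    """Handle one cycle: apply the release first, then try the allocation.

    Returns (new_state, start). start is None when no group of m idle
    segments exists, in which case the caller retries in a later cycle.
    """
    state |= release_mask                      # released segments become idle (1)
    run = (1 << m) - 1
    starts = [k for k in range(n - m + 1) if ((state >> k) & run) == run]
    if not starts:
        return state, None                     # allocation fails this cycle
    i, j = starts[0], starts[-1]               # lowest and highest candidate starts
    start = i if i + j <= n - m else j         # nearest-boundary rule
    state &= ~(run << start)                   # mark the chosen group occupied (0)
    return state, start
```

Processing the release first means that segments freed in the same cycle can already satisfy the pending request, as in the second call of the usage `process_cycle(0b0011, 0, 4, 8)` followed by `process_cycle(0b0011, 0b1100, 4, 8)`.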

For example, each of the allocation request receiving module 110, the available memory segment determining module 120, the outputting module 130, and the release request processing module 140 may be implemented by hardware, firmware, or software.

Thus, for example, a processor may execute code and programs to implement some or all of the functionality of the various modules as described above. For another example, each of the allocation request receiving module 110, the available memory segment determining module 120, the outputting module 130 and the release request processing module 140 may be a hardware device, and is used to implement part or all of the functions of each module as above. For example, each of the allocation request receiving module 110, the available memory segment determining module 120, the outputting module 130, and the release request processing module 140 may be a circuit board or a combination of a plurality of circuit boards for implementing the above functions.

It should be noted that the storage resource allocation apparatus 100 can be used to implement the foregoing storage resource allocation method. For example, the allocation request receiving module 110 may be configured to implement step S100 in the foregoing storage resource allocation method; the available memory segment determining module 120 may be configured to implement steps S200 and S400; the output module 130 may be configured to implement step S300; and the release request processing module 140 may be configured to implement step S500. For the specific implementation processes and details, reference may be made to the related descriptions of steps S100 to S500, which are not repeated here.

Fig. 16 is a schematic block diagram of another storage resource allocation apparatus provided in some embodiments of the present disclosure. For example, as shown in fig. 16, the storage resource allocation apparatus 500 includes a memory 510 and a processor 520, and the storage resource allocation apparatus may be used for a parallel processor, such as a General Purpose Graphics Processor (GPGPU), for example, which is not limited by the embodiments of the present disclosure. For example, the memory 510 is used for non-transitory storage of computer readable instructions, and the processor 520 is used for executing the computer readable instructions, and the computer readable instructions are executed by the processor 520 to perform the storage resource allocation method provided by any embodiment of the disclosure.

For example, the memory 510 and the processor 520 may be in direct or indirect communication with each other. For example, in some examples, the storage resource allocation apparatus 500 may further include a system bus 530 as shown in fig. 16, and the memory 510 and the processor 520 may communicate with each other through the system bus 530; for example, the processor 520 may access the memory 510 through the system bus 530. For example, in other examples, components such as the memory 510 and the processor 520 may communicate over a Network On Chip (NOC) connection.

For example, the processor 520 may control other components in the storage resource allocation apparatus to perform desired functions. The processor 520 may be a device with data processing capability and/or program execution capability, such as a Central Processing Unit (CPU), a Tensor Processor (TPU), a Network Processor (NP), or a Graphics Processor (GPU), and may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.

For example, the memory 510 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM), cache memory, or the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like.

For example, one or more computer instructions may be stored on memory 510 and executed by processor 520 to implement various functions. Various applications and various data, as well as various data used and/or generated by the applications, and the like, may also be stored in the computer-readable storage medium.

For example, some computer instructions stored by memory 510, when executed by processor 520, may perform one or more steps in accordance with the storage resource allocation method above.

For example, as shown in FIG. 16, the storage resource allocation apparatus 500 may further include an input interface 540 that allows an external device to communicate with the storage resource allocation apparatus 500. For example, input interface 540 may be used to receive instructions from an external computer device, from a user, and the like. The storage resource allocation apparatus 500 may further include an output interface 550 interconnecting the storage resource allocation apparatus 500 and one or more external devices. For example, the storage resource allocation apparatus 500 may display an image or the like through the output interface 550.

For example, for a detailed description of a processing procedure of the storage resource allocation method according to the foregoing embodiment of the present disclosure, reference may be made to the related description in the foregoing embodiment of the storage resource allocation method, and repeated descriptions are omitted here.

It should be noted that the storage resource allocation apparatus provided in the embodiments of the present disclosure is illustrative and not restrictive, and the storage resource allocation apparatus may further include other conventional components or structures according to practical application needs, for example, in order to implement the necessary functions of the storage resource allocation apparatus, a person skilled in the art may set other conventional components or structures according to a specific application scenario, and the embodiments of the present disclosure are not limited thereto.

For technical effects of the storage resource allocation apparatus provided in the embodiments of the present disclosure, reference may be made to corresponding descriptions regarding the storage resource allocation method in the foregoing embodiments, and details are not repeated here.

At least some embodiments of the present disclosure also provide a non-transitory storage medium. Fig. 17 is a schematic diagram of a non-transitory storage medium according to some embodiments of the present disclosure. For example, as shown in fig. 17, the storage medium 600 stores computer readable instructions 610 in a non-transitory manner, and when the computer readable instructions 610 are executed by a computer (including a processor), the storage resource allocation method provided by any embodiment of the disclosure may be executed.

For example, one or more computer instructions may be stored on the storage medium 600. Some of the computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the storage resource allocation method described above.

For example, the storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a compact disc read only memory (CD-ROM), a flash memory, or any combination of the above storage media, as well as other suitable storage media. For example, the storage medium 600 may include the memory 510 in the aforementioned storage resource allocation apparatus 500.

For technical effects of the storage medium provided by the embodiments of the present disclosure, reference may be made to corresponding descriptions about the storage resource allocation method in the foregoing embodiments, and details are not repeated here.

For the present disclosure, there are the following points to be explained:

(1) In the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are involved, and other structures may refer to general designs.

(2) Features of the disclosure in the same embodiment and in different embodiments may be combined with each other without conflict.

The above is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A storage resource allocation method is used for allocating a shared memory in a computing unit, wherein the shared memory comprises N memory segments, the N memory segments are sequentially arranged according to segment numbers, N is a positive integer greater than 1,

the resource allocation method comprises the following steps:

receiving an allocation request occupying M continuous storage segments;

in response to said N memory segments comprising a first number of available memory segment groups, determining, among said available memory segment groups, one of said available memory segment groups that is closest to boundaries at both ends of said N memory segments for responding to said allocation request,

wherein the set of available memory segments each includes M consecutive memory segments in an idle state to satisfy the allocation request, M being a positive integer and less than N.

2. The method of claim 1, further comprising:

determining whether the N memory segments comprise the first number of available memory segment groups.

3. The method of claim 2, wherein determining whether the N memory segments comprise the set of available memory segments comprises:

acquiring storage state data for the shared memory, wherein the storage state data has N bits, and the N bits of the storage state data are used for recording whether the N storage segments are idle or occupied in a one-to-one correspondence manner;

determining whether the N memory segments comprise the set of available memory segments using the memory status data.

4. The method of claim 3, wherein using the storage status data to determine whether the N storage segments comprise the set of available storage segments comprises:

judging whether an available memory segment group using the currently judged current bit as a start bit exists in the memory state data in an ascending order or a descending order bit by bit,

in response to there being a set of available memory segments with the current bit as a start bit, recording that there is a set of available memory segments corresponding to the current bit.

5. The method of claim 4, wherein each bit in the storage status data is set to 0 when occupied and 1 when idle,

determining, bit by bit, whether there is a group of available memory segments with the current bit as a start bit, comprising:

acquiring mask data, wherein the mask data comprises N bits, the N bits of the mask data correspond to the N bits of the storage state data one by one, the mask data comprises mask segments which correspond to continuous M bits with the current bit as a start bit and have values of 1,

after the mask data is inverted, carrying out bitwise OR operation on the mask data and the storage state data, carrying out bitwise AND operation on N bit results obtained by the bitwise OR operation,

in response to a result of the AND operation being 1, it is determined that there is an available memory segment group having the current bit as a start bit, and in response to a result of the AND operation being 0, it is determined that there is no available memory segment group having the current bit as a start bit.

6. The method of claim 4, wherein each bit in the storage status data is set to 0 when occupied and 1 when idle,

performing bitwise AND operation on continuous M bits in the storage state data by taking the current bit as a start bit,

7. The method of claim 5 or 6, further comprising:

after judging whether an available memory segment group with the current bit as a start bit exists bit by bit, by recording that a memory segment corresponding to the current bit is available, optional position data is obtained,

the optional position data comprises N bits, and the N bits of the optional position data are used for recording, in one-to-one correspondence and in the ascending order or the descending order over the N storage segments, whether an available storage segment group exists.

8. The method of claim 7, wherein determining, among the sets of available memory segments, one set of available memory segments that is closest to boundaries at both ends of the N memory segments comprises:

and for the N bits of the optional position data, determining an available storage segment group with the nearest boundary distance at two ends of the N storage segments by adopting a dichotomy.

9. The method of claim 1, further comprising:

the determined starting address of one of the available memory segment groups and the length M of the available memory segment group are output.

10. The method of claim 1, further comprising:

in response to the N memory segments not including the set of available memory segments, continuing to monitor the N memory segments until the N memory segments include the set of available memory segments.

11. The method of claim 1, further comprising:

before or at the same time of receiving allocation requests occupying M continuous storage segments, receiving a release request for at least one currently occupied storage segment in the N storage segments, and processing the release request before processing the allocation requests.

12. A storage resource allocation device is used for allocating a shared memory in a computing unit, wherein the shared memory comprises N memory segments, the N memory segments are arranged in sequence according to segment numbers, N is a positive integer larger than 1,

the resource allocation apparatus includes:

an allocation request receiving module configured to receive an allocation request occupying M consecutive memory segments;

an available memory segment determination module configured to determine, in response to the N memory segments comprising a first number of available memory segment groups, one of the available memory segment groups that is closest to boundaries at both ends of the N memory segments for responding to the allocation request,

13. The apparatus of claim 12, wherein,

the available memory segment determination module is further configured to determine whether the N memory segments comprise the first number of available memory segment groups.

14. The apparatus of claim 13, wherein the available memory segment determining module comprises:

a storage status data obtaining sub-module configured to obtain storage status data for the shared memory, wherein the storage status data has N bits, and the N bits of the storage status data are used for recording whether the N storage segments are idle or occupied in a one-to-one correspondence manner;

wherein the available memory segment determination module is further configured to determine, using the storage status data, whether the N memory segments comprise the first number of available memory segment groups.

15. The apparatus of claim 14, wherein the available memory segment determination module further comprises:

an available memory segment group presence judgment sub-module configured to traverse the bits of the storage status data one by one in ascending or descending order and judge, for the current bit being examined, whether an available memory segment group having the current bit as a start bit exists, and configured to record, in response to such an available memory segment group existing, that an available memory segment group corresponding to the current bit exists.
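A minimal sketch of this bit-by-bit traversal follows. It assumes the storage status data is an integer bitmap with 1 for idle and 0 for occupied, and it keeps the record as a second N-bit integer whose bit i is set when a group starting at segment i exists. The function name and the form of the record are hypothetical.

```python
def scan_start_bits(status: int, n: int, m: int, descending: bool = False) -> int:
    """Visit every bit in ascending or descending order and record the
    start bits at which m consecutive idle segments are available.

    status: n-bit bitmap, 1 = idle, 0 = occupied (assumed convention)
    Returns an n-bit record with bit i set when a group starts at segment i.
    """
    order = range(n - 1, -1, -1) if descending else range(n)
    record = 0
    for current in order:
        # A group starting here must fit inside the n segments.
        fits = current + m <= n
        # Every segment of the candidate group must be idle.
        if fits and all((status >> (current + k)) & 1 for k in range(m)):
            record |= 1 << current
    return record
```

The traversal direction changes only the order in which bits are examined; the resulting record is the same either way.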

16. The apparatus of claim 15, wherein each bit in the storage status data is set to 0 when occupied and 1 when idle,

the available memory segment group presence judgment sub-module includes:

a mask data acquisition unit configured to acquire mask data, wherein the mask data includes N bits, the N bits of the mask data correspond one-to-one to the N bits of the storage status data, and the mask data includes a mask segment consisting of M consecutive bits that start at the bit corresponding to the current bit and each have a value of 1; and

a mask data operation unit configured to invert the mask data, perform a bitwise OR operation on the inverted mask data and the storage status data, and then perform an AND operation across the N bits of the result of the bitwise OR operation, and configured to determine that an available memory segment group having the current bit as a start bit exists in response to a result of the AND operation being 1, and to determine that no available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 0.
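The mask operation of this claim can be sketched as follows, using integers as N-bit vectors with 1 for idle and 0 for occupied as the claim states. Inverting the mask turns every bit outside the candidate group into 1, so after the OR only an occupied bit inside the group can remain 0, and the AND across all N bits is 1 exactly when the whole group is idle. The function name and the handling of a group that would run past the last segment are assumptions.

```python
def group_exists_by_mask(status: int, n: int, m: int, current: int) -> bool:
    """Check for m idle segments starting at `current` using mask data."""
    full = (1 << n) - 1
    # Mask data: m consecutive 1 bits starting at the current bit.
    mask = ((1 << m) - 1) << current
    if mask & ~full:
        # Assumption: a group that would extend past segment n-1 is rejected.
        return False
    # Invert the mask (within n bits) and OR it with the storage status data.
    ored = (~mask & full) | (status & full)
    # AND across the n bits of the OR result.
    reduced = 1
    for i in range(n):
        reduced &= (ored >> i) & 1
    return reduced == 1
```

For the status data 0b01101110 with M = 2, start bit 1 passes because segments 1 and 2 are both idle, while start bit 3 fails because segment 4 is occupied.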

17. The apparatus of claim 15, wherein each bit in the storage status data is set to 0 when occupied and 1 when idle,

the available memory segment group presence judgment sub-module includes:

a storage status data operation unit configured to perform an AND operation across the M consecutive bits of the storage status data that start at the current bit; and

an operation result judgment unit configured to determine that an available memory segment group having the current bit as a start bit exists in response to a result of the AND operation being 1, and to determine that no available memory segment group having the current bit as a start bit exists in response to the result of the AND operation being 0.
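As a sketch of this alternative, the check below ANDs together only the M bits of the candidate group instead of building mask data. It again assumes an integer bitmap with 1 for idle and 0 for occupied; the function name and the out-of-range handling are hypothetical.

```python
def group_exists_by_window(status: int, n: int, m: int, current: int) -> bool:
    """Check for m idle segments starting at `current` by ANDing m bits."""
    if current + m > n:
        # Assumption: a group that would extend past segment n-1 is rejected.
        return False
    result = 1
    for offset in range(m):
        # Fold each bit of the candidate group into the running AND.
        result &= (status >> (current + offset)) & 1
    return result == 1
```

This form touches M bits per candidate rather than N, at the cost of needing a reduction whose width depends on the requested length M.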

18. The apparatus according to claim 16 or 17, wherein the available memory segment group presence judgment sub-module further includes:

an optional position data determination unit configured to obtain optional position data by recording, after it is judged bit by bit whether an available memory segment group having the current bit as a start bit exists, that an available memory segment group corresponding to the current bit exists,

19. The apparatus of claim 18, wherein the available memory segment determination module further comprises:

a dichotomy determination sub-module configured to determine, by applying a dichotomy (bisection) method to the N bits of the optional position data, the one available memory segment group that is closest to the boundaries at both ends of the N memory segments.

20. The apparatus of claim 12, further comprising:

an output module configured to output the determined start address of the one available memory segment group and the length M of the available memory segment group.

21. The apparatus of claim 12, wherein the available memory segment determination module is further configured to, in response to the N memory segments not comprising any available memory segment group, continue monitoring the N memory segments until the N memory segments comprise an available memory segment group.

22. The apparatus of claim 18, further comprising:

a release request processing module configured to receive a release request for at least one currently occupied memory segment of the N memory segments before or while receiving an allocation request occupying M consecutive memory segments, and process the release request before processing the allocation request.

23. A storage resource allocation apparatus, comprising:

a memory for non-transitory storage of computer readable instructions; and

a processor for executing the computer readable instructions, wherein the computer readable instructions, when executed by the processor, perform the storage resource allocation method of any one of claims 1-11.

24. A non-transitory storage medium non-transitorily storing computer readable instructions, wherein the computer readable instructions, when executed by a computer, perform the storage resource allocation method of any one of claims 1-11.