CN112559169B - Resource allocation method and device - Google Patents

Resource allocation method and device

Info

Publication number
CN112559169B
Authority
CN
China
Prior art keywords
register
segment
segments
registers
register segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011333492.8A
Other languages
Chinese (zh)
Other versions
CN112559169A (en
Inventor
喻琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Haiguang Microelectronics Technology Co Ltd
Original Assignee
Chengdu Haiguang Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Haiguang Microelectronics Technology Co Ltd
Priority to CN202011333492.8A
Publication of CN112559169A
Application granted
Publication of CN112559169B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Abstract

A resource allocation method and device. The resource allocation method comprises the following steps: determining, according to the current work item set, the number P of register segments that need to be used; selecting P register segments from those of the N register segments that are currently in an idle state, wherein the segment numbers of the P register segments are not required to be consecutive; and virtualizing the selected P register segments into a sequentially continuous storage space for running the current work item set. The resource allocation method can improve the utilization rate of the registers.

Description

Resource allocation method and device
Technical Field
The embodiment of the disclosure relates to a resource allocation method and device.
Background
In currently designed processors (e.g., general-purpose graphics processing units (GPGPUs), which use a graphics processor, originally intended for graphics tasks, to perform general-purpose computing tasks originally handled by a central processing unit), a workgroup (a workgroup is the set of all work items in a sub-task and can be divided into several work item sets) is allocated to a computing unit (Computing Unit) for processing. For example, a computing unit includes a plurality of (e.g., 4) vector computing units SIMD (Single Instruction Multiple Data), and one vector computing unit SIMD often processes several different work item sets (Wavefronts) at the same time. However, the number of registers in the vector computing unit SIMD (e.g., VGPRs (Vector General Purpose Registers), i.e., the general-purpose registers of the vector computing unit) is fixed; these registers are shared by a plurality of different work item sets, and the registers VGPR of different work item sets are not allowed to overlap.
Disclosure of Invention
The embodiment of the disclosure provides a resource allocation method and device. The resource allocation method can improve the utilization rate of the register.
The present disclosure provides a resource allocation method and an apparatus. The resource allocation method is used for allocating registers in a computing unit, where the computing unit includes M registers arranged according to register numbers, the M registers are sequentially segmented into N register segments according to the register numbers, and the N register segments are arranged according to segment numbers, where M and N are positive integers greater than 0. The resource allocation method includes: determining the number P of register segments that need to be used according to a current work item set; selecting P register segments from those of the N register segments that are currently in an idle state, wherein the segment numbers of the P register segments are not required to be consecutive; and virtualizing the selected P register segments into a sequentially continuous storage space for running the current work item set.
For example, in a resource allocation method provided in at least one embodiment of the present disclosure, segment number sequence numbers of at least two of the P register segments are not consecutive.
For example, the resource allocation method provided in at least one embodiment of the present disclosure further includes: among the register segments currently in the idle state, starting from the idle register segment with the smallest or largest segment number, sequentially selecting idle register segments in ascending or descending order of segment number to obtain the P register segments.
For example, the resource allocation method provided in at least one embodiment of the present disclosure further includes: after the P register segments are selected, obtaining a number string recording the segment numbers of the P register segments in the N register segments, and virtualizing the P register segments as the sequentially continuous storage space by using the number string, so as to address the P register segments for operating the current work item set.
For example, in a resource allocation method provided in at least one embodiment of the present disclosure, virtualizing the P register segments as the sequentially continuous storage space using the number string to address the P register segments to run the current work item set includes: obtaining the respective segment start physical addresses of the P register segments according to the number string; and, for an instruction included in the current work item set, according to the number of registers of a group of registers required to be used in the storage space by an operand of the instruction, performing physical address translation using the segment numbers of the P register segments and the respective segment start physical addresses, so as to obtain the physical addresses of the group of registers.
For example, in the resource allocation method provided in at least one embodiment of the present disclosure, the segment start physical address of each of the P register segments corresponds to a register with the smallest register number sequence number in each of the P register segments.
For example, in a resource allocation method provided in at least one embodiment of the present disclosure, the performing physical address translation by using the segment numbers of the P register segments and the starting physical addresses of the P register segments further includes: arranging the segment numbers of the P register segments into a first sequence, correspondingly arranging the segment initial physical addresses of the P register segments into a second sequence, and obtaining the initial physical addresses of the group of registers needed to be used by the operand in the storage space according to the first sequence, the second sequence and the operand of the instruction.
For example, in a resource allocation method provided in at least one embodiment of the present disclosure, the starting physical address of the set of registers corresponds to a register number with a smallest register number sequence number in the set of registers.
For example, in the resource allocation method provided in at least one embodiment of the present disclosure, the set of registers is located in two register segments with discontinuous segment numbers of the P register segments.
For example, the resource allocation method provided in at least one embodiment of the present disclosure further includes: in response to the set of registers being located in two register segments, among the P register segments, whose segment numbers are not consecutive, determining the position of the starting physical address of the set of registers within the register segment with the smaller segment number of the two register segments, and the spacing between the two register segments, and addressing using the position and the spacing.
For example, in the resource allocation method provided in at least one embodiment of the present disclosure, a position of the starting physical address of the group of registers in a register segment with a smaller segment number in the two register segments is determined according to the starting physical address of the group of registers and the number of registers in each register segment.
For example, in the resource allocation method provided in at least one embodiment of the present disclosure, it is determined that the group of registers is located in the two register segments according to the position of the starting physical address of the group of registers in the register segment with the smaller segment number in the two register segments and the memory amount of the group of registers.
For example, in a resource allocation method provided in at least one embodiment of the present disclosure, the addressing using the position and the spacing includes: obtaining the physical addresses of the registers in the group other than the register corresponding to the starting physical address, by using the position of the starting physical address of the group of registers within the register segment with the smaller segment number of the two register segments and the spacing between the two register segments, in combination with the starting physical address of the group of registers required by the operand, so as to address the physical addresses of the group of registers in sequence.
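The position-and-spacing addressing described above can be illustrated with a minimal Python sketch. It is only one possible reading of the text, under stated assumptions: every register segment holds `regs_per_segment` registers, the group starts in the segment with the smaller segment number and spills into the other one, and the spacing is taken to be the number of registers skipped between the two segments. The function name and parameters are hypothetical and do not come from the disclosure.

```python
def group_physical_addresses(start_phys, group_size, seg_lo, seg_hi,
                             regs_per_segment=4):
    """Physical addresses of a register group that may straddle two segments.

    Assumption: the group starts in segment seg_lo and, if it does not fit,
    continues at the beginning of segment seg_hi (seg_hi > seg_lo).
    """
    if start_phys // regs_per_segment != seg_lo:
        raise ValueError("start address is not inside the lower segment")
    # Position of the group's first register inside the lower segment.
    position = start_phys % regs_per_segment
    if position + group_size > 2 * regs_per_segment:
        raise ValueError("group would span more than two segments")
    # Registers skipped between the end of seg_lo and the start of seg_hi.
    spacing = (seg_hi - seg_lo - 1) * regs_per_segment
    addresses = []
    for i in range(group_size):
        if position + i < regs_per_segment:
            addresses.append(start_phys + i)            # still in seg_lo
        else:
            addresses.append(start_phys + i + spacing)  # jumped into seg_hi
    return addresses


# A 4-register group starting at physical register 10 (segment 2, position 2)
# that continues in segment 4: registers 10, 11, then 16, 17.
example = group_physical_addresses(10, 4, seg_lo=2, seg_hi=4)
```

When the two segments happen to be adjacent, the spacing is zero and the addresses are simply consecutive.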
The present disclosure also provides a resource allocation apparatus, configured to allocate registers in a computing unit, where the computing unit includes M registers arranged according to register numbers, the M registers are sequentially segmented into N register segments according to the register number serial numbers, and the N register segments are arranged according to segment numbers, where M and N are positive integers greater than 0; the resource allocation apparatus includes: the device comprises a register segment number determining module, a register segment selecting module and a register segment virtual converting module, wherein the register segment number determining module is configured to determine the number P of register segments needing to be used according to a current work item set, the register segment selecting module is configured to select P register segments from the register segments which are in an idle state currently in the N register segments according to the number P of the register segments needing to be used, the segment number serial numbers of the P register segments are not limited to be continuous, and the register segment virtual converting module is configured to virtualize the P selected register segments into sequentially continuous storage spaces for operating the current work item set.
The resource allocation method and apparatus provided in at least one embodiment of the present disclosure virtualize the selected P register segments into a sequentially continuous storage space for running the current work item set and do not require the segment numbers of the P register segments to be consecutive, so that the allocation rate of the registers can be increased.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description only relate to some embodiments of the present disclosure and do not limit the present disclosure.
Fig. 1 is a schematic flowchart of a resource allocation method according to at least one embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a resource allocation method according to at least another embodiment of the present disclosure;
Fig. 3 is a schematic diagram illustrating a partial structure of the register segments in a computing unit according to at least one embodiment of the present disclosure;
Fig. 4 is a schematic diagram of the segment start physical addresses of the register segments to be used according to at least one embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of a resource allocation method according to at least one further embodiment of the present disclosure;
Fig. 6 is a schematic diagram illustrating a rearrangement of the register segments to be used according to at least one embodiment of the present disclosure;
Fig. 7 is a schematic diagram of obtaining the physical addresses of a set of registers to be used according to at least one embodiment of the present disclosure;
Fig. 8 is a schematic diagram of obtaining the physical addresses of a set of registers to be used according to at least one other embodiment of the present disclosure;
Fig. 9A is a schematic diagram illustrating an implementation process of a resource allocation method according to at least one embodiment of the present disclosure;
Fig. 9B is a schematic diagram illustrating an implementation process of a resource allocation method according to at least another embodiment of the present disclosure; and
Fig. 10 is a schematic diagram of a resource allocation apparatus according to at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.
Currently designed processors, such as general-purpose graphics processing units (GPGPUs), allocate multiple registers (VGPRs) (e.g., 256 32-bit registers) separately to each vector thread (Vector thread). For example, the plurality of registers are segmented into a plurality of register segments. The shader input unit allocates register resources to each work item set (Wavefront) in units of register segments (VGPR slots).
For example, in some embodiments, when the shader input unit receives a workgroup (Workgroup) and creates a specific work item set, it queries the register resource allocation of all computing units (e.g., vector computing units SIMD) in the current shader (Shader). Since only one contiguous range of registers can be allocated to a single work item set, the work item set can be served only by a run of contiguous registers large enough to meet its requirement; in other words, only contiguous register segments can be allocated to a single work item set. Therefore, the shader input unit needs to know the length of the longest run of contiguous register segments among the idle registers of each computing unit in order to determine whether the work item set has enough registers to run on that computing unit.
With this contiguous allocation scheme, the registers of a computing unit must be allocated to a work item set as consecutive register segments. Work item sets with larger and smaller register demands are distributed to a single computing unit in no particular order, and they complete their computing tasks and release the register resources they occupy at unpredictable times. As a result, the register resources of the computing unit are cut into many short runs of consecutive register segments or single register segments, that is, into small fragments. Once fragmentation becomes severe, a subsequent work item set with a relatively large register demand cannot be allocated to the current computing unit. The more work item sets run on the same computing unit, the more serious the register fragmentation becomes, so that the register resources of the computing unit cannot be used effectively in time, which lowers resource utilization efficiency and wastes hardware resources.
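The fragmentation problem described above can be made concrete with a short Python sketch. It assumes, purely for illustration, that the idle state of the register segments is held in a bitmask in which bit i set to 1 means that segment i is idle; the function names are hypothetical. The sketch compares the longest run of contiguous idle segments (what contiguous allocation can use) with the total number of idle segments.

```python
def longest_contiguous_idle_run(idle_mask, n_segments=16):
    """Length of the longest run of adjacent idle segments in the mask."""
    best = run = 0
    for sn in range(n_segments):
        if (idle_mask >> sn) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def total_idle(idle_mask, n_segments=16):
    """Total number of idle segments, regardless of where they are."""
    return sum((idle_mask >> sn) & 1 for sn in range(n_segments))


# Hypothetical fragmented state: segments 1, 2, 4, 6, 8, 13 and 14 are idle.
fragmented = 0b0110_0001_0101_0110
needed = 5
fits_contiguously = longest_contiguous_idle_run(fragmented) >= needed  # False
fits_in_total = total_idle(fragmented) >= needed                       # True
```

In this state seven segments are idle, yet a work item set that needs five segments cannot be placed if the segments must be contiguous, because the longest idle run is only two segments long.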
At least one embodiment of the present disclosure provides a resource allocation method. The resource allocation method is used for allocating registers in a computing unit, where the computing unit includes M registers arranged according to register numbers, the M registers are sequentially segmented into N register segments according to the register numbers, and the N register segments are arranged according to segment numbers, where M and N are positive integers greater than 0. The resource allocation method includes the following steps: determining the number P of register segments that need to be used according to the current work item set; selecting P register segments from those of the N register segments that are currently in an idle state, wherein the segment numbers of the P register segments are not required to be consecutive; and virtualizing the selected P register segments into a sequentially continuous storage space for running the current work item set.
The resource allocation method provided by the foregoing embodiment of the present disclosure virtualizes the selected P register segments as sequentially continuous storage spaces for running the current work item set, and does not require that segment numbers of the P register segments are continuous, thereby increasing the allocation rate of registers.
Embodiments of the present disclosure and examples thereof are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a resource allocation method according to at least one embodiment of the present disclosure. Fig. 2 is a flowchart illustrating a resource allocation method according to at least another embodiment of the present disclosure. The resource allocation method shown in fig. 1 includes steps S110-S130. The resource allocation method shown in fig. 2 includes steps S210-S240.
Step S110: determine the number P of register segments that need to be used according to the current work item set. For example, when an execution program (e.g., a kernel program (Kernel), which is a minimal parallel execution program that clearly expresses a defined function) is compiled, the compiler analyzes how many registers or register segments the execution program needs to use. When the shader input unit (shown in Fig. 9A) receives the current work item set, it therefore knows how many registers or register segments the work item set needs to use. The shader input unit allocates register resources to each work item set (Wavefront) in units of register segments (VGPR slots).
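As a minimal sketch of step S110, the number P can be derived from a register count by rounding up to whole register segments. The rounding rule, the function name and the default segment size of 4 registers are assumptions made for illustration; the text only states that the compiler reports how many registers or register segments are needed.

```python
def segments_needed(registers_needed, regs_per_segment=4):
    """Number P of register segments covering the requested register count.

    Assumption: allocation is in whole segments, so the count is rounded up.
    """
    if registers_needed <= 0:
        raise ValueError("a work item set must need at least one register")
    # Ceiling division without floating point.
    return (registers_needed + regs_per_segment - 1) // regs_per_segment


# A work item set that needs 18 registers occupies 5 segments of 4 registers.
p = segments_needed(18)
```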
Step S120: according to the number P of the register segments needing to be used, P register segments are selected from the register segments which are in the idle state currently in the N register segments, and the segment number sequence numbers of the P register segments are not limited to be continuous.
Fig. 3 is a schematic diagram illustrating a partial structure of the register segments in a computing unit according to at least one embodiment of the present disclosure; Fig. 4 is a schematic diagram of the segment start physical addresses of the register segments to be used according to at least one embodiment of the present disclosure; Fig. 9A is a schematic diagram of an implementation process of a resource allocation method according to at least one embodiment of the present disclosure.
For example, in some embodiments, the resource allocation method is used for allocating registers in a computing unit, e.g., in a graphics processing unit (GPU, e.g., a GPGPU). The computing unit includes M registers arranged by register number, the M registers are sequentially segmented into N register segments according to the register numbers, and the N register segments are arranged by segment number, where M and N are positive integers greater than 0.
As shown in Fig. 3 and Fig. 4, each computing unit (e.g., vector computing unit) SIMD (Single Instruction Multiple Data) includes 16 register segments (VGPR slots), that is, the value of N is 16. Note that "slot" in the drawings indicates a register segment, and the number after "slot" indicates the segment number of the register segment. Each register segment includes 4 registers VGPR, that is, M is 64, the product of 16 and 4. The 16 register segments are ordered from low to high by segment numbers SN of 0 to 15; correspondingly, the 64 registers VGPR are ordered from low to high by register numbers SM of 0 to 63. The register segment with segment number SN of 0 includes the 4 registers VGPR with register numbers SM from 0 to 3, and the registers included in the other register segments follow by analogy. Register segment 0 is located at the least significant bit (LSB), and register segment 15 is located at the most significant bit (MSB).
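The layout in this example (N = 16 segments of 4 registers, M = 64) can be written down as a small Python sketch. The constant and function names are hypothetical and only restate the numbering described above.

```python
N_SEGMENTS = 16         # N in the example
REGS_PER_SEGMENT = 4    # registers VGPR per register segment
M_REGISTERS = N_SEGMENTS * REGS_PER_SEGMENT  # M = 64


def registers_in_segment(sn):
    """Register numbers SM contained in the segment with segment number SN."""
    if not 0 <= sn < N_SEGMENTS:
        raise ValueError("segment number out of range")
    first = sn * REGS_PER_SEGMENT
    return list(range(first, first + REGS_PER_SEGMENT))


def segment_of_register(sm):
    """Segment number SN of the segment containing register number SM."""
    if not 0 <= sm < M_REGISTERS:
        raise ValueError("register number out of range")
    return sm // REGS_PER_SEGMENT
```

For instance, segment 0 holds registers 0 to 3 and segment 15 holds registers 60 to 63.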
It should be noted that, in the embodiment of the present disclosure, the number of registers and register segments shown is for illustration, and the embodiment of the present disclosure is not limited thereto. For example, the number of register segments may also be 64, and the number of registers 256. According to the division of the register segments, each register segment may also be designed to include other numbers of registers according to needs, and the embodiments of the present disclosure are not limited thereto.
For example, as shown in Fig. 3, a register segment that is colored (e.g., gray in the figure) represents an occupied register segment, and a register segment that is not colored represents a register segment in an idle state. It can be seen that some scattered register segments remain available for allocation. For example, if a register segment in the idle state is marked as 1, the number string A2 of the segment numbers of the register segments in the idle state is obtained as available_mask[15:0] = 16'b0110_0001_0101_0110 (where 16 indicates 16 bits in total, b indicates binary, and lower bits in the encoding represent smaller segment numbers). At this time, a total of 7 register segments, with segment numbers SN of 1, 2, 4, 6, 8, 13, and 14, are in an idle state.
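A short Python sketch shows how such a 16-bit idle mask decodes into segment numbers, with bit i standing for segment i as stated above. The function name is hypothetical.

```python
def idle_segments(available_mask, n_segments=16):
    """Segment numbers whose bit is 1 in the mask, lowest segment first."""
    return [sn for sn in range(n_segments) if (available_mask >> sn) & 1]


# The mask from the example, written with Python's binary literal syntax.
available_mask = 0b0110_0001_0101_0110
idle = idle_segments(available_mask)
```

Decoding the mask yields the seven idle segment numbers listed in the text.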
It should be noted that the number and the position in the register segment in the compute unit SIMD shown in fig. 3 are taken as an example, and the embodiments of the present disclosure are not limited thereto.
As shown in Fig. 9A, when the shader input unit receives a workgroup and creates a specific work item set, it queries the resource allocation of the register segments of all computing units SIMD in the current shader. The shader input unit in the figure can select P register segments from the register segments in the idle state shown in Fig. 3 according to the number P of register segments that need to be used. The P register segments selected are not required to be contiguous. In the case shown in Fig. 3, for example, when the number of register segments required by one work item set is 5, the shader input unit may select the 5 register segments with segment numbers SN of 1, 2, 4, 6, and 8.
For example, in some embodiments, the segment numbers of at least two of the P register segments are not consecutive. For example, as shown in Fig. 3 and Fig. 9A, when the 7 register segments with segment numbers SN of 1, 2, 4, 6, 8, 13 and 14 are in an idle state and the work item set needs 5 register segments, the shader input unit may select the 5 register segments with segment numbers SN of 1, 2, 4, 6 and 8; in that case the segment numbers 2, 4, 6 and 8 of the selected register segments are mutually non-consecutive.
For example, as shown in fig. 2, step S210: a number Q of register segments of the N register segments that are currently in an idle state is determined. As shown in fig. 9A, the shader input unit searches the calculation unit SIMD shown in fig. 3, and determines a register segment currently in an idle state of the N register segments, for example, 7 register segments with segment numbers SN of 1, 2, 4, 6, 8, 13, and 14, according to a colored condition of the register segment in the calculation unit SIMD. In the present example, Q is 7.
Step S220: select the P register segments from the Q register segments that are currently in the idle state according to the number P of register segments that need to be used.
For example, as shown in fig. 3 and fig. 9A, the shader input unit may determine that the number Q of register segments currently in the idle state is greater than the number P of register segments that need to be used according to the number P of register segments that need to be used by the current work item set, for example, P is 5. And the shader input unit selects P register segments from the register segments in the idle state at present according to the number P of the register segments required to be used.
For example, in some embodiments, among the register segments currently in the idle state, the shader input unit starts from the idle register segment with the smallest or largest segment number and selects idle register segments sequentially in ascending or descending order of segment number to obtain the P register segments. For example, as shown in Fig. 3 and Fig. 9A, when the 7 register segments with segment numbers SN of 1, 2, 4, 6, 8, 13, and 14 are in an idle state, the shader input unit may select P register segments in ascending order of segment number, starting from the idle register segment with the smallest segment number. That is, the shader input unit selects the 5 register segments with segment numbers SN of 1, 2, 4, 6, and 8. Alternatively, selecting in descending order from the largest segment number, the shader input unit selects the 5 register segments with segment numbers SN of 14, 13, 8, 6, and 4. The embodiments of the present disclosure do not limit the order in which the register segments to be used by a work item set are selected.
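The selection in steps S210 and S220 can be sketched in Python as follows. This is an illustrative model, not the hardware logic: the function name is hypothetical, and returning `None` when fewer than P segments are idle is an assumption about how an unsatisfiable request might be reported.

```python
def select_segments(idle, p, ascending=True):
    """Pick P idle segments, walking the segment numbers up or down.

    idle: segment numbers currently in the idle state (Q = len(idle)).
    Returns None when Q < P, i.e. the work item set does not fit.
    """
    if p > len(idle):
        return None
    ordered = sorted(idle, reverse=not ascending)
    return ordered[:p]


idle = [1, 2, 4, 6, 8, 13, 14]   # Q = 7 idle segments
from_low = select_segments(idle, 5, ascending=True)
from_high = select_segments(idle, 5, ascending=False)
```

Walking upward from the smallest idle segment number gives segments 1, 2, 4, 6 and 8; walking downward from the largest gives 14, 13, 8, 6 and 4.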
For example, as shown in Fig. 1, step S130: the selected P register segments are virtualized as a sequentially contiguous storage space for running the current work item set. For example, when the segment numbers of at least two of the selected P register segments are not consecutive, the current work item set cannot be run on them directly, and the selected P register segments need to be virtualized as a sequentially contiguous storage space.
For example, as shown in fig. 2, step S230: after selecting the P register segments, a number string is obtained which records the segment numbers of the P register segments in the N register segments.
For example, as shown in Fig. 4, when the shader input unit selects the 5 register segments with segment numbers SN of 1, 2, 4, 6, and 8 to be allocated to the current work item set, the number string B1 of the segment numbers of the register segments to be used (i.e., mask_mask) is 16'b0000_0001_0101_0110 (where 16 indicates 16 bits in total, b indicates binary, and lower bits in the encoding represent smaller segment numbers). That is, each register segment to be allocated is marked as 1, and the other register segments are marked as 0.
Step S240: the P register segments are virtualized, using the number string, into a sequentially contiguous storage space in order to address the P register segments for running the current work item set.
For example, as shown in Fig. 4 and Fig. 9A, when the shader input unit sends a command to sequencer SQ to create the current work item set, it also sends the number string B1, which records the segment number of each of the P register segments of the current work item set among the N register segments. When the value of the number string B1 (i.e., mask_mask) of the segment numbers of the register segments to be used is 16'b0000_0001_0101_0110, the segment numbers 2, 4, 6 and 8 of the selected register segments are mutually non-consecutive, so the sequencer SQ in Fig. 9A virtualizes the segments recorded in the number string B1 (mask_mask) as consecutively numbered segments in order to address the P register segments continuously for running the current work item set. In this way, discrete registers can be addressed continuously without adding much logic circuitry, which improves the efficiency of register resource allocation when the current work item set is run.
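The construction of the number string can be sketched in Python: each selected segment number sets the corresponding bit of a 16-bit value. The function name is hypothetical.

```python
def build_number_string(selected, n_segments=16):
    """Bitmask with bit SN set to 1 for every selected segment number SN."""
    mask = 0
    for sn in selected:
        if not 0 <= sn < n_segments:
            raise ValueError("segment number out of range")
        mask |= 1 << sn
    return mask


b1 = build_number_string([1, 2, 4, 6, 8])
b1_text = format(b1, "016b")   # 16-digit binary rendering of the mask
```

For segments 1, 2, 4, 6 and 8 the resulting value has the bit pattern 0000_0001_0101_0110 given in the text.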
Fig. 5 is a flowchart illustrating a resource allocation method according to at least one further embodiment of the present disclosure; FIG. 6 is a diagram illustrating a rearrangement of register segments that may be required for use in accordance with at least one embodiment of the present disclosure; fig. 7 is a diagram of obtaining physical addresses of a set of registers to be used according to at least one embodiment of the present disclosure.
The resource allocation method shown in fig. 5 includes steps S310-S360. The detailed process of addressing the P register segments will be described in conjunction with fig. 4, 5, 6, 7, and 9A.
Step S310: the segment start physical address of each of the P register segments is obtained from the number string.
For example, in some embodiments, as shown in fig. 4 and 9A, the sequencer SQ may derive the segment start physical address ST of each of the P register segments from the number string B1 (i.e., the mask allocate_mask). For example, for each bit of the number string B1 that is set to 1, the sequencer SQ calculates the register with the smallest register number in the corresponding register segment and records it in the array vgpr_physical_base[15:0][5:0], indexed by the segment number of that register segment. The code for calculating the segment start physical address ST of each register segment is as follows:
for (i = 0; i < 16; i = i + 1)
    vgpr_physical_base[i][5:0] = {6{allocate_mask[i]}} & (4 * i);
Here, vgpr_physical_base is the array of segment start physical addresses ST.
For example, in some embodiments, the segment start physical address of each of the P register segments corresponds to the register with the smallest register number in that register segment. For example, as shown in fig. 4, the segment start physical addresses ST for the number string B1 (i.e., the mask allocate_mask) are vgpr 4, vgpr 8, vgpr 16, vgpr 24, and vgpr 32. For example, the segment start physical address ST of the register segment with segment number 1 corresponds to the register with register number 4, and the segment start physical address ST of the register segment with segment number 2 corresponds to the register with register number 8.
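The calculation of segment start physical addresses can be mirrored in Python as a quick check of the values above. This is a sketch under the assumption, taken from the `4 * i` term in the code above, that each register segment holds 4 registers; the function name `segment_start_addresses` is hypothetical.

```python
REGS_PER_SEGMENT = 4  # assumption taken from the "4 * i" term in the source code


def segment_start_addresses(allocate_mask, n_segments=16):
    """Return a list indexed by segment number.

    Entry i is 4 * i when bit i of allocate_mask is set, and 0 otherwise,
    mirroring the AND with the replicated mask bit in the source code.
    """
    return [
        REGS_PER_SEGMENT * i if (allocate_mask >> i) & 1 else 0
        for i in range(n_segments)
    ]


bases = segment_start_addresses(0b0000_0001_0101_0110)
print([b for b in bases if b])  # [4, 8, 16, 24, 32]
```

Note that an allocated segment 0 would also produce a base of 0, so the mask itself is still needed to tell allocated entries from unallocated ones.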
For example, in some embodiments, for an instruction included in the current work item set, physical address translation is performed using the segment numbers and the segment start physical addresses of the P register segments, according to the number of registers in the set of registers that an operand of the instruction needs to use in the storage space, so as to obtain the physical addresses of that set of registers.
Step S320: the segment numbers of the P register segments are arranged into a first sequence, and the segment start physical addresses of the P register segments are correspondingly arranged into a second sequence.
For example, as shown in fig. 6 and 9A, the sequencer SQ rearranges the number string B1 (i.e., the mask allocate_mask) of the P register segments into a first sequence S1 in ascending order of the segment numbers of the register segments. The sequencer SQ arranges the segment start physical addresses ST of the P register segments, in correspondence with the first sequence S1, into a second sequence S2. That is, the segment start physical addresses ST are rearranged into the second sequence S2 in ascending order of the register numbers of the registers VGPR having the smallest register number in each of the P register segments. In fig. 6, the first sequence S1 is 16'b0000_0000_0001_1111 (where 16 indicates 16 bits in total, b indicates binary, and lower bits correspond to smaller segment numbers).
For example, in other embodiments, the sequencer SQ may rearrange the number string B1 (allocate_mask) of the segment numbers of the P register segments into the first sequence S1 in descending order of the segment numbers of the register segments.
For example, as shown in fig. 6, the 1 bits of the number string B1 (i.e., the mask allocate_mask) are packed from the low-order end upward, without changing their relative order, to form the first sequence S1, and the array vgpr_physical_base of segment start physical addresses ST is rearranged in the same way. This yields the mask filtered_allocate_mask corresponding to the first sequence S1 and the array filtered_vgpr_physical_base corresponding to the second sequence S2. For example, the implementation code of the mask filtered_allocate_mask and the array filtered_vgpr_physical_base is as follows:
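[Figure GDA0003847171870000111: implementation code of filtered_allocate_mask and filtered_vgpr_physical_base; image not reproduced in this text]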
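The code in the figure is not available in this text. The following Python sketch illustrates one way the rearrangement described above could be carried out; it is an assumption based on the prose description, not the implementation shown in the figure, and the function name `compact_segments` is hypothetical.

```python
def compact_segments(allocate_mask, bases, n_segments=16):
    """Pack allocated segments toward index 0, preserving ascending order.

    Returns (filtered_mask, filtered_bases): bit k of filtered_mask is 1 and
    filtered_bases[k] holds the start address of the k-th allocated segment.
    """
    filtered_mask = 0
    filtered_bases = [0] * n_segments
    slot = 0  # next free position in the compacted (virtual) numbering
    for i in range(n_segments):
        if (allocate_mask >> i) & 1:
            filtered_mask |= 1 << slot
            filtered_bases[slot] = bases[i]
            slot += 1
    return filtered_mask, filtered_bases


allocate_mask = 0b0000_0001_0101_0110
bases = [4 * i if (allocate_mask >> i) & 1 else 0 for i in range(16)]
fmask, fbases = compact_segments(allocate_mask, bases)
print(format(fmask, "016b"))  # 0000000000011111
print(fbases[:5])             # [4, 8, 16, 24, 32]
```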
Step S330: the starting physical address of the set of registers that the operand needs to use in the storage space is obtained from the first sequence, the second sequence, and the operand of the instruction. For example, the sequencer SQ obtains the operand by parsing an instruction of the work item set.
For example, in some embodiments, the operand of an instruction of the current work item set has a value of 6'd17 (6'b01_0001), and the sequencer SQ obtains the starting physical address of the set of registers that the operand needs to use in the storage space according to the first sequence, the second sequence, and the operand of the instruction.
Assuming that a source operand (vsrc0/1/2) of a vector instruction has a value of 6'd17 (6'b01_0001), the starting physical address of the corresponding set of registers can be calculated as follows:
vgpr_physical_addr = {6{filtered_allocated_mask[src[5:2]]}} &
    (filtered_vgpr_physical_base[src[5:2]] + src[1:0])
Here, vgpr_physical_addr denotes the register number at the starting physical address of the set of registers VGPR. With src = 6'b01_0001, src[5:2] is 4 and src[1:0] is 1, so the starting physical address is filtered_vgpr_physical_base[4] + 1 = 32 + 1, that is, VGPR 33 (as shown in fig. 6).
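The operand translation can be traced in Python as follows. This is an illustrative sketch: the function name `translate_operand` is hypothetical, and the bit fields follow the src[5:2] and src[1:0] split used in the formula above.

```python
def translate_operand(src, filtered_mask, filtered_bases):
    """Map a 6-bit virtual register number to a physical register number."""
    slot = (src >> 2) & 0xF   # src[5:2]: index into the compacted segments
    offset = src & 0x3        # src[1:0]: register position inside the segment
    if not (filtered_mask >> slot) & 1:
        return 0              # mirrors the AND with the replicated mask bit
    return (filtered_bases[slot] + offset) & 0x3F


filtered_mask = 0b0000_0000_0001_1111
filtered_bases = [4, 8, 16, 24, 32] + [0] * 11
print(translate_operand(17, filtered_mask, filtered_bases))  # 33
```

Here virtual register 17 lies in the fifth compacted segment (slot 4) at offset 1, which is physical segment 8 with base 32.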
It should be noted that, in fig. 9A, the sequencer SQ decodes a vector instruction of the work item set to obtain the values of the source operand vsrc0/1/2, the destination operand vdst, the data operand vdata, and the address operand field vaddr in the instruction data. The sequencer SQ derives a physical address offset m0_gpr_idx of a register VGPR from other types of registers, such as an instruction pointer register.
For example, in some embodiments, the starting physical address of the set of registers corresponds to the register with the smallest register number in the set. As shown in fig. 7, the operand requires a set of registers with register numbers 27, 32, 33 and 34, and the starting physical address of the set of registers is vgpr 27.
For example, in some embodiments, the set of registers is located in two register segments, among the P register segments, whose segment numbers are not consecutive. When VGPR physical address translation is performed for an instruction operand, the position within its register segment of the 1st register VGPR in the set is determined first, and whether the set crosses a register segment boundary is then determined from the number of registers VGPR in the set.
For example, in embodiments of the present disclosure, the register segments allocated to the current work item set are no longer necessarily consecutive, so the set of registers that an operand needs to use in the storage space may be located in non-consecutive register segments. As shown in fig. 7, when the 1st register VGPR of the set (e.g., VGPR 27) is at the last position of its register segment (e.g., register segment 6), the 2nd register VGPR falls in the next allocated register segment; that is, the segment number 8 of the register segment containing the 2nd register VGPR is not consecutive with the segment number 6 of the register segment containing the 1st register VGPR, and the register segment crossed into is not physically contiguous with the previous register segment.
Fig. 8 is a diagram of obtaining physical addresses of a set of registers that need to be used according to at least another embodiment of the present disclosure. As shown in fig. 8, when the 2nd register VGPR of the set (e.g., VGPR 27) is at the last position of its register segment (e.g., register segment 6), the 3rd register VGPR falls in the next allocated register segment; that is, the segment number 8 of the register segment containing the 3rd register VGPR is not consecutive with the segment number 6 of the register segment containing the 2nd register VGPR, and the register segment crossed into is not physically contiguous with the previous register segment.
Step S340: it is determined that the set of registers is located in two register segments, among the P register segments, whose segment numbers are not consecutive.
Step S350: in response, the position of the starting physical address of the set of registers within the register segment with the smaller segment number of the two register segments, and the spacing between the two register segments, are determined.
For example, in some embodiments, the location of the starting physical address of a set of registers in the segment of the two register segments with the smaller segment number is determined based on the starting physical address of the set of registers and the number of registers in each register segment.
As shown in fig. 7, when the segment number 8 of the register segment containing the 2nd register VGPR is not consecutive with the segment number 6 of the register segment containing the 1st register VGPR, the starting physical address VT of the set of registers, that is, the physical address of the 1st register VGPR, is VGPR 27, which is the last of the 4 register positions in the register segment with the smaller segment number (segment number 6, of segment numbers 6 and 8). Specifically, the physical address of the 1st register VGPR is 27, and dividing 27 by 4 leaves a remainder of 3, so the 1st register VGPR is at the last position of its register segment. When the operand of the instruction requires 4 registers VGPR, it follows that the 2nd register VGPR crosses into another register segment. The register segment crossing of the set of registers may be recorded as slot_across_mask[2:0] = 3'b001 (where 3 indicates 3 bits in total, b indicates binary, the highest bit represents the 4th register VGPR, the middle bit represents the 3rd register VGPR, and the lowest bit represents the 2nd register VGPR). That is, the 2nd register VGPR, at which the crossing occurs, is marked as 1, and the 3rd register VGPR and the 4th register VGPR are marked as 0.
As shown in fig. 8, when the segment number 8 of the register segment containing the 3rd register VGPR is not consecutive with the segment number 6 of the register segment containing the 2nd register VGPR, the starting physical address VT of the set of registers, that is, the physical address of the 1st register VGPR, is VGPR 26, which is the 3rd of the 4 register positions in the register segment with the smaller segment number (segment number 6, of segment numbers 6 and 8). Specifically, the physical address of the 1st register VGPR is 26, and dividing 26 by 4 leaves a remainder of 2, so the 1st register VGPR is at the 3rd position of its register segment. When the operand of the instruction requires 4 registers VGPR, it follows that the 3rd register VGPR crosses into another register segment. The register segment crossing of the set of registers can be recorded as slot_across_mask[2:0] = 3'b010. That is, the 3rd register VGPR, at which the crossing occurs, is marked as 1, and the 2nd register VGPR and the 4th register VGPR are marked as 0.
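The two worked cases above (fig. 7 and fig. 8) can be reproduced with the short Python sketch below. The function name `slot_across_mask` reuses the signal name from the text for readability, but the code itself is a hypothetical illustration that assumes 4 registers per segment.

```python
def slot_across_mask(start_addr, num_regs, regs_per_segment=4):
    """Return a mask whose bit (k - 1) is 1 when register k + 1 of the group
    is the first register of a new segment (bit 0 = 2nd register)."""
    pos = start_addr % regs_per_segment  # position of the 1st register in its segment
    mask = 0
    for k in range(1, num_regs):
        if (pos + k) % regs_per_segment == 0:
            mask |= 1 << (k - 1)
    return mask


print(format(slot_across_mask(27, 4), "03b"))  # 001 (fig. 7: 2nd register crosses)
print(format(slot_across_mask(26, 4), "03b"))  # 010 (fig. 8: 3rd register crosses)
```

A group that starts at the first position of a segment and needs at most 4 registers yields a mask of 0, since no boundary is crossed.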
For example, in some embodiments, the set of registers is determined to be located in two register segments based on the position of the starting physical address of the set within the register segment with the smaller segment number and on the number of registers in the set. For example, in addition to knowing at which register VGPR of the set the register segment crossing occurs, it is also necessary to know the spacing, in register segments, between the register segment with the larger segment number and the register segment with the smaller segment number of the two non-consecutive register segments, which can be recorded in slot_stride[3:0]. For example, as shown in fig. 7, when the crossing occurs at the 2nd register VGPR and the operand of the instruction needs 4 registers VGPR, the segment numbers of the two non-consecutive register segments containing the set are 6 and 8, and the value of slot_stride[3:0] is 2. For example, as shown in fig. 8, when the crossing occurs at the 3rd register VGPR and the operand of the instruction needs 4 registers VGPR, the segment numbers of the two non-consecutive register segments are again 6 and 8, and the value of slot_stride[3:0] is again 2.
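The spacing value in these examples can be computed as in the following Python sketch. It assumes, consistently with the value 2 given for segments 6 and 8, that the spacing is the difference between the two segment numbers; the function name `slot_stride` and its parameters are hypothetical.

```python
def slot_stride(start_addr, allocated_segments, regs_per_segment=4):
    """Return the segment-number distance from the segment holding start_addr
    to the next allocated segment (assumed meaning of slot_stride)."""
    current = start_addr // regs_per_segment
    ordered = sorted(allocated_segments)
    idx = ordered.index(current)  # raises ValueError if the segment is not allocated
    if idx + 1 >= len(ordered):
        raise ValueError("no allocated segment follows the current one")
    return ordered[idx + 1] - current


allocated = [1, 2, 4, 6, 8]
print(slot_stride(27, allocated))  # 2 (segment 6 to segment 8)
print(slot_stride(5, allocated))   # 1 (segment 1 to segment 2, consecutive)
```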
For example, as shown in fig. 5, step S360: the physical addresses of the registers in the set other than the register at the starting physical address are obtained from the position of the starting physical address within the register segment with the smaller segment number of the two register segments, the spacing between the two register segments, and the starting physical address of the set of registers that the operand needs to use, so that the physical addresses of the set of registers are addressed in sequence.
For example, as shown in fig. 9A, the sequencer SQ sends the starting physical address VT of the set of registers (e.g., the value of vgpr_physical_addr), slot_across_mask[2:0], and slot_stride[3:0] to the computation unit SIMD. slot_across_mask[2:0] indicates at which register VGPR the set crosses a register segment boundary, and thereby the position of the starting physical address within the register segment with the smaller segment number of the two register segments. slot_stride[3:0] represents the spacing, in register segments, between the register segment with the larger segment number and the register segment with the smaller segment number of the two non-consecutive register segments. The computation unit SIMD (e.g., the sub-computation unit paddr calc shown in the computation unit SIMD in the figure) obtains, from the starting physical address, the position, and the spacing, the physical addresses of the registers in the set other than the register at the starting physical address (i.e., vgpt_paddr0, vgpt_paddr1, vgpt_paddr2, vgpt_paddr3, and so on). The computation unit SIMD then addresses the physical addresses of the set of registers in sequence, that is, the physical addresses of the set of registers are virtualized as a contiguous space. The computation unit SIMD performs addressing operand by operand, and the current work item set is run once addressing has been completed for every operand of all the instructions of the work item set.
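To make the address derivation concrete, the Python sketch below rebuilds the register groups of fig. 7 and fig. 8 from the starting address, the crossing mask, and the spacing. It is a hypothetical model of the calculation, not the circuit in the figure; the function name `group_physical_addresses` is an assumption, as is the use of 4 registers per segment.

```python
def group_physical_addresses(start_addr, num_regs, across_mask, stride,
                             regs_per_segment=4):
    """Return the physical register numbers of a group of num_regs registers.

    across_mask bit (k - 1) marks register k + 1 as the first one in a new
    segment; stride is the segment-number distance to that new segment.
    """
    addrs = [start_addr]
    skip = 0  # registers skipped because of unallocated segments in between
    for k in range(1, num_regs):
        if (across_mask >> (k - 1)) & 1:
            skip += (stride - 1) * regs_per_segment
        addrs.append(start_addr + k + skip)
    return addrs


print(group_physical_addresses(27, 4, 0b001, 2))  # [27, 32, 33, 34] (fig. 7)
print(group_physical_addresses(26, 4, 0b010, 2))  # [26, 27, 32, 33] (fig. 8)
```

When the two segments are consecutive (stride of 1), the skip is zero and the addresses are simply sequential.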
In the resource allocation method provided in at least one embodiment of the present disclosure, the selected P register segments are virtualized as sequentially continuous storage spaces, and the segment numbers of the P register segments are no longer required to be continuous, so that the allocation rate of the registers can be increased.
Fig. 9B is a schematic diagram illustrating an implementation process of a resource allocation method according to at least another embodiment of the present disclosure.
For example, in other embodiments, as shown in fig. 9B, when the sequencer SQ parses an instruction and the physical addresses of the multiple registers VGPR corresponding to an operand of the instruction cross a register segment boundary, the sequencer SQ directly sends to the computation unit SIMD the number string of the register segments to be used (i.e., the mask allocate_mask) received from the shader input unit, together with the starting physical address VT of the set of registers (e.g., the value of vgpr_physical_addr). The computation unit SIMD (e.g., the sub-computation unit paddr calc shown in the computation unit SIMD in the figure) performs steps S310 to S360. The embodiments of the present disclosure do not limit the specific manner in which resource allocation is performed.
For example, at least one embodiment of the present disclosure further provides a resource allocation apparatus. Fig. 10 is a schematic diagram of a resource allocation apparatus according to at least one embodiment of the present disclosure.
For example, in some embodiments, as shown in fig. 10, the resource allocation apparatus 400 includes: a register segment number determination module 410, a register segment selection module 420, and a register segment virtual conversion module 430.
For example, the register segment number determination module 410 is configured to determine the number P of register segments that need to be used based on the current work item set. For example, when an execution program (e.g., a kernel program (Kernel), which is the smallest parallel execution program that clearly expresses a defined function) is compiled, the compiler analyzes how many registers or register segments the program needs to use. On receiving the current work item set, the register segment number determination module 410 can thus learn how many registers or register segments the work item set needs to use.
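As a hedged illustration of how a register count could be turned into a segment count P, the Python sketch below rounds up to whole segments. The disclosure does not spell out this formula; the function name `segments_needed` is hypothetical, and the figure of 4 registers per segment is taken from the address calculations earlier in the description.

```python
def segments_needed(num_registers, regs_per_segment=4):
    """Return the number of whole register segments covering num_registers.

    Assumed rule: round up, since a segment is allocated as a unit.
    """
    if num_registers <= 0:
        raise ValueError("a work item set must need at least one register")
    return -(-num_registers // regs_per_segment)  # ceiling division


print(segments_needed(20))  # 5
print(segments_needed(17))  # 5
print(segments_needed(16))  # 4
```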
For example, the register segment selection module 420 is configured to select P register segments from among the register segments currently in an idle state of the N register segments according to the number P of register segments required to be used, wherein segment number sequence numbers of the P register segments are not limited to be consecutive.
For example, in some embodiments, the resource allocation method is used for allocating registers in a computing unit; the computing unit includes M registers arranged by register number, the M registers are divided in order of register number into N register segments, the N register segments are arranged by segment number, and M and N are positive integers greater than 0. As shown in fig. 3, the P register segments are selected from among the register segments in the idle state, and the selected P register segments are not limited to being consecutive. In the case shown in fig. 3, for example, when the number of register segments that one work item set needs to use is 5, the register segment selection module 420 may select the 5 register segments with segment numbers SN of 1, 2, 4, 6, and 8.
For example, the register segment virtual conversion module 430 is configured to virtualize the selected P register segments as a sequentially contiguous storage space for running the current work item set. For example, after the P register segments are selected, the register segment virtual conversion module 430 obtains a number string recording the segment numbers of the P register segments among the N register segments. As shown in fig. 4, when the shader input unit selects the 5 register segments with segment numbers SN of 1, 2, 4, 6, and 8 for allocation to the current work item set, the number string of the segment numbers of the register segments to be used (i.e., the mask allocate_mask) is 16'b0000_0001_0101_0110 (where 16 indicates 16 bits in total, b indicates binary, and lower bits correspond to smaller segment numbers). That is, the bits of the register segments to be allocated are set to 1, and the bits of the other register segments are set to 0. The register segment virtual conversion module 430 uses the number string to virtualize the P register segments as a sequentially contiguous storage space, so that the P register segments can be addressed for running the current work item set.
The resource allocation device provided by the embodiment of the disclosure virtualizes the selected P register segments as sequentially continuous storage spaces, and does not require that segment numbers of the P register segments are continuous, so that the allocation rate of the registers can be increased.
In addition to the above description, there are the following points to be explained:
(1) The drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to the common design.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A resource allocation method for allocating registers in a computing unit, wherein the computing unit comprises M registers arranged by register number, the M registers are divided in order of register number into N register segments, each of the N register segments comprises a plurality of registers with consecutive register numbers, the N register segments are arranged by segment number, and M and N are positive integers greater than 0,
the resource allocation method comprises the following steps:
determining the number P of register segments needed to be used according to the current work item set,
selecting P register segments from the register segments of the N register segments which are in idle state currently according to the number P of the register segments required to be used, wherein the segment number sequence numbers of the P register segments are not limited to be continuous,
virtualizing the selected P register segments as sequentially continuous storage space for running the current work item set.
2. The method of claim 1, wherein the segment numbers of at least two of the P register segments are not consecutive.
3. The resource allocation method of claim 2, further comprising:
among the register segments currently in the idle state, starting from the idle register segment with the smallest or largest segment number, sequentially selecting idle register segments in ascending or descending order of segment numbers to obtain the P register segments.
4. The resource allocation method according to any of claims 1-3, further comprising:
obtaining a number string recording the segment numbers of the P register segments in the N register segments respectively after the P register segments are selected,
virtualizing the P register segments as the sequentially contiguous storage space using the number string to address the P register segments for running the current set of work items.
5. The method of claim 4, wherein virtualizing the P register segments as the sequentially contiguous storage space using the number string to address the P register segments to run the current set of work items comprises:
obtaining respective segment start physical addresses of the P register segments according to the number string,
and for an instruction included in the current work item set, performing physical address translation by using the segment numbers of the P register segments and the segment start physical addresses of the P register segments according to the number of registers of a group of registers required to be used by an operand of the instruction in the storage space, so as to obtain the physical addresses of the group of registers.
6. The method of claim 5, wherein the segment starting physical address of each of the P register segments corresponds to a register with a smallest register number sequence number in each of the P register segments.
7. The method of claim 5, wherein using the respective segment numbers and the respective starting physical addresses of the P register segments for physical address translation, further comprises:
arranging the segment numbers of each of the P register segments into a first sequence, correspondingly arranging the segment start physical addresses of each of the P register segments into a second sequence,
and according to the first sequence, the second sequence and the operand of the instruction, obtaining the starting physical address of the group of registers needed to be used by the operand in the storage space.
8. The method of claim 7, wherein the starting physical address of the set of registers corresponds to a register number of the set of registers having a smallest register number sequence number.
9. The method of claim 7, wherein the set of registers is located in two segment-numbered non-consecutive register segments of the P register segments.
10. The resource allocation method of claim 9, further comprising:
in response to the set of registers being located in the two segment-numbered non-consecutive register segments of the P register segments,
determining a position of the starting physical address of the group of registers in a register segment with a smaller segment number of the two register segments and a distance between the two register segments, and addressing using the position and the distance.
11. The resource allocation method according to claim 10, wherein the position of the starting physical address of the group of registers in the register segment with the smaller segment number of the two register segments is determined according to the starting physical address of the group of registers and the number of registers in each register segment.
12. The method of claim 11, wherein the set of registers is determined to be in the two segments of registers based on a location of the starting physical address of the set of registers in a segment of registers with a smaller segment number of the two segments of registers and a memory size of the set of registers.
13. The method of any of claims 10-12, wherein addressing using the position and the spacing comprises:
obtaining the physical addresses of the registers in the group other than the register corresponding to the starting physical address, by using the distance and the position of the starting physical address of the group of registers in the register segment with the smaller segment number of the two register segments, in combination with the starting physical address of the group of registers required to be used by the operand, so as to address the physical addresses of the group of registers in sequence.
14. A resource allocation device is used for allocating registers in a computing unit, wherein the computing unit comprises M registers which are arranged according to register numbers, the M registers are sequentially segmented into N register segments according to the register number serial numbers, each of the N register segments comprises a plurality of registers with continuous register numbers, the N register segments are arranged according to the segment numbers, and M and N are positive integers larger than 0;
the resource allocation apparatus includes:
a register segment number determining module configured to determine the number P of register segments to be used according to the current work item set,
a register segment selection module configured to select P register segments from the register segments currently in an idle state of the N register segments according to the number P of the register segments required to be used, wherein segment number sequence numbers of the P register segments are not limited to be continuous,
a register segment virtual conversion module configured to virtualize the selected P register segments as a sequentially contiguous storage space for running the current work item set.
CN202011333492.8A 2020-11-25 2020-11-25 Resource allocation method and device Active CN112559169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011333492.8A CN112559169B (en) 2020-11-25 2020-11-25 Resource allocation method and device


Publications (2)

Publication Number Publication Date
CN112559169A CN112559169A (en) 2021-03-26
CN112559169B true CN112559169B (en) 2022-11-08

Family

ID=75043518


Country Status (1)

Country Link
CN (1) CN112559169B (en)



Also Published As

Publication number Publication date
CN112559169A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN109522254B (en) Arithmetic device and method
JP5102758B2 (en) Method of forming an instruction group in a processor having a plurality of issue ports, and apparatus and computer program thereof
US9477465B2 (en) Arithmetic processing apparatus, control method of arithmetic processing apparatus, and a computer-readable storage medium storing a control program for controlling an arithmetic processing apparatus
JP4292198B2 (en) Method for grouping execution threads
TWI525540B (en) Mapping processing logic having data-parallel threads across processors
US9710306B2 (en) Methods and apparatus for auto-throttling encapsulated compute tasks
CN103226463A (en) Methods and apparatus for scheduling instructions using pre-decode data
JP2017157244A (en) Sort acceleration processors, methods, systems, and instructions
US20240004654A1 (en) Computing System with Hardware and Methods for Handling Immediate Operands in Machine Instructions
JP2008535074A5 (en)
WO2012078735A2 (en) Performing function calls using single instruction multiple data (simd) registers
US10503550B2 (en) Dynamic performance biasing in a processor
CN103207774A (en) Method And System For Resolving Thread Divergences
CN103778072A (en) Efficient memory virtualization in multi-threaded processing unit
JP2005227942A (en) Processor and compiler
CN103279379A (en) Methods and apparatus for scheduling instructions without instruction decode
US20180165092A1 (en) General purpose register allocation in streaming processor
EP2439635B1 (en) System and method for fast branching using a programmable branch table
CN112445616B (en) Resource allocation method and device
CN113454592A (en) Memory management system
CN112559169B (en) Resource allocation method and device
JP2021525919A (en) Scheduler queue allocation
EP2466452B1 (en) Register file and computing device using same
US10754652B2 (en) Processor and control method of processor for address generating and address displacement
US20130332703A1 (en) Shared Register Pool For A Multithreaded Microprocessor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221019

Address after: 610216 building 4, No. 171, hele'er street, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan

Applicant after: CHENGDU HAIGUANG MICROELECTRONICS TECHNOLOGY Co.,Ltd.

Address before: 300392 North 2-204 industrial incubation-3-8, 18 Haitai West Road, Huayuan Industrial Zone, Tianjin

Applicant before: Haiguang Information Technology Co.,Ltd.

GR01 Patent grant