WO2024040750A1 - Access control method for scalar processing unit, and scalar processing unit - Google Patents

Access control method for scalar processing unit, and scalar processing unit

Info

Publication number
WO2024040750A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
scalar
access
resource access
instructions
Prior art date
Application number
PCT/CN2022/130376
Other languages
French (fr)
Chinese (zh)
Inventor
李晶晶
Original Assignee
上海登临科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海登临科技有限公司
Publication of WO2024040750A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Definitions

  • the present application relates to the field of processor technology, and in particular to an access control method for a scalar processing unit, a scalar processing unit, and a resource access system.
  • A GPU (Graphics Processing Unit), also known as a display core, visual processor, or display chip, is a microprocessor specialized for image operations on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).
  • The GPU computing core is based on the SIMD (Single Instruction Multiple Data) hardware architecture to improve the parallelism of data operations.
  • This type of global data does not belong to any particular thread. Therefore, for a thread group, an operation on such global data needs to be executed only once, rather than once by each thread in the thread group.
  • the operations are collectively referred to as scalar instruction operations and are sent to the scalar processing unit for execution.
  • the results of the operations are stored in scalar registers in the form of scalars. Limited by the capacity and bandwidth of scalar registers, there is competition for access to scalar register resources. Competition will cause blocking of the scalar processing unit instruction pipeline, thereby reducing the performance of the scalar processing unit.
  • Embodiments of the present application provide an access control method for a scalar processing unit to solve the problem that competition for scalar register resources blocks the instruction pipeline of the scalar processing unit and reduces its performance.
  • Embodiments of the present application provide an access control method for a scalar processing unit.
  • the scalar processing unit is connected to a resource access requester external to the scalar processing unit.
  • the method includes:
  • if the resource access instruction is added to the buffer queue, then when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict are executed concurrently.
  • the priority of the resource access instruction may be related to the priority of the resource access requester.
  • the priority may be a fixed configuration or a dynamic configuration.
  • determining whether to add the resource access instruction to the buffer queue according to the priority of the resource access instruction may include:
  • the resource access instruction is added to the buffer queue.
  • the resource access instruction includes a scalar instruction and an access request; if the resource access instruction has a resource access conflict and its priority is lower than the priority of the instruction to be executed on the access object, adding the resource access instruction to the buffer queue may include:
  • the access request is added to the buffer queue.
  • the scalar processing unit includes a pre-control unit, a scalar register group, a post-control unit and an arithmetic unit connected in sequence
  • the resource access instructions include scalar instructions
  • the concurrent execution of the resource access instructions in the buffer queue that have no resource access conflict may include:
  • the pre-control unit obtains the source operand corresponding to each scalar instruction from the scalar register group according to the scalar instructions that do not have resource access conflicts in the buffer queue;
  • the post-control unit temporarily stores the source operand corresponding to each scalar instruction in an internal buffer, and selects the corresponding type of scalar instruction to send to the operation unit according to the type and operating status of the operation unit;
  • the operation unit performs operations on the source operand corresponding to the scalar instruction according to the received scalar instruction.
  • the pre-control unit obtains the source operand corresponding to each scalar instruction from the scalar register group according to the scalar instructions that do not have resource access conflicts in the buffer queue, which may include:
  • the front-end control unit selects multiple scalar instructions without resource access conflicts from the buffer queue according to the principle of selecting different instruction categories;
  • the selected multiple scalar instructions are sent in parallel to the scalar register corresponding to each instruction type in the scalar register group, so as to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  • the pre-control unit obtains the source operand corresponding to each scalar instruction from the scalar register group according to the scalar instructions that do not have resource access conflicts in the buffer queue, which may include:
  • the front-end control unit selects scalar instructions without resource access conflicts from the buffer queue in order according to the first-in, first-out principle
  • the selected plurality of scalar instructions are sent to the scalar register group in parallel to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  • the pre-control unit obtains the source operand corresponding to each scalar instruction from the scalar register group according to the scalar instructions that do not have resource access conflicts in the buffer queue, which may include:
  • the pre-control unit randomly selects scalar instructions without resource access conflicts from the buffer queue
  • the selected scalar instructions are sent to the scalar register group in parallel to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  • selecting a corresponding type of scalar instruction and sending it to the computing unit may include:
  • a scalar instruction of the corresponding type is selected and sent to the computing unit in the idle state.
  • the resource access requester includes an instruction scheduling unit or a scalar access client
  • the resource access instruction includes a scalar instruction sent by the instruction scheduling unit or an access request sent by the scalar access client
  • Determining whether a resource access conflict exists in the resource access instruction according to the access object of the resource access instruction may include:
  • according to the access object of the resource access instruction, determining whether the access object has unfinished instructions;
  • if the access object has unfinished instructions, determining that the resource access instruction has a resource access conflict.
  • the resource access requester includes an instruction scheduling unit and a scalar access client
  • the resource access instruction includes a scalar instruction sent by the instruction scheduling unit and an access request sent by the scalar access client
  • Determining whether a resource access conflict exists in the resource access instruction according to the access object of the resource access instruction may include:
  • if the first access object and the second access object are different, determining separately whether the scalar instruction and the access request each have a resource access conflict.
  • determining whether there is a resource access conflict in each of the scalar instruction and the access request may include:
  • if the second access object has unfinished instructions, it is determined that the access request has a resource access conflict.
  • determining whether there is a resource access conflict in each of the scalar instruction and the access request may include:
  • the unfinished instructions include instructions being executed and instructions to be executed.
  • the embodiment of the present application also provides a scalar processing unit, which may include:
  • a front-end control unit configured to connect to a resource access requester outside the scalar processing unit, receive a resource access instruction sent by the resource access requester, and determine, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict; if it does, determine whether to add the resource access instruction to the buffer queue according to the priority of the resource access instruction; and, if the resource access instruction is added to the buffer queue, simultaneously issue the resource access instructions in the buffer queue that have no resource access conflict when the conflict is resolved;
  • a scalar register group connected to the pre-control unit, used to receive the resource access instruction and obtain the internally stored source operand corresponding to the resource access instruction;
  • a post-control unit connected to the scalar register group and configured, when the resource access instruction includes a scalar instruction, to temporarily store the source operand corresponding to each scalar instruction in an internal buffer and, according to the type and running status of the arithmetic unit, to select a scalar instruction of the corresponding type and send it to the arithmetic unit;
  • the operation unit is connected to the post-control unit and is used to perform operations on the source operands corresponding to the scalar instructions according to the received scalar instructions.
  • the embodiment of this application also provides a resource access system, which may include:
  • the resource access instructions include scalar instructions and access requests
  • An instruction dispatch unit is connected to the scalar processing unit and used to send scalar instructions to the scalar processing unit;
  • a scalar access client is connected to the scalar processing unit and used to send an access request to the scalar processing unit.
  • Buffering resource access instructions that have resource access conflicts and, once the conflict is resolved, concurrently executing the buffered instructions that no longer conflict can quickly relieve the instruction backlog in the buffer queue, compensate for the performance lost to instruction pipeline blocking caused by resource access conflicts, improve scalar register bandwidth utilization, and quickly alleviate subsequent potential resource access conflicts.
  • Figure 1 is a schematic architectural diagram of a resource access system provided by an embodiment of the present application.
  • Figure 2 is a schematic flow chart of an access control method for a scalar processing unit provided by an embodiment of the present application;
  • Figure 3 is a detailed flow chart of step S220 in the embodiment corresponding to Figure 2;
  • Figure 4 is a detailed flow chart of step S240 in the embodiment corresponding to Figure 2.
  • FIG. 1 is a schematic architectural diagram of a resource access system provided by an embodiment of the present application.
  • the resource access system may include a scalar processing unit 110, an instruction dispatch unit 120, and a scalar access client 130.
  • The scalar processing unit 110 may be connected to the instruction dispatch unit 120 and the scalar access client 130.
  • the resource access system described above is located inside the GPU.
  • The scalar access client 130 may send an access request to the scalar processing unit 110.
  • the scalar processing unit 110 may process the access requests sent by the scalar access clients 130 in order of the priority of each scalar access client 130.
  • the priority of the scalar access client 130 can be fixed or dynamically configured.
  • the scalar processing unit 110 may return the data to the scalar access client 130 .
  • the instruction scheduling unit 120 may receive instructions sent from outside the resource access system and determine the instruction type according to the decoding information of the instruction.
  • the instruction type may be a scalar instruction or a vector instruction.
  • the basic operation object of vector instructions is vector, that is, a group of numbers arranged in order.
  • the operation object of scalar instructions is a single number.
  • The instruction dispatch unit 120 may send scalar instructions to the scalar processing unit 110.
  • the scalar processing unit 110 can obtain the source operand corresponding to the scalar instruction from the internal scalar register, cache the source operand, and have the internal arithmetic unit corresponding to the scalar instruction perform the operation on the source operand.
  • the scalar processing unit 110 includes a pre-control unit 111, a scalar register group 112, a post-control unit 113 and an arithmetic unit 114.
  • the front-end control unit 111 may be used to connect a resource access requester external to the scalar processing unit 110 .
  • Resource access requestors may include instruction dispatch unit 120 and/or scalar access client 130.
  • the front-end control unit 111 may be configured to receive resource access instructions sent by the resource access requester; the resource access instructions may include scalar instructions sent by the instruction scheduling unit 120 and/or access requests sent by the scalar access client 130.
  • the front-end control unit 111 can determine whether a resource access instruction has a resource access conflict according to the access object of the resource access instruction; if it does, the unit determines whether to add the resource access instruction to the buffer queue according to the priority of the resource access instruction, where the priority of the resource access instruction may be related to the priority of the resource access requester. For example, if the priority of the scalar access client 130 is higher than the priority of the instruction dispatch unit 120, then the priority of the access request is higher than the priority of the scalar instruction.
  • For another example, if the priority of a certain scalar access client X is higher than the priority of the other scalar access clients 130, then the priority of an access request from scalar access client X is higher than the priority of access requests from the other scalar access clients 130.
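  • As a minimal sketch of how a resource access instruction could inherit its priority from its requester, with either fixed or dynamic configuration: the requester names, the numeric levels, and the `set_priority` helper are illustrative assumptions and do not appear in the application.

```python
# Hypothetical requester names and levels; a larger number means higher priority.
requester_priority = {
    "scalar_access_client_X": 3,
    "scalar_access_client": 2,
    "instruction_dispatch_unit": 1,
}


def instruction_priority(requester: str) -> int:
    """A resource access instruction takes the priority of its requester."""
    return requester_priority[requester]


def set_priority(requester: str, level: int) -> None:
    """Dynamic configuration: reassign a requester's priority at run time."""
    requester_priority[requester] = level


# With the table above, an access request outranks a scalar instruction.
request_first = (instruction_priority("scalar_access_client")
                 > instruction_priority("instruction_dispatch_unit"))
```

  • A fixed configuration corresponds to never calling `set_priority`; a dynamic configuration corresponds to calling it over time or in response to a user instruction.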
  • If the resource access instruction is added to the buffer queue, then when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict are simultaneously sent to the scalar register group 112.
  • the execution process of the front-end control unit 111 is detailed in the method embodiment below and will not be described again here. When the conflict is resolved, the front-end control unit 111 simultaneously issues multiple non-conflicting instructions.
  • the scalar register group 112 can be connected to the front-end control unit 111.
  • the scalar register group 112 can include multiple scalar registers, can be used to receive resource access instructions, and can obtain source operands corresponding to internally stored resource access instructions. For example, the source operand stored corresponding to the address information can be obtained based on the address information contained in the resource access instruction.
  • the front-end control unit 111 returns the obtained source operand to the scalar access client 130.
  • the post-control unit 113 can be connected to the scalar register group 112 and the operation unit 114. When the resource access instruction includes a scalar instruction, it can temporarily store the source operand corresponding to each scalar instruction in an internal buffer and, according to the type and running status of the operation unit 114, select a scalar instruction of the corresponding type and send it to the operation unit 114.
  • the types of operation unit 114 may include logical operation type, multiplication operation type, branch control type and memory access type.
  • the running status can be busy or idle.
  • a scalar instruction whose type matches that of an arithmetic unit 114 can be sent to that arithmetic unit 114 when it is in the idle state.
  • Multiple scalar instructions can share the buffer of the post-control unit 113, thus saving hardware overhead and reducing power consumption.
  • the operation unit 114 may be configured to perform operations on the source operands corresponding to the scalar instruction according to the received scalar instruction.
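  • The dispatch rule of the post-control unit 113 described above can be sketched as follows. This is only an illustration under stated assumptions: the `Unit` class, the type strings, and the tuple layout of buffered instructions are hypothetical, and real hardware would make this selection in logic rather than in a loop.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str           # e.g. "logic", "multiply", "branch", "memory"
    busy: bool = False  # running status: busy or idle


def dispatch(shared_buffer, units):
    """Send at most one buffered instruction to each idle unit of matching type."""
    issued = []
    for unit in units:
        if unit.busy:
            continue
        for i, (name, kind, _operands) in enumerate(shared_buffer):
            if kind == unit.kind:
                issued.append((name, unit.kind))
                unit.busy = True
                del shared_buffer[i]  # the buffer slot is freed for later instructions
                break
    return issued


# Buffered entries are (name, type, source operands); the multiply unit is busy.
shared_buffer = [("and1", "logic", (1, 2)),
                 ("mul1", "multiply", (3, 4)),
                 ("and2", "logic", (5, 6))]
units = [Unit("logic"), Unit("multiply", busy=True)]
issued = dispatch(shared_buffer, units)
```

  • Here only `and1` is issued: `mul1` waits because its unit is busy, and `and2` waits because the single logic unit has just been occupied.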
  • This solution does not increase the computing resource overhead of the computing unit 114. By adding a small amount of shared buffering and control logic, multiple scalar instructions can be executed concurrently, which effectively alleviates both the blocking of the scalar instruction pipeline caused by earlier resource access conflicts and the resulting backlog of scalar instructions.
  • cached scalar instructions can be executed with the highest degree of concurrency, maximizing the use of scalar register bandwidth, and clearing the backlog of scalar instruction buffer queues to compensate for the previously lost clock cycles of pipeline blocking.
  • FIG. 2 is a schematic flowchart of an access control method for a scalar processing unit provided by an embodiment of the present application. This method may be performed by scalar processing unit 110.
  • the scalar processing unit 110 can connect to a resource access requester external to the scalar processing unit 110. As shown in FIG. 2, the method includes the following steps S210 to S240.
  • Step S210: Receive the resource access instruction sent by the resource access requester.
  • the resource access requester refers to the sender of the resource access instruction.
  • the resource access requester may be the instruction scheduling unit 120, the scalar access client 130, or both. Therefore, the resource access instruction may be a scalar instruction sent by the instruction scheduling unit 120, or it may be an access request sent by the scalar access client 130, or it may include both a scalar instruction and an access request.
  • the pre-control unit 111 of the scalar processing unit 110 may receive the resource access instruction sent by the resource access requester.
  • In one embodiment, the resource access requester includes both the instruction dispatch unit 120 and the scalar access client 130, so the resource access instruction may include a scalar instruction and an access request.
  • In that case, step S210 specifically includes: receiving the scalar instruction sent by the instruction scheduling unit 120 and receiving the access request sent by the scalar access client 130.
  • Step S220: Determine whether there is a resource access conflict in the resource access instruction according to the access object of the resource access instruction.
  • the access object refers to the scalar register specifically accessed by the resource access instruction.
  • A resource access conflict means that the access object of the resource access instruction has unfinished instructions (including instructions to be executed and instructions that are being executed), so the resource access instruction cannot be executed immediately. For example, if the access requests of the scalar access clients 130 occupy all of the access bandwidth of the scalar register group 112, the scalar instructions of the instruction scheduling unit 120 will be blocked. At this time, the scalar instructions may be considered to have resource access conflicts.
  • After receiving the resource access instruction (which may be a scalar instruction or an access request), the front-end control unit 111 of the scalar processing unit 110 decodes it, parses out the access object, and then performs a resource check to determine whether there is a resource access conflict.
  • the process of resource checking is as follows: according to the access object of the resource access instruction, it can be judged whether the access object of the resource access instruction has unfinished instructions; if the access object has unfinished instructions, it is determined that the resource access instruction has a resource access conflict.
  • the unfinished instructions may include instructions being executed and instructions to be executed. Instructions to be executed can be temporarily stored in the buffer queue. Unfinished instructions may include access requests or scalar instructions. If the access object has instructions being executed or instructions to be executed in the buffer queue, the resource access instruction is considered to have a resource access conflict. Otherwise, it is considered to have no resource access conflict and can be executed immediately.
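  • The resource check described above amounts to asking whether the targeted scalar register still has unfinished work. A minimal sketch follows; the `RegisterState` class and its counters are hypothetical bookkeeping introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RegisterState:
    """Unfinished work tracked for one scalar register (the access object)."""
    executing: int = 0  # instructions currently being executed
    pending: int = 0    # instructions waiting in the buffer queue


def has_conflict(state: RegisterState) -> bool:
    # A conflict exists if the access object has any unfinished instruction,
    # that is, one being executed or one still waiting to be executed.
    return state.executing > 0 or state.pending > 0


idle = RegisterState()
busy = RegisterState(executing=1)
queued = RegisterState(pending=2)
```

  • An instruction targeting `idle` could be executed immediately, while one targeting `busy` or `queued` would proceed to the priority decision of step S230.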
  • Step S230: If there is a resource access conflict in the resource access instruction, determine whether to add the resource access instruction to the buffer queue according to the priority of the resource access instruction.
  • the execution sequence (i.e. priority) of scalar instructions and access requests can also be configured in advance.
  • the priority can be fixed or dynamically configured. That is to say, the priority can be always fixed or can be reconfigured with time or user instructions. Resource access instructions with high priority are executed first, so resource access instructions with low priority can be temporarily added to the buffer queue and wait for execution.
  • If the priority of the resource access instruction is lower than the priority of the instruction to be executed on the access object, the resource access instruction is added to the buffer queue.
  • the pre-control unit 111 can compare the priority of the resource access instruction with the priorities of the instructions to be executed on the access object, and if the priority of the resource access instruction is not the highest, add the resource access instruction to the buffer queue. Conversely, if the resource access instruction has the highest priority and the access object has no instruction being executed, the resource access instruction can be executed immediately. If the access object has an instruction being executed, the resource access instruction can be temporarily added to the buffer queue.
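  • The decision in step S230 can be sketched as a small function. The name `admit`, the numeric priorities (larger means higher), and the treatment of equal priorities as "not the highest" are assumptions made for this illustration.

```python
from collections import deque


def admit(instr, priority, pending_priorities, object_executing, buffer_queue):
    """Decide what to do with an instruction that has a resource access conflict.

    pending_priorities: priorities of instructions waiting on the same access object.
    object_executing:   True if the access object has an instruction being executed.
    """
    is_highest = all(priority > p for p in pending_priorities)
    if is_highest and not object_executing:
        return "execute"          # highest priority and the register is free
    buffer_queue.append(instr)    # otherwise wait in the buffer queue
    return "buffered"


queue = deque()
first = admit("a", 3, [1, 2], False, queue)   # highest, register free
second = admit("b", 1, [2], False, queue)     # not the highest
third = admit("c", 3, [1], True, queue)       # highest, but register is executing
```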
  • Step S240: If the resource access instruction is added to the buffer queue, when the conflict is resolved, concurrently execute the resource access instructions in the buffer queue that do not have a resource access conflict.
  • the conflict can be considered to be resolved. For example, if the access request of the scalar access client 130 fully occupies the access bandwidth of the scalar register group 112, a resource access conflict occurs. When the scalar access client 130 releases part of the scalar register access bandwidth, resource access instructions without resource access conflicts in the buffer queue can be executed concurrently.
  • Concurrent execution refers to selecting resource access instructions without resource access conflicts from the buffer queue for parallel processing. This compensates for the performance lost to instruction pipeline blocking caused by resource access conflicts, improves scalar register bandwidth utilization, and can also effectively alleviate or eliminate subsequent potential resource access conflicts between the scalar access clients 130 and the scalar instruction pipeline.
  • one or more resource access instructions without resource access conflicts may be selected from the buffer queue according to the scheduling policy and sent to the scalar register group 112.
  • the scheduling policy can be to select one or more resource access instructions without resource access conflicts from the buffer queue in order according to the first-in-first-out principle; it can be to select them randomly; or it can be to select multiple scalar instructions without resource access conflicts from the buffer queue according to the principle of selecting different instruction categories, so that different types of instructions can be processed in parallel.
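  • The first two scheduling policies named above, first-in-first-out and random selection, can be sketched as follows. The function names, the `width` parameter (how many instructions may be issued at once), and the tuple layout are assumptions for illustration; the functions only select entries and leave removal from the queue to the caller.

```python
import random


def select_fifo(queue, has_conflict, width):
    """Pick up to `width` conflict-free entries, oldest first."""
    picked = []
    for instr in queue:
        if len(picked) == width:
            break
        if not has_conflict(instr):
            picked.append(instr)
    return picked


def select_random(queue, has_conflict, width, rng=random):
    """Pick up to `width` conflict-free entries in no particular order."""
    ready = [instr for instr in queue if not has_conflict(instr)]
    return rng.sample(ready, min(width, len(ready)))


# Each entry is (name, target register); register "s1" is still busy here.
queue = [("i0", "s1"), ("i1", "s2"), ("i2", "s3"), ("i3", "s4")]
busy = {"s1"}


def conflict(instr):
    return instr[1] in busy


fifo_pick = select_fifo(queue, conflict, width=2)
random_pick = select_random(queue, conflict, width=2)
```

  • With this queue, the first-in-first-out policy skips `i0` (its register is busy) and picks `i1` and `i2`; the random policy picks any two of the three conflict-free entries.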
  • Buffering resource access instructions that have resource access conflicts and, once the conflict is resolved, concurrently executing the buffered instructions that no longer conflict can quickly relieve the instruction backlog in the buffer queue, compensate for the performance lost to instruction pipeline blocking caused by resource access conflicts, improve scalar register bandwidth utilization, and quickly alleviate subsequent potential resource access conflicts.
  • step S220 specifically includes:
  • Step S221: Based on the first access object of the scalar instruction and the second access object of the access request, determine whether the first access object and the second access object are the same.
  • the first access object refers to the access object of the scalar instruction sent by the instruction dispatch unit 120
  • the second access object refers to the access object of the access request sent by the scalar access client 130.
  • For ease of distinction, they are called the first access object and the second access object respectively.
  • Step S222: If the first access object and the second access object are the same, it is determined that there is a resource access conflict between the scalar instruction and the access request.
  • the priorities between the scalar instruction and the access request can be compared, and the scalar instruction or access request with a lower priority can be added to the buffer queue.
  • If the scalar instruction has the lower priority, the scalar instruction can be added to the buffer queue corresponding to scalar instructions, and when the conflict is resolved, the scalar instructions in that buffer queue that do not have resource access conflicts are executed concurrently. In this case, if the access object has no unfinished instructions, the access request can be executed immediately. If the access object has unfinished instructions, the access request can be temporarily added to the buffer queue corresponding to access requests, and when the conflict is resolved, the access requests in that buffer queue that do not have resource access conflicts are executed concurrently.
  • If the access request has the lower priority, the access request can be added to the buffer queue corresponding to access requests. In this case, if the access object has no unfinished instructions, the scalar instruction can be executed immediately.
  • the relative priorities of scalar instructions and access requests can be fixed or dynamically configured.
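  • The handling of step S222, where the scalar instruction and the access request target the same access object, can be sketched as below. The function name, the action strings, and the refusal to handle equal priorities are assumptions of this sketch; the text does not specify a tie-breaking rule.

```python
def arbitrate_same_object(scalar_priority, request_priority, object_has_unfinished):
    """Return (scalar_action, request_action) when both target the same register.

    Priorities are assumed to be distinct; a larger number means higher priority.
    """
    if scalar_priority == request_priority:
        raise ValueError("tie-breaking is not specified in this sketch")
    # The higher-priority side runs at once only if the register has no
    # unfinished instructions; otherwise it also waits in its own buffer queue.
    winner_action = "buffer" if object_has_unfinished else "execute"
    if scalar_priority < request_priority:
        return ("buffer", winner_action)
    return (winner_action, "buffer")


request_wins = arbitrate_same_object(1, 2, object_has_unfinished=False)
both_wait = arbitrate_same_object(1, 2, object_has_unfinished=True)
scalar_wins = arbitrate_same_object(2, 1, object_has_unfinished=False)
```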
  • Step S222': If the first access object and the second access object are different, determine separately whether the scalar instruction and the access request each have a resource access conflict.
  • Unfinished instructions include instructions being executed and instructions to be executed. Unfinished instructions may include previously received access requests and may also include previously received scalar instructions. If the access object of the scalar instruction (i.e., the first access object) has an instruction being executed or an instruction to be executed, the scalar instruction has a resource access conflict. Conversely, if the scalar instruction has no resource access conflict, the pre-control unit 111 can immediately send the scalar instruction to the scalar register group 112. If the access object of the access request (i.e., the second access object) has an instruction being executed or an instruction to be executed, the access request has a resource access conflict. Conversely, if the access request has no resource access conflict, the pre-control unit 111 can immediately send the access request to the scalar register group 112.
  • To determine whether the scalar instruction and the access request each have a resource access conflict, it can also be determined whether the access bandwidth of the first access object is fully occupied. If the access bandwidth of the first access object is fully occupied, the scalar instruction has a resource access conflict. Conversely, if the access bandwidth of the first access object is not fully occupied, it is considered that there is no conflict. In the same way, it can be determined whether the access bandwidth of the second access object is fully occupied. If the access bandwidth of the second access object is fully occupied, the access request has a resource access conflict. Conversely, if the access bandwidth of the second access object is not fully occupied, it is considered that there is no conflict.
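  • The bandwidth-based variant of the check in step S222' can be sketched as follows. The `AccessObject` class and the idea of counting bandwidth in accesses per cycle are assumptions for illustration; the application does not state the unit in which bandwidth is measured.

```python
from dataclasses import dataclass


@dataclass
class AccessObject:
    bandwidth: int   # accesses the register can serve per cycle (assumed unit)
    in_use: int = 0  # accesses currently occupying that bandwidth


def bandwidth_conflict(obj: AccessObject) -> bool:
    """A conflict exists when the access bandwidth is fully occupied."""
    return obj.in_use >= obj.bandwidth


def check_separately(first: AccessObject, second: AccessObject) -> dict:
    """Independent checks for the scalar instruction and the access request."""
    return {
        "scalar_instruction": bandwidth_conflict(first),
        "access_request": bandwidth_conflict(second),
    }


# The first access object is saturated; the second still has spare bandwidth.
result = check_separately(AccessObject(2, in_use=2), AccessObject(2, in_use=1))
```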
  • The pre-control unit 111 can obtain the data corresponding to the access request and return the data to the scalar access client 130 that sent the access request.
  • The arithmetic unit 114 inside the scalar processing unit 110 needs to operate on the source operands corresponding to the scalar instruction, and the result can be written back to the scalar register.
  • The scalar processing unit 110 includes a pre-control unit 111, a scalar register group 112, a post-control unit 113 and an arithmetic unit 114 connected in sequence.
  • The resource access instruction includes a scalar instruction.
  • In step S240 above, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict specifically includes the following steps:
  • Step S241: the pre-control unit 111 obtains, from the scalar register group 112, the source operands corresponding to each scalar instruction in the buffer queue that has no resource access conflict.
  • The pre-control unit 111 may send the scalar instructions in the buffer queue that have no resource access conflict to the scalar register group 112 in parallel. One or more clock cycles after each scalar instruction is issued, all of its source operands are obtained.
  • A source operand is the data at the specified address obtained according to the scalar instruction.
  • The pre-control unit can select multiple scalar instructions without resource access conflicts from the buffer queue according to the principle of selecting different instruction categories, and send the selected scalar instructions in parallel to the scalar register corresponding to each instruction category in the scalar register group, so as to obtain the source operands corresponding to each scalar instruction from the scalar register group.
  • Scalar instructions can be classified by function (such as logical operation instructions and control instructions). The scalar registers in a scalar register group can also be grouped, with the scalar registers in the same group serving scalar instructions of the same category. To improve access efficiency, scalar instructions of different categories without resource access conflicts can be selected and sent to different groups of scalar registers in parallel. In other embodiments, scalar instructions without resource access conflicts may be selected sequentially according to the first-in-first-out principle, or selected randomly.
  • Step S242: the post-control unit 113 temporarily stores the source operands corresponding to each scalar instruction in an internal buffer and, according to the type and operating status of the arithmetic unit 114, selects a scalar instruction of the corresponding type and sends it to the arithmetic unit 114.
  • The post-control unit 113 has a built-in shared buffer.
  • The buffer is shared by all types of scalar instructions.
  • The source operands obtained by all issued instructions are temporarily stored in the buffer. Once a scalar instruction has obtained all of its source operands and they are stored in the buffer, the scalar instruction enters the pending-execution state.
  • The post-control unit 113 can control the concurrent execution of different types of scalar instructions in the pending-execution state.
  • A pending scalar instruction of the corresponding type may be selected according to the type and operating status of the arithmetic unit 114 and sent to the arithmetic unit 114.
  • The types of arithmetic unit 114 include a logical operation type, a multiplication operation type, a branch control type, and a memory access type.
  • The operating status includes a busy state and an idle state. Therefore, a pending scalar instruction of the logical operation type can be sent to an idle arithmetic unit 114 of the logical operation type, a pending scalar instruction of the multiplication operation type can be sent to an idle arithmetic unit 114 of the multiplication operation type, and so on.
  • Step S243: the arithmetic unit 114 operates on the source operands corresponding to the scalar instruction according to the received scalar instruction.
  • Arithmetic units 114 of different types can operate on the source operands corresponding to the scalar instructions and then return the results to the scalar register.
  • Different types of scalar instructions can be executed concurrently, alleviating the instruction backlog and improving processing performance.
  • The technical solution provided by the above embodiments of the present application shares the decoding and resource checking functions of the pre-control unit 111 and the buffer of the post-control unit 113, thereby saving hardware cost and, at low cost, improving the concurrency of instruction execution of the scalar processing unit 110. This compensates for the performance loss caused by instruction pipeline blocking due to resource access conflicts, while effectively mitigating subsequent potential resource access conflicts.
  • Blocked scalar instructions can enter the buffer queue of the pre-control unit 111.
  • The scalar processing unit 110 executes, with the highest degree of concurrency, multiple instructions that have no scalar register resource access conflict. This maximizes the use of scalar register access bandwidth, increases the number of instructions executed in a single cycle, compensates for the performance loss from pipeline blocking caused by scalar register access conflicts, quickly alleviates the instruction backlog in the buffer queue, and mitigates subsequent potential resource access conflicts.
  • Each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • The functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • Each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
  • The functional modules in the embodiments of the present application can be integrated together to form an independent part, each module can exist alone, or two or more modules can be integrated to form an independent part.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present application.
  • The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • the present application provides an access control method for a scalar processing unit and a scalar processing unit.
  • the scalar processing unit is connected to a resource access requester external to the scalar processing unit.
  • The method includes: receiving a resource access instruction sent by the resource access requester; determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict; if it does, determining, according to the priority of the resource access instruction, whether to add the resource access instruction to a buffer queue; and if the resource access instruction is added to the buffer queue, when the conflict is resolved, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict.
  • This solution can quickly alleviate the instruction backlog in the buffer queue, compensate for performance losses caused by resource access conflicts, improve scalar register bandwidth utilization, and alleviate subsequent potential resource access conflicts.
  • the access control method of the scalar processing unit and the scalar processing unit of the present application are reproducible and can be used in various industrial applications.
  • the access control method and scalar processing unit of the present application can be used in a processor that needs to use a scalar processing unit to perform image operations.

Abstract

An access control method for a scalar processing unit, and a scalar processing unit. The scalar processing unit is connected to a resource access requester outside the scalar processing unit. The method comprises: receiving a resource access instruction sent by a resource access requester (S210); determining, according to an access object of the resource access instruction, whether the resource access instruction has a resource access conflict (S220); if the resource access instruction has a resource access conflict, determining, according to the priority of the resource access instruction, whether to add the resource access instruction to a buffer queue (S230); and if the resource access instruction is added to the buffer queue, when the conflict is resolved, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict (S240). This solution can quickly alleviate the instruction backlog of the buffer queue and compensate for the performance loss caused by resource access conflicts, thereby increasing the bandwidth utilization of the scalar registers and alleviating subsequent potential resource access conflicts.
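Steps S210 to S240 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described here: the class name `AccessController`, the integer priorities (a larger number means higher priority), the `busy` map, and the `"stalled"` outcome for a conflicting instruction that is not of lower priority are all hypothetical.

```python
from collections import deque

class AccessController:
    """Toy model of steps S210-S240; every name here is illustrative."""

    def __init__(self):
        self.busy = {}          # access object -> priority of the instruction holding it
        self.queue = deque()    # buffer queue of blocked (target, priority) instructions

    def receive(self, target, priority):
        # S210/S220: an instruction conflicts if its access object is still in use.
        if target not in self.busy:
            self.busy[target] = priority
            return "issued"
        # S230: a lower-priority instruction waits in the buffer queue.
        if priority < self.busy[target]:
            self.queue.append((target, priority))
            return "queued"
        return "stalled"        # assumption: handling of this case is outside the sketch

    def release(self, target):
        # S240: once the conflict is resolved, issue every queued instruction
        # whose access object is now free, all in the same step.
        self.busy.pop(target, None)
        issued, waiting = [], deque()
        for tgt, prio in self.queue:
            if tgt not in self.busy:
                self.busy[tgt] = prio
                issued.append(tgt)
            else:
                waiting.append((tgt, prio))
        self.queue = waiting
        return issued
```

Calling `release` models the moment a conflict is resolved: every queued instruction whose access object has become free leaves the queue together, which is the concurrent-execution idea of S240.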

Description

Access control method for scalar processing unit, and scalar processing unit
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211028891.2, titled "Access Control Method for Scalar Processing Unit and Scalar Processing Unit", filed with the China National Intellectual Property Administration on August 26, 2022, the entire content of which is incorporated herein by reference.
Technical field
The present application relates to the field of processor technology, and in particular to an access control method for a scalar processing unit, a scalar processing unit, and a resource access system.
Background
A graphics processing unit (GPU), also known as a display core, visual processor, or display chip, is a microprocessor dedicated to image operations on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).
At present, GPU computing cores are based on a SIMD (Single Instruction Multiple Data) hardware architecture to maximize the parallelism of data operations. However, some data types are global: such data does not belong to any one thread, so for a thread group an operation on such global data needs to be executed only once, rather than once by every thread in the group. Such operations are collectively called scalar instruction operations and are sent to the scalar processing unit for execution, and their results are stored in scalar registers in scalar form. Because the capacity and bandwidth of the scalar registers are limited, accesses to scalar register resources compete with one another. This competition blocks the instruction pipeline of the scalar processing unit and thereby reduces its performance.
Summary of the invention
Embodiments of the present application provide an access control method for a scalar processing unit, to solve the problem that competition for access to scalar register resources blocks the instruction pipeline of the scalar processing unit and reduces performance.
An embodiment of the present application provides an access control method for a scalar processing unit, where the scalar processing unit is connected to a resource access requester external to the scalar processing unit. The method includes:
receiving a resource access instruction sent by the resource access requester;
determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict;
if the resource access instruction has a resource access conflict, determining, according to the priority of the resource access instruction, whether to add the resource access instruction to a buffer queue; and
if the resource access instruction is added to the buffer queue, when the conflict is resolved, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict.
In an embodiment, the priority of the resource access instruction may be related to the priority of the resource access requester.
In an embodiment, the priority may be fixed or dynamically configured.
In an embodiment, determining, according to the priority of the resource access instruction, whether to add the resource access instruction to the buffer queue if the resource access instruction has a resource access conflict may include:
if the resource access instruction has a resource access conflict and its priority is lower than the priority of the instruction waiting to be executed on the access object, adding the resource access instruction to the buffer queue.
In an embodiment, the resource access instruction includes a scalar instruction and an access request, and adding the resource access instruction to the buffer queue if it has a resource access conflict and its priority is lower than the priority of the instruction waiting to be executed on the access object may include:
if there is a resource access conflict between the scalar instruction and the access request, comparing the priority of the scalar instruction with that of the access request;
if the priority of the scalar instruction is lower than that of the access request, adding the scalar instruction to the buffer queue;
if the priority of the access request is lower than that of the scalar instruction, adding the access request to the buffer queue.
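The pairwise comparison above can be sketched as follows. The function name `arbitrate`, the `(name, priority)` tuples, and the convention that a larger number means higher priority are assumptions made for illustration. Tie-breaking for equal priorities is left out of the sketch.

```python
from collections import deque

def arbitrate(scalar_instr, access_request, buffer_queue):
    """Issue the higher-priority contender and queue the other.

    Each contender is a (name, priority) tuple; larger priority wins (an assumption).
    """
    (_, p_scalar), (_, p_request) = scalar_instr, access_request
    if p_scalar < p_request:
        buffer_queue.append(scalar_instr)      # the scalar instruction waits
        return access_request
    if p_request < p_scalar:
        buffer_queue.append(access_request)    # the access request waits
        return scalar_instr
    raise ValueError("equal priorities: tie-breaking is outside this sketch")
```

For example, `arbitrate(("add", 1), ("read", 2), deque())` returns the access request and leaves the scalar instruction in the queue.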
In an embodiment, the scalar processing unit includes a pre-control unit, a scalar register group, a post-control unit and an arithmetic unit connected in sequence, and the resource access instruction includes a scalar instruction. Concurrently executing the resource access instructions in the buffer queue that have no resource access conflict may include:
the pre-control unit obtaining, from the scalar register group, the source operands corresponding to each scalar instruction in the buffer queue that has no resource access conflict;
the post-control unit temporarily storing the source operands corresponding to each scalar instruction in an internal buffer and, according to the type and operating status of the arithmetic unit, selecting a scalar instruction of the corresponding type and sending it to the arithmetic unit; and
the arithmetic unit operating on the source operands corresponding to the scalar instruction according to the received scalar instruction.
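The three stages above can be modeled in software as a small pipeline. This is a sequential sketch of hardware that would run concurrently. The register names, the instruction dictionaries, and the two unit types in `UNITS` are hypothetical and chosen only for illustration.

```python
import operator

# Hypothetical arithmetic units, one per instruction type.
UNITS = {"mul": operator.mul, "and": operator.and_}

def fetch_operands(instructions, regs):
    """Pre-control stage: read all source operands from the register group."""
    return [(ins, [regs[name] for name in ins["src"]]) for ins in instructions]

def route(buffered):
    """Post-control stage: hold operands and group instructions by unit type."""
    by_type = {}
    for ins, operands in buffered:
        by_type.setdefault(ins["type"], []).append((ins, operands))
    return by_type

def execute(by_type, regs):
    """Arithmetic stage: each unit computes and writes its result back."""
    for unit_type, work in by_type.items():
        for ins, operands in work:
            regs[ins["dst"]] = UNITS[unit_type](*operands)

regs = {"s0": 6, "s1": 7, "s2": 0b1100, "s3": 0b1010}
program = [
    {"type": "mul", "src": ["s0", "s1"], "dst": "s4"},
    {"type": "and", "src": ["s2", "s3"], "dst": "s5"},
]
execute(route(fetch_operands(program, regs)), regs)
```

After the last line, `regs["s4"]` holds the product of `s0` and `s1`, and `regs["s5"]` holds the bitwise AND of `s2` and `s3`.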
In an embodiment, the pre-control unit obtaining, from the scalar register group, the source operands corresponding to each scalar instruction in the buffer queue that has no resource access conflict may include:
the pre-control unit selecting multiple scalar instructions without resource access conflicts from the buffer queue according to the principle of selecting different instruction categories; and
sending the selected scalar instructions in parallel to the scalar register corresponding to each instruction category in the scalar register group, so as to obtain the source operands corresponding to each scalar instruction from the scalar register group.
In an embodiment, the pre-control unit obtaining, from the scalar register group, the source operands corresponding to each scalar instruction in the buffer queue that has no resource access conflict may include:
the pre-control unit selecting, in order and according to the first-in-first-out principle, scalar instructions without resource access conflicts from the buffer queue; and
sending the selected scalar instructions to the scalar register group in parallel, so as to obtain the source operands corresponding to each scalar instruction from the scalar register group.
In an embodiment, the pre-control unit obtaining, from the scalar register group, the source operands corresponding to each scalar instruction in the buffer queue that has no resource access conflict may include:
the pre-control unit randomly selecting scalar instructions without resource access conflicts from the buffer queue; and
sending the selected scalar instructions to the scalar register group in parallel, so as to obtain the source operands corresponding to each scalar instruction from the scalar register group.
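The three selection policies in the embodiments above (by distinct category, first-in-first-out, and random) can be compared side by side. The function names, the `conflict_free` predicate, and the `width` parameter (how many instructions may be issued at once) are assumptions introduced for this sketch.

```python
import random

def select_by_category(queue, conflict_free):
    """Pick the oldest conflict-free instruction of each category."""
    picked = {}
    for ins in queue:
        if conflict_free(ins) and ins["category"] not in picked:
            picked[ins["category"]] = ins
    return list(picked.values())

def select_fifo(queue, conflict_free, width):
    """Pick up to `width` conflict-free instructions in arrival order."""
    return [ins for ins in queue if conflict_free(ins)][:width]

def select_random(queue, conflict_free, width, rng=random):
    """Pick up to `width` conflict-free instructions at random."""
    ready = [ins for ins in queue if conflict_free(ins)]
    return rng.sample(ready, min(width, len(ready)))
```

The category policy suits a register group that is split by instruction category, since each selected instruction then goes to a different register subgroup.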
In an embodiment, there are multiple arithmetic units, and selecting a scalar instruction of the corresponding type and sending it to the arithmetic unit according to the type and operating status of the arithmetic unit may include:
selecting, according to the type and operating status of each arithmetic unit, a scalar instruction of the corresponding type and sending it to an arithmetic unit in the idle state.
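Matching pending instructions to idle units of the same type can be sketched as below. The function name `dispatch_to_idle` and the dictionary layout of each unit are hypothetical. The sketch assumes each unit accepts one instruction and is busy afterwards.

```python
def dispatch_to_idle(pending, units):
    """Pair pending instructions with idle units of the matching type.

    pending: list of instruction types, oldest first.
    units:   list of dicts such as {"type": "mul", "busy": False}.
    Returns a list of (pending_index, unit_index) pairs.
    """
    pairs = []
    for p_idx, ins_type in enumerate(pending):
        for u_idx, unit in enumerate(units):
            if unit["type"] == ins_type and not unit["busy"]:
                unit["busy"] = True      # the unit is occupied from now on
                pairs.append((p_idx, u_idx))
                break
    return pairs
```

An instruction whose unit type has no idle unit simply stays pending and is reconsidered on a later call.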
In an embodiment, the resource access requester includes an instruction scheduling unit or a scalar access client, and the resource access instruction includes a scalar instruction sent by the instruction scheduling unit or an access request sent by the scalar access client. Determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict may include:
determining whether the access object of the resource access instruction has uncompleted instructions;
if the access object has uncompleted instructions, determining that the resource access instruction has a resource access conflict.
In an embodiment, the resource access requester includes an instruction scheduling unit and a scalar access client, and the resource access instruction includes a scalar instruction sent by the instruction scheduling unit and an access request sent by the scalar access client. Determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict may include:
determining, according to the first access object of the scalar instruction and the second access object of the access request, whether the first access object and the second access object are the same;
if the first access object and the second access object are the same, determining that there is a resource access conflict between the scalar instruction and the access request;
if the first access object and the second access object are different, determining separately whether the scalar instruction and the access request each have a resource access conflict.
In an embodiment, determining separately whether the scalar instruction and the access request each have a resource access conflict may include:
determining separately whether the first access object and the second access object have uncompleted instructions;
if the first access object has uncompleted instructions, determining that the scalar instruction has a resource access conflict;
if the second access object has uncompleted instructions, determining that the access request has a resource access conflict.
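The two-level check (same object first, then uncompleted instructions per object) can be written compactly. The function name `check_conflicts`, the string labels it returns, and the `pending` set of access objects that still have instructions executing or waiting are assumptions for illustration.

```python
def check_conflicts(first_obj, second_obj, pending):
    """Classify conflicts for a scalar instruction and an access request.

    first_obj:  access object of the scalar instruction.
    second_obj: access object of the access request.
    pending:    set of access objects with uncompleted instructions.
    Returns a set of labels: {"mutual"}, or any subset of {"scalar", "request"}.
    """
    if first_obj == second_obj:
        # Same object: the two instructions conflict with each other.
        return {"mutual"}
    conflicts = set()
    if first_obj in pending:
        conflicts.add("scalar")     # the scalar instruction must wait
    if second_obj in pending:
        conflicts.add("request")    # the access request must wait
    return conflicts
```

An empty result means both instructions can be sent to the scalar register group immediately.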
In an embodiment, determining separately whether the scalar instruction and the access request each have a resource access conflict may include:
determining separately whether the access bandwidth of the first access object and the access bandwidth of the second access object are fully occupied;
if the access bandwidth of the first access object is fully occupied, determining that the scalar instruction has a resource access conflict;
if the access bandwidth of the second access object is fully occupied, determining that the access request has a resource access conflict.
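The bandwidth-based variant can be modeled with a per-object counter. This is a sketch under assumptions: bandwidth is modeled as a fixed number of accesses per cycle, and the class name `BandwidthTracker`, the object names, and the per-cycle reset are all hypothetical.

```python
class BandwidthTracker:
    """Counts accesses per object within one cycle (an illustrative model)."""

    def __init__(self, ports_per_object):
        self.ports = dict(ports_per_object)           # object -> accesses allowed per cycle
        self.in_use = {obj: 0 for obj in self.ports}  # object -> accesses already granted

    def has_conflict(self, obj):
        # Conflict only when every unit of bandwidth is already taken.
        return self.in_use[obj] >= self.ports[obj]

    def try_issue(self, obj):
        if self.has_conflict(obj):
            return False
        self.in_use[obj] += 1
        return True

    def next_cycle(self):
        # Assumption: all bandwidth becomes available again each cycle.
        for obj in self.in_use:
            self.in_use[obj] = 0
```

Unlike the uncompleted-instruction check, this variant lets several instructions target the same object as long as spare bandwidth remains.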
In an embodiment, the uncompleted instructions include instructions being executed and instructions waiting to be executed.
An embodiment of the present application further provides a scalar processing unit, which may include:
a pre-control unit, configured to connect to a resource access requester external to the scalar processing unit; receive a resource access instruction sent by the resource access requester; determine, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict; if the resource access instruction has a resource access conflict, determine, according to the priority of the resource access instruction, whether to add the resource access instruction to a buffer queue; and if the resource access instruction is added to the buffer queue, when the conflict is resolved, simultaneously send the resource access instructions in the buffer queue that have no resource access conflict;
a scalar register group, connected to the pre-control unit and configured to receive the resource access instruction and obtain the internally stored source operands corresponding to the resource access instruction;
a post-control unit, connected to the scalar register and configured to, when the resource access instruction includes scalar instructions, temporarily store the source operands corresponding to each scalar instruction in an internal buffer and, according to the type and operating status of the arithmetic unit, select a scalar instruction of the corresponding type and send it to the arithmetic unit; and
the arithmetic unit, connected to the post-control unit and configured to operate on the source operands corresponding to the scalar instruction according to the received scalar instruction.
An embodiment of the present application further provides a resource access system, which may include:
the scalar processing unit described in the above embodiment, where the resource access instruction includes a scalar instruction and an access request;
an instruction scheduling unit, connected to the scalar processing unit and configured to send scalar instructions to the scalar processing unit; and
a scalar access client, connected to the scalar processing unit and configured to send access requests to the scalar processing unit.
In the technical solution provided by the above embodiments of the present application, when a resource access instruction has a resource access conflict, whether to add the resource access instruction to the buffer queue is determined according to its priority, and when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict are executed concurrently. This quickly alleviates the instruction backlog in the buffer queue, compensates for the performance loss from instruction pipeline blocking caused by resource access conflicts, improves scalar register bandwidth utilization, and quickly mitigates subsequent potential resource access conflicts.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below.
Figure 1 is a schematic architectural diagram of a resource access system provided by an embodiment of the present application;
Figure 2 is a schematic flowchart of an access control method for a scalar processing unit provided by an embodiment of the present application;
Figure 3 is a detailed flowchart of step S220 in the embodiment corresponding to Figure 2;
Figure 4 is a detailed flowchart of step S240 in the embodiment corresponding to Figure 2.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
Similar reference numerals and letters denote similar items in the following figures, so once an item is defined in one figure, it needs no further definition or explanation in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish between descriptions and should not be understood as indicating or implying relative importance.
Figure 1 is a schematic architectural diagram of a resource access system provided by an embodiment of the present application. The resource access system may include a scalar processing unit 110, an instruction scheduling unit 120, and a scalar access client 130. The scalar processing unit 110 may be connected to the instruction scheduling unit 120 and the scalar access client 130. The resource access system is located inside the GPU.
There may be one or more scalar access clients 130. A scalar access client 130 is a module inside the GPU that needs to access the scalar register group 112. The scalar access client 130 can send an access request to the scalar processing unit 110. The scalar processing unit 110 may process the access requests sent by the scalar access clients 130 in order, according to the priority of each scalar access client 130. The priority of a scalar access client 130 can be fixed or dynamically configured. After obtaining the data corresponding to an access request, the scalar processing unit 110 may return the data to the scalar access client 130.
The instruction scheduling unit 120 may receive instructions sent from outside the resource access system and determine the instruction type from the decoding information of the instruction. The instruction may be a scalar instruction or a vector instruction. The basic operand of a vector instruction is a vector, that is, an ordered group of numbers, whereas the operand of a scalar instruction is a single number. The instruction scheduling unit 120 may send scalar instructions to the scalar processing unit 110. When a scalar instruction has no resource access conflict, the scalar processing unit 110 can obtain the source operands corresponding to the scalar instruction from the internal scalar registers and buffer them, and the internal arithmetic unit corresponding to the scalar instruction then operates on the source operands.
如图1所示,标量处理单元110包括前置控制单元111、标量寄存器组112、后置控制单元113以及运算单元114。As shown in FIG. 1 , the scalar processing unit 110 includes a pre-control unit 111 , a scalar register group 112 , a post-control unit 113 and an arithmetic unit 114 .
前置控制单元111可以用于连接标量处理单元110外部的资源访问请求方。资源访问请求方可以包括指令调度单元120和/或标量访问客户端130。前置控制单元111可以用于接收资源访问请求方发送的资源访问指令;资源访问指令可以包括指令调度单元120发送的标量指令和/或标量访问客户端130发送的访问请求。The front-end control unit 111 may be used to connect a resource access requester external to the scalar processing unit 110 . Resource access requestors may include instruction dispatch unit 120 and/or scalar access client 130. The front-end control unit 111 may be configured to receive resource access instructions sent by the resource access requester; the resource access instructions may include scalar instructions sent by the instruction scheduling unit 120 and/or access requests sent by the scalar access client 130 .
According to the access object of a resource access instruction, the pre-control unit 111 may determine whether the resource access instruction has a resource access conflict. If it does, the pre-control unit 111 determines, according to the priority of the resource access instruction, whether to add it to a buffer queue, where the priority of a resource access instruction may be related to the priority of its resource access requester. For example, if the priority of the scalar access client 130 is higher than that of the instruction scheduling unit 120, then access requests have a higher priority than scalar instructions. If a certain scalar access client X has a higher priority than the other scalar access clients 130, then the access requests of scalar access client X have a higher priority than those of the other scalar access clients 130. If resource access instructions have been added to the buffer queue, then when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict are sent to the scalar register group 112 simultaneously. The execution process of the pre-control unit 111 is described in detail in the method embodiments below and is not repeated here.
Because the pre-control unit 111 issues multiple non-conflicting instructions simultaneously when the conflict is resolved, it can compensate for the performance loss caused by instruction pipeline stalls due to resource access conflicts, improve scalar register bandwidth utilization, and effectively mitigate or eliminate subsequent potential resource access conflicts between the scalar access client 130 and the instruction scheduling unit 120. Because resource access instructions are decoded before the conflict check, all resource access instructions can share the decoding function of the pre-control unit 111, which saves hardware overhead and reduces power consumption.
The scalar register group 112 may be connected to the pre-control unit 111. The scalar register group 112 may include multiple scalar registers, may receive resource access instructions, and may fetch the internally stored source operands corresponding to a resource access instruction. For example, the source operand stored at an address may be fetched according to the address information contained in the resource access instruction. When the resource access instruction includes an access request sent by a scalar access client 130, the pre-control unit 111 returns the obtained source operand to that scalar access client 130.
The post-control unit 113 may be connected to the scalar register group 112 and the operation unit 114. When the resource access instructions include scalar instructions, it may temporarily store the source operands of each scalar instruction in an internal buffer and, according to the type and running state of the operation units 114, select scalar instructions of the corresponding type and send them to the operation units 114. The types of operation unit 114 may include logical operation, multiplication, branch control, and memory access. The running state may be busy or idle. There may be multiple operation units 114 of each type (only one is drawn in FIG. 1 for illustration). A scalar instruction can therefore be sent to an idle operation unit 114 of the same type as the instruction. Multiple scalar instructions can share the buffer of the post-control unit 113, which saves hardware overhead and reduces power consumption.
The operation unit 114 may be configured to operate on the source operands of a received scalar instruction. This solution does not increase the computing resource overhead of the operation unit 114; instead, by adding a small amount of shared buffer and control logic, it allows multiple scalar instructions to execute concurrently, which effectively relieves the stalling of the scalar instruction pipeline and the backlog of scalar instructions caused by upstream resource access conflicts. Once the pipeline stall is lifted, the buffered scalar instructions can be executed at the highest degree of concurrency, making maximum use of the scalar register bandwidth and draining the buffer queue of backlogged scalar instructions, to compensate for the clock cycles lost to the stall.
FIG. 2 is a schematic flowchart of an access control method for a scalar processing unit provided by an embodiment of the present application. The method may be performed by the scalar processing unit 110. The scalar processing unit 110 may be connected to resource access requesters external to the scalar processing unit 110. As shown in FIG. 2, the method includes the following steps S210 to S240.
Step S210: Receive a resource access instruction sent by a resource access requester.
A resource access requester is the sender of a resource access instruction. The resource access requester may be the instruction scheduling unit 120, the scalar access client 130, or both. A resource access instruction may therefore be a scalar instruction sent by the instruction scheduling unit 120, an access request sent by the scalar access client 130, or both. Specifically, the pre-control unit 111 of the scalar processing unit 110 may receive the resource access instruction sent by the resource access requester.
In an embodiment, the resource access requesters include the instruction scheduling unit 120 and the scalar access client 130, so the resource access instructions may include scalar instructions and access requests. Step S210 then specifically includes: receiving a scalar instruction sent by the instruction scheduling unit 120 and receiving an access request sent by the scalar access client 130.
Step S220: Determine, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict.
An access object is the scalar register that a resource access instruction specifically accesses. A resource access conflict means that the access object of the resource access instruction has unfinished instructions (both pending and currently executing), so the resource access instruction cannot be executed immediately. For example, if the access requests of the group of scalar access clients 130 occupy the entire access bandwidth of the scalar register group 112, the scalar instructions from the instruction scheduling unit 120 are blocked; in this case, the scalar instructions can be considered to have a resource access conflict. As another example, if a scalar instruction and an access request access the same access object at the same time, then from the scalar instruction's point of view the access object has an unfinished access request, and from the access request's point of view the access object has an unfinished scalar instruction, so there is a resource access conflict between the scalar instruction and the access request.
After the pre-control unit 111 of the scalar processing unit 110 receives a resource access instruction (which may be a scalar instruction or an access request), it decodes the instruction, parses out the access object, and then performs a resource check to determine whether there is a resource access conflict. The resource check proceeds as follows: according to the access object of the resource access instruction, determine whether the access object has unfinished instructions; if it does, determine that the resource access instruction has a resource access conflict.
Unfinished instructions may include instructions that are being executed and instructions that are waiting to be executed. Instructions waiting to be executed may be held temporarily in the buffer queue. Unfinished instructions may include access requests as well as scalar instructions. If the access object has an instruction being executed or instructions waiting in the buffer queue, the resource access instruction is considered to have a resource access conflict. Otherwise, there is no resource access conflict and the resource access instruction can be executed immediately.
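The resource check can be illustrated with a small software model. It is a sketch under stated assumptions rather than the circuit of the application: registers are identified by integers, `executing` is a hypothetical set of registers with an instruction in flight, and queued instructions are hypothetical dictionaries with a `"reg"` key. The sketch treats either an in-flight instruction or a queued instruction on the target register as a conflict.

```python
def has_conflict(target_reg, executing, buffer_queue):
    """Return True if the target register has any unfinished instruction.

    executing    -- set of register ids with an instruction currently executing
    buffer_queue -- iterable of pending instructions, each a dict with key "reg"
    """
    in_flight = target_reg in executing
    pending = any(instr["reg"] == target_reg for instr in buffer_queue)
    return in_flight or pending
```

For instance, with register 3 executing and register 5 queued, a new instruction on register 7 passes the check while instructions on registers 3 or 5 do not.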
Step S230: If the resource access instruction has a resource access conflict, determine, according to the priority of the resource access instruction, whether to add the resource access instruction to the buffer queue.
Because the scalar access clients 130 are configured with priorities, the access requests sent by different scalar access clients 130 are answered in a definite order. The execution order (that is, the priority) of scalar instructions relative to access requests can also be configured in advance. In an embodiment, priorities may be fixed or dynamically configured; in other words, a priority may stay constant, or it may be reconfigured over time or by user instruction. Resource access instructions with a high priority are executed first, so resource access instructions with a low priority can be added to the buffer queue temporarily to wait for execution.
In an embodiment, if the resource access instruction has a resource access conflict and its priority is lower than that of the instructions waiting to be executed on the access object, the resource access instruction is added to the buffer queue.
The pre-control unit 111 may compare the priority of the resource access instruction with the priorities of the instructions waiting to be executed on the access object. If the resource access instruction does not have the highest priority, it is added to the buffer queue. Conversely, if the resource access instruction has the highest priority and the access object has no instruction being executed, the resource access instruction can be executed immediately. If the access object does have an instruction being executed, the resource access instruction can be added to the buffer queue temporarily.
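The decision in step S230 can be summarized as a small function. This is a hedged sketch: the function name `decide`, the string results, and the convention that a larger number means a higher priority are assumptions for illustration. The text does not say how ties are handled, so the sketch treats an instruction that merely equals the highest pending priority as not the highest.

```python
def decide(new_priority, pending_priorities, target_busy):
    """Return "execute" or "enqueue" for a newly arrived resource access instruction.

    new_priority       -- priority of the new instruction (assumed: larger = higher)
    pending_priorities -- priorities of instructions already waiting on the same register
    target_busy        -- True if the register has an instruction currently executing
    """
    is_highest = all(new_priority > p for p in pending_priorities)
    if is_highest and not target_busy:
        return "execute"      # highest priority and the register is free
    return "enqueue"          # outranked, or the register is still in use
```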
Step S240: If the resource access instruction has been added to the buffer queue, then when the conflict is resolved, concurrently execute the resource access instructions in the buffer queue that have no resource access conflict.
The conflict can be considered resolved when the access object is idle. For example, suppose a resource access conflict arose because the access requests of the scalar access clients 130 occupied the entire access bandwidth of the scalar register group 112. When the scalar access clients 130 release part of the scalar register access bandwidth, the resource access instructions in the buffer queue that have no resource access conflict can be executed concurrently.
Concurrent execution means selecting resource access instructions that have no resource access conflict from the buffer queue and processing them in parallel. This compensates for the performance loss caused by instruction pipeline stalls due to resource access conflicts, improves scalar register bandwidth utilization, and can effectively mitigate or eliminate subsequent potential resource access conflicts between the group of scalar access clients 130 and the scalar instruction pipeline.
In an embodiment, one or more resource access instructions that have no resource access conflict may be selected from the buffer queue according to a scheduling policy and sent to the scalar register group 112. The scheduling policy may select one or more conflict-free resource access instructions from the buffer queue in order, on a first-in-first-out basis; it may select them at random; or it may select multiple conflict-free scalar instructions of different instruction categories from the buffer queue, so that instructions of different types can be processed in parallel.
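Two of the scheduling policies named above, first-in-first-out and random selection, can be sketched as follows. The function names, the `max_issue` limit, and the `is_conflict_free` predicate are hypothetical; the application does not specify an issue width or how the conflict check is exposed.

```python
import random


def pick_fifo(queue, is_conflict_free, max_issue):
    """Oldest-first selection of up to max_issue conflict-free entries."""
    picked = []
    for instr in queue:                      # queue is ordered oldest first
        if len(picked) == max_issue:
            break
        if is_conflict_free(instr):
            picked.append(instr)
    return picked


def pick_random(queue, is_conflict_free, max_issue, rng=random):
    """Random selection of up to max_issue conflict-free entries."""
    candidates = [instr for instr in queue if is_conflict_free(instr)]
    return rng.sample(candidates, min(max_issue, len(candidates)))
```

Note that FIFO selection here skips over a blocked entry rather than stopping at it, which is one possible reading of selecting conflict-free instructions "in order".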
In the technical solution provided by the above embodiments of the present application, when a resource access instruction has a resource access conflict, whether to add it to the buffer queue is determined according to its priority, and when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict are executed concurrently. This quickly relieves the instruction backlog in the buffer queue, compensates for the performance loss caused by instruction pipeline stalls due to resource access conflicts, improves scalar register bandwidth utilization, and quickly mitigates subsequent potential resource access conflicts.
In an embodiment, when the resource access instructions include a scalar instruction and an access request, as shown in FIG. 3, step S220 specifically includes:
Step S221: According to the first access object of the scalar instruction and the second access object of the access request, determine whether the first access object and the second access object are the same.
The first access object is the access object of the scalar instruction sent by the instruction scheduling unit 120, and the second access object is the access object of the access request sent by the scalar access client 130. They are called the first access object and the second access object to distinguish them.
Step S222: If the first access object and the second access object are the same, determine that there is a resource access conflict between the scalar instruction and the access request.
In an embodiment, if there is a resource access conflict between the scalar instruction and the access request, their priorities may be compared, and whichever of the two has the lower priority is added to the buffer queue.
If the scalar instruction has the lower priority, the scalar instruction may be added to the buffer queue corresponding to scalar instructions, and when the conflict is resolved, the scalar instructions in that buffer queue that have no resource access conflict are executed concurrently. In that case, if the access object has no unfinished instructions, the access request can be executed immediately. If the access object has unfinished instructions, the access request may be added temporarily to the buffer queue corresponding to access requests, and when the conflict is resolved, the access requests in that buffer queue that have no resource access conflict are executed concurrently.
Similarly, if the access request has the lower priority, the access request may be added to the buffer queue corresponding to access requests. If the access object has no unfinished instructions, the scalar instruction can be executed immediately. The relative priority of scalar instructions and access requests may be fixed or dynamically configured.
Step S222': If the first access object and the second access object are different, determine separately whether the scalar instruction and the access request each have a resource access conflict.
In an embodiment, it may be determined separately whether the first access object and the second access object have unfinished instructions. If the first access object has unfinished instructions, the scalar instruction is determined to have a resource access conflict; if the second access object has unfinished instructions, the access request is determined to have a resource access conflict.
Unfinished instructions include instructions being executed and instructions waiting to be executed. They may include previously received access requests as well as previously received scalar instructions. If the access object of the scalar instruction (the first access object) has an instruction being executed or waiting to be executed, the scalar instruction has a resource access conflict. Otherwise, the scalar instruction has no resource access conflict, and the pre-control unit 111 can send it to the scalar register group 112 immediately. If the access object of the access request (the second access object) has an instruction being executed or waiting to be executed, the access request has a resource access conflict. Otherwise, the access request has no resource access conflict, and the pre-control unit 111 can send it to the scalar register group 112 immediately.
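Steps S221, S222 and S222' can be condensed into one function for illustration. The sketch assumes registers are identified by integers and that `unfinished_regs` is a hypothetical set of registers that already have an executing or pending instruction; neither name comes from the application.

```python
def classify(scalar_reg, request_reg, unfinished_regs):
    """Return (scalar_conflict, request_conflict) for a simultaneously arriving pair.

    scalar_reg      -- first access object (register targeted by the scalar instruction)
    request_reg     -- second access object (register targeted by the access request)
    unfinished_regs -- set of registers with an executing or pending instruction
    """
    if scalar_reg == request_reg:
        # S222: same register, so the two conflict with each other.
        return True, True
    # S222': different registers, so each is checked on its own.
    return scalar_reg in unfinished_regs, request_reg in unfinished_regs
```

When both results are true because the registers match, the priority comparison described earlier would decide which of the two is buffered.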
In other embodiments, whether the scalar instruction and the access request each have a resource access conflict may instead be determined from access bandwidth. If the access bandwidth of the first access object is fully occupied, the scalar instruction has a resource access conflict; if it is not fully occupied, there is considered to be no conflict. Likewise, if the access bandwidth of the second access object is fully occupied, the access request has a resource access conflict; if it is not fully occupied, there is considered to be no conflict.
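The bandwidth-based check can be modeled by counting how much of a register's access capacity is in use. The application speaks only of "access bandwidth"; modeling it as a fixed number of ports, and the `RegisterBank` class itself, are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RegisterBank:
    # Hypothetical model: bandwidth is a fixed number of access ports.
    total_ports: int
    ports_in_use: int = 0

    def is_saturated(self):
        """True when all bandwidth is occupied, i.e. a new access would conflict."""
        return self.ports_in_use >= self.total_ports

    def acquire(self):
        """Try to start an access; return False if it would conflict."""
        if self.is_saturated():
            return False
        self.ports_in_use += 1
        return True

    def release(self):
        """Finish an access and free its share of the bandwidth."""
        if self.ports_in_use > 0:
            self.ports_in_use -= 1
```

In this model, a `release` call corresponds to the moment the conflict is resolved and buffered instructions may be issued again.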
It should be noted that, for an access request, the pre-control unit 111 can obtain the data corresponding to the access request and return it to the scalar access client 130 that sent the request. For a scalar instruction, the operation unit 114 inside the scalar processing unit 110 must operate on the source operands of the scalar instruction, and the result can be stored back into a scalar register.
In an embodiment, referring to the embodiment corresponding to FIG. 1, the scalar processing unit 110 includes a pre-control unit 111, a scalar register group 112, a post-control unit 113, and an operation unit 114 connected in sequence. When the resource access instructions include scalar instructions, as shown in FIG. 4, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict in step S240 specifically includes the following steps:
Step S241: According to the scalar instructions in the buffer queue that have no resource access conflict, the pre-control unit 111 obtains the source operands of each scalar instruction from the scalar register group 112.
Specifically, the pre-control unit 111 may send the scalar instructions in the buffer queue that have no resource access conflict to the scalar register group 112 in parallel. After one or more clock cycles, each issued scalar instruction obtains all of its source operands. A source operand is the data at a specified address that is fetched according to the scalar instruction.
In an embodiment, the pre-control unit 111 may select multiple scalar instructions that have no resource access conflict from the buffer queue on the principle of choosing different instruction categories, and send the selected scalar instructions in parallel to the scalar registers in the scalar register group 112 that correspond to each instruction category, so that the scalar register group 112 fetches the source operands of each scalar instruction.
For example, scalar instructions may be classified by function (such as logical operation instructions and control instructions), and the scalar registers in the scalar register group may likewise be divided into groups, with the scalar registers in one group serving scalar instructions of one category. To improve access efficiency, conflict-free scalar instructions of different categories may be selected and sent in parallel to different groups of scalar registers. In other embodiments, conflict-free scalar instructions may instead be selected in order on a first-in-first-out basis, or selected at random.
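Selection by distinct instruction category can be sketched as taking at most one conflict-free instruction per category in a single pass over the queue. The tuple layout `(name, category)`, the function name, and the one-per-category limit are assumptions for illustration; the application only states that different categories are chosen so they can go to different register groups in parallel.

```python
def pick_distinct_categories(queue, is_conflict_free):
    """Pick the oldest conflict-free instruction of each category.

    queue            -- iterable of (name, category) tuples, oldest first
    is_conflict_free -- hypothetical predicate taking one such tuple
    """
    picked, seen = [], set()
    for instr in queue:
        _, category = instr
        if category not in seen and is_conflict_free(instr):
            seen.add(category)        # this category's register group is now taken
            picked.append(instr)
    return picked
```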
Step S242: The post-control unit 113 temporarily stores the source operands of each scalar instruction in an internal buffer and, according to the type and running state of the operation units 114, selects scalar instructions of the corresponding type and sends them to the operation units 114.
The post-control unit 113 has a built-in shared buffer. The buffer is shared by scalar instructions of all types, and the source operands fetched for all issued instructions are held in it temporarily. Once all the source operands of a scalar instruction have been fetched and stored in the buffer, that scalar instruction enters the ready-to-execute state. The post-control unit 113 can control scalar instructions of different types that are in the ready-to-execute state so that they execute concurrently.
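The readiness rule for the shared buffer, where an instruction becomes ready once all of its source operands have arrived, can be modeled as follows. The `OperandBuffer` class and its method names are hypothetical, and the sketch ignores buffer capacity, which the application does not specify.

```python
class OperandBuffer:
    """Toy model of a buffer shared by scalar instructions of all types."""

    def __init__(self):
        self._entries = {}   # instr_id -> {"needed": int, "operands": list}

    def allocate(self, instr_id, needed):
        """Reserve an entry for an issued instruction expecting `needed` operands."""
        self._entries[instr_id] = {"needed": needed, "operands": []}

    def deliver(self, instr_id, value):
        """Record one source operand returned by the scalar register group."""
        self._entries[instr_id]["operands"].append(value)

    def ready(self):
        """Return ids of instructions whose operands have all arrived, in issue order."""
        return [i for i, e in self._entries.items()
                if len(e["operands"]) == e["needed"]]
```

Because operands may take one or more clock cycles to arrive, a two-operand instruction issued earlier can become ready later than a one-operand instruction issued after it.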
Specifically, ready-to-execute scalar instructions of the corresponding type may be selected according to the type and running state of the operation units 114 and sent to the operation units 114. The types of operation unit 114 include logical operation, multiplication, branch control, and memory access, and the running state is either busy or idle. A ready logical-operation scalar instruction can therefore be sent to an idle logical-operation unit 114, a ready multiplication scalar instruction can be sent to an idle multiplication unit 114, and so on.
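Matching ready instructions to idle operation units of the same type can be sketched like this. The data layout (tuples for instructions, dictionaries with `"kind"` and `"busy"` keys for units) and the first-fit search are assumptions made for the example, not details taken from the application.

```python
def dispatch(ready, units):
    """Send each ready instruction to the first idle unit of the matching kind.

    ready -- list of (instr_id, kind) tuples in ready-to-execute state
    units -- list of dicts {"kind": str, "busy": bool}; marked busy when used
    Returns a list of (instr_id, unit_index) pairs for the instructions issued.
    """
    issued = []
    for instr_id, kind in ready:
        for idx, unit in enumerate(units):
            if unit["kind"] == kind and not unit["busy"]:
                unit["busy"] = True          # unit is occupied from now on
                issued.append((instr_id, idx))
                break                        # instruction placed, move to the next
    return issued
```

Instructions that find no idle unit of their kind are simply not issued in this pass and would stay in the buffer until a unit becomes idle.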
Step S243: The operation unit 114 operates on the source operands of the received scalar instruction.
After operation units 114 of different types receive scalar instructions of the corresponding types, they can operate on the source operands of those scalar instructions and then return the results to the scalar registers. Scalar instructions of different types can thus execute concurrently, which relieves the instruction backlog and improves processing performance.
The technical solution provided by the above embodiments of the present application shares the decoding and resource checking functions of the pre-control unit 111 and shares the buffer of the post-control unit 113, which saves hardware cost. In this low-cost way, it raises the concurrency of instruction execution in the scalar processing unit 110, compensates for the performance loss introduced when resource access conflicts stall the instruction pipeline, and effectively mitigates subsequent potential resource access conflicts.
If a scalar register resource access conflict stalls the instruction pipeline, the blocked scalar instructions can enter the buffer queue of the pre-control unit 111. When the stall is lifted, the scalar processing unit 110 executes multiple instructions that have no scalar register resource access conflict at the highest degree of concurrency. This maximizes use of the scalar register access bandwidth, increases the number of instructions executed per cycle, compensates for the performance lost to the earlier pipeline stall caused by scalar register access conflicts, quickly relieves the instruction backlog in the buffer queue, and mitigates subsequent potential resource access conflicts.
在本申请所提供的几个实施例中,所揭露的装置和方法,也可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,附图中的流程图和框图显示了根据本申请的多个实施例的装置、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现方式中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。In the several embodiments provided in this application, the disclosed devices and methods can also be implemented in other ways. The device embodiments described above are only illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible implementation architecture, functions and functions of the devices, methods and computer program products according to multiple embodiments of the present application. operate. In this regard, each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more executable functions for implementing the specified logical function instruction. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts. , or can be implemented using a combination of specialized hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Industrial applicability
The present application provides an access control method for a scalar processing unit, and a scalar processing unit. The scalar processing unit is connected to a resource access requester external to the scalar processing unit. The method includes: receiving a resource access instruction sent by the resource access requester; determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict; if the resource access instruction has a resource access conflict, determining, according to the priority of the resource access instruction, whether to add the resource access instruction to a buffer queue; and, if the resource access instruction is added to the buffer queue, concurrently executing, when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict. This solution can quickly relieve the instruction backlog in the buffer queue, compensate for the performance loss caused by resource access conflicts, improve scalar register bandwidth utilization, and mitigate potential subsequent resource access conflicts.
In addition, it can be understood that the access control method for a scalar processing unit and the scalar processing unit of the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in a processor that needs a scalar processing unit to perform image computation.
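The admission step of the method summarized above (conflict check, then priority comparison, then queueing) might be modeled as in the following sketch. The names `AccessInstr`, `receive`, and `pending` are hypothetical. The convention that a larger number means a higher priority is an assumption, and the text does not say what happens to a conflicting instruction that is not queued, so the sketch only reports that outcome.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AccessInstr:
    name: str
    target: str    # access object, e.g. a scalar register id (assumed)
    priority: int  # larger value = higher priority (assumed convention)


def receive(instr, pending, buffer_queue):
    """Classify one incoming resource access instruction.

    pending maps an access object to the priority of the instruction
    still waiting to execute on that object.
    """
    if instr.target not in pending:
        return "no_conflict"
    if instr.priority < pending[instr.target]:
        # Conflict, and the newcomer has the lower priority: buffer it.
        buffer_queue.append(instr)
        return "queued"
    # Conflict, but the newcomer is not lower priority: not buffered here.
    return "not_queued"
```

For example, with `pending = {"s3": 2}`, an instruction targeting `s3` with priority 1 is queued, while one targeting `s5` reports no conflict.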

Claims (16)

  1. An access control method for a scalar processing unit, characterized in that the method comprises:
    receiving a resource access instruction sent by a resource access requester;
    determining, according to an access object of the resource access instruction, whether the resource access instruction has a resource access conflict;
    if the resource access instruction has a resource access conflict, determining, according to a priority of the resource access instruction, whether to add the resource access instruction to a buffer queue;
    if the resource access instruction is added to the buffer queue, when the conflict is resolved, concurrently executing the resource access instructions in the buffer queue that have no resource access conflict;
    wherein the scalar processing unit comprises a pre-control unit, a scalar register group, a post-control unit, and an operation unit connected in sequence, the resource access instruction comprises a scalar instruction, and the concurrently executing the resource access instructions in the buffer queue that have no resource access conflict comprises:
    obtaining, by the pre-control unit, according to the scalar instructions in the buffer queue that have no resource access conflict, the source operand corresponding to each scalar instruction from the scalar register group;
    temporarily storing, by the post-control unit, the source operand corresponding to each scalar instruction in an internal buffer, and selecting, according to the type and operating state of the operation unit, a scalar instruction of the corresponding type to send to the operation unit;
    performing, by the operation unit, according to the received scalar instruction, an operation on the source operand corresponding to the scalar instruction.
  2. The method according to claim 1, characterized in that the priority of the resource access instruction is related to the priority of the resource access requester.
  3. The method according to claim 1 or 2, characterized in that the priority is fixedly configured or dynamically configured.
  4. The method according to any one of claims 1 to 3, characterized in that the determining, if the resource access instruction has a resource access conflict, according to the priority of the resource access instruction, whether to add the resource access instruction to the buffer queue comprises:
    if the resource access instruction has a resource access conflict, and the priority of the resource access instruction is lower than the priority of the instruction to be executed on the access object, adding the resource access instruction to the buffer queue.
  5. The method according to claim 4, characterized in that the resource access instruction comprises a scalar instruction and an access request; and the adding the resource access instruction to the buffer queue if the resource access instruction has a resource access conflict and the priority of the resource access instruction is lower than the priority of the instruction to be executed on the access object comprises:
    if there is a resource access conflict between the scalar instruction and the access request, comparing the priority of the scalar instruction with the priority of the access request;
    if the priority of the scalar instruction is lower than the priority of the access request, adding the scalar instruction to the buffer queue;
    if the priority of the access request is lower than the priority of the scalar instruction, adding the access request to the buffer queue.
  6. The method according to any one of claims 1 to 5, characterized in that the obtaining, by the pre-control unit, according to the scalar instructions in the buffer queue that have no resource access conflict, the source operand corresponding to each scalar instruction from the scalar register group comprises:
    selecting, by the pre-control unit, from the buffer queue, multiple scalar instructions that have no resource access conflict, according to the principle of selecting different instruction categories;
    sending the selected scalar instructions in parallel to the scalar registers in the scalar register group that correspond to each instruction category, so as to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  7. The method according to any one of claims 1 to 5, characterized in that the obtaining, by the pre-control unit, according to the scalar instructions in the buffer queue that have no resource access conflict, the source operand corresponding to each scalar instruction from the scalar register group comprises:
    selecting, by the pre-control unit, from the buffer queue in order, according to the first-in-first-out principle, scalar instructions that have no resource access conflict;
    sending the selected scalar instructions in parallel to the scalar register group, so as to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  8. The method according to any one of claims 1 to 5, characterized in that the obtaining, by the pre-control unit, according to the scalar instructions in the buffer queue that have no resource access conflict, the source operand corresponding to each scalar instruction from the scalar register group comprises:
    randomly selecting, by the pre-control unit, from the buffer queue, scalar instructions that have no resource access conflict;
    sending the selected scalar instructions in parallel to the scalar register group, so as to obtain the source operand corresponding to each scalar instruction from the scalar register group.
  9. The method according to any one of claims 1 to 8, characterized in that there are multiple operation units, and the selecting, according to the type and operating state of the operation unit, a scalar instruction of the corresponding type to send to the operation unit comprises:
    selecting, according to the type and operating state of each operation unit, a scalar instruction of the corresponding type to send to an operation unit that is in an idle state.
  10. The method according to any one of claims 1 to 4, characterized in that the resource access requester comprises an instruction scheduling unit or a scalar access client, and the resource access instruction comprises a scalar instruction sent by the instruction scheduling unit or an access request sent by the scalar access client; and the determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict comprises:
    judging, according to the access object of the resource access instruction, whether the access object of the resource access instruction has instructions that have not finished executing;
    if the access object has instructions that have not finished executing, determining that the resource access instruction has a resource access conflict.
  11. The method according to any one of claims 1 to 9, characterized in that the resource access requester comprises an instruction scheduling unit and a scalar access client, and the resource access instruction comprises a scalar instruction sent by the instruction scheduling unit and an access request sent by the scalar access client; and the determining, according to the access object of the resource access instruction, whether the resource access instruction has a resource access conflict comprises:
    judging, according to a first access object of the scalar instruction and a second access object of the access request, whether the first access object and the second access object are the same;
    if the first access object and the second access object are the same, determining that there is a resource access conflict between the scalar instruction and the access request;
    if the first access object and the second access object are different, judging separately whether each of the scalar instruction and the access request has a resource access conflict.
  12. The method according to claim 11, characterized in that the judging separately whether each of the scalar instruction and the access request has a resource access conflict comprises:
    judging separately whether the first access object and the second access object have instructions that have not finished executing;
    if the first access object has instructions that have not finished executing, determining that the scalar instruction has a resource access conflict;
    if the second access object has instructions that have not finished executing, determining that the access request has a resource access conflict.
  13. The method according to claim 11, characterized in that the judging separately whether each of the scalar instruction and the access request has a resource access conflict comprises:
    judging separately whether the access bandwidth of the first access object and the access bandwidth of the second access object are fully occupied;
    if the access bandwidth of the first access object is fully occupied, determining that the scalar instruction has a resource access conflict;
    if the access bandwidth of the second access object is fully occupied, determining that the access request has a resource access conflict.
  14. The method according to claim 10 or 12, characterized in that the instructions that have not finished executing comprise instructions being executed and instructions waiting to be executed.
  15. A scalar processing unit, characterized by comprising:
    a pre-control unit, configured to connect to a resource access requester external to the scalar processing unit and to receive a resource access instruction sent by the resource access requester; to determine, according to an access object of the resource access instruction, whether the resource access instruction has a resource access conflict; if the resource access instruction has a resource access conflict, to determine, according to a priority of the resource access instruction, whether to add the resource access instruction to a buffer queue; and, if the resource access instruction is added to the buffer queue, to simultaneously send, when the conflict is resolved, the resource access instructions in the buffer queue that have no resource access conflict;
    a scalar register group, connected to the pre-control unit and configured to receive the resource access instruction and obtain the internally stored source operand corresponding to the resource access instruction;
    a post-control unit, connected to the scalar register and configured to, when the resource access instruction comprises a scalar instruction, temporarily store the source operand corresponding to each scalar instruction in an internal buffer and select, according to the type and operating state of an operation unit, a scalar instruction of the corresponding type to send to the operation unit;
    the operation unit, connected to the post-control unit and configured to perform, according to the received scalar instruction, an operation on the source operand corresponding to the scalar instruction.
  16. A resource access system, characterized by comprising:
    the scalar processing unit according to claim 15, wherein the resource access instruction comprises a scalar instruction and an access request;
    an instruction scheduling unit, connected to the scalar processing unit and configured to send scalar instructions to the scalar processing unit;
    a scalar access client, connected to the scalar processing unit and configured to send access requests to the scalar processing unit.
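
As an informal illustration only (not part of the claims), the conflict determination recited in claims 11 to 13 could be modeled as below. The function names and the dictionaries `unfinished_counts`, `ports_in_use`, and `ports_total` are hypothetical. The two predicates correspond to the two alternative per-object checks: unfinished instructions, or fully occupied access bandwidth.

```python
def has_unfinished(target, unfinished_counts):
    """Check in the style of claim 12: the object still has executing or waiting instructions."""
    return unfinished_counts.get(target, 0) > 0


def bandwidth_full(target, ports_in_use, ports_total):
    """Check in the style of claim 13: every access port of the object is occupied."""
    return ports_in_use.get(target, 0) >= ports_total[target]


def detect(scalar_target, request_target, is_busy):
    """Return the set of conflicts for one scalar instruction and one access request.

    Same access object: the two conflict with each other ("mutual").
    Different objects: each one is checked on its own with is_busy.
    """
    if scalar_target == request_target:
        return {"mutual"}
    found = set()
    if is_busy(scalar_target):
        found.add("scalar")
    if is_busy(request_target):
        found.add("request")
    return found
```

Passing the per-object check as `is_busy` keeps the comparison of the two access objects separate from whichever busy criterion is chosen.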
PCT/CN2022/130376 2022-08-26 2022-11-07 Access control method for scalar processing unit, and scalar processing unit WO2024040750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211028891.2 2022-08-26
CN202211028891.2A CN115129480B (en) 2022-08-26 2022-08-26 Scalar processing unit and access control method thereof

Publications (1)

Publication Number Publication Date
WO2024040750A1 true WO2024040750A1 (en) 2024-02-29

Family

ID=83387658

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130376 WO2024040750A1 (en) 2022-08-26 2022-11-07 Access control method for scalar processing unit, and scalar processing unit

Country Status (2)

Country Link
CN (1) CN115129480B (en)
WO (1) WO2024040750A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129480B (en) * 2022-08-26 2022-11-08 上海登临科技有限公司 Scalar processing unit and access control method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101005486A (en) * 2006-12-28 2007-07-25 金蝶软件(中国)有限公司 Resource access control method and system
CN105426160A (en) * 2015-11-10 2016-03-23 北京时代民芯科技有限公司 Instruction classified multi-emitting method based on SPRAC V8 instruction set
CN105471881A (en) * 2015-12-07 2016-04-06 北京奇虎科技有限公司 Method, device and system for locking and unlocking requests
US20170039139A1 (en) * 2014-09-26 2017-02-09 Intel Corporation Hardware apparatuses and methods to control access to a multiple bank data cache
CN115129480A (en) * 2022-08-26 2022-09-30 上海登临科技有限公司 Scalar processing unit and access control method thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06168263A (en) * 1992-11-30 1994-06-14 Fujitsu Ltd Vector processor
US7437521B1 (en) * 2003-08-18 2008-10-14 Cray Inc. Multistream processing memory-and barrier-synchronization method and apparatus
CN102156637A (en) * 2011-05-04 2011-08-17 中国人民解放军国防科学技术大学 Vector crossing multithread processing method and vector crossing multithread microprocessor
US9501227B2 (en) * 2014-08-21 2016-11-22 Wisconsin Alumni Research Foundation Memory controller for heterogeneous computer
US10372458B2 (en) * 2015-04-01 2019-08-06 Huawei Technologies Co., Ltd Method and apparatus for a self-clocked, event triggered superscalar processor
CN109062604B (en) * 2018-06-26 2021-07-23 飞腾技术(长沙)有限公司 Emission method and device for mixed execution of scalar and vector instructions
CN111459543B (en) * 2019-01-21 2022-09-13 上海登临科技有限公司 Method for managing register file unit
CN112925567A (en) * 2019-12-06 2021-06-08 中科寒武纪科技股份有限公司 Method and device for distributing register, compiling method and device and electronic equipment
CN113934455A (en) * 2020-06-29 2022-01-14 华为技术有限公司 Instruction conversion method and device
US20220206862A1 (en) * 2020-12-25 2022-06-30 Intel Corporation Autonomous and extensible resource control based on software priority hint

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101005486A (en) * 2006-12-28 2007-07-25 金蝶软件(中国)有限公司 Resource access control method and system
US20170039139A1 (en) * 2014-09-26 2017-02-09 Intel Corporation Hardware apparatuses and methods to control access to a multiple bank data cache
CN105426160A (en) * 2015-11-10 2016-03-23 北京时代民芯科技有限公司 Instruction classified multi-emitting method based on SPRAC V8 instruction set
CN105471881A (en) * 2015-12-07 2016-04-06 北京奇虎科技有限公司 Method, device and system for locking and unlocking requests
CN115129480A (en) * 2022-08-26 2022-09-30 上海登临科技有限公司 Scalar processing unit and access control method thereof

Also Published As

Publication number Publication date
CN115129480B (en) 2022-11-08
CN115129480A (en) 2022-09-30

Similar Documents

Publication Publication Date Title
US6532509B1 (en) Arbitrating command requests in a parallel multi-threaded processing system
US10002031B2 (en) Low overhead thread synchronization using hardware-accelerated bounded circular queues
US7111296B2 (en) Thread signaling in multi-threaded processor
US8954986B2 (en) Systems and methods for data-parallel processing
US8478926B1 (en) Co-processing acceleration method, apparatus, and system
US6560667B1 (en) Handling contiguous memory references in a multi-queue system
US9588810B2 (en) Parallelism-aware memory request scheduling in shared memory controllers
US8963933B2 (en) Method for urgency-based preemption of a process
US9158595B2 (en) Hardware scheduling of ordered critical code sections
US20120229481A1 (en) Accessibility of graphics processing compute resources
US20120147021A1 (en) Graphics compute process scheduling
US9176795B2 (en) Graphics processing dispatch from user mode
WO2014166404A1 (en) Network data packet processing method and device
US20130185728A1 (en) Scheduling and execution of compute tasks
WO2024040750A1 (en) Access control method for scalar processing unit, and scalar processing unit
US8959319B2 (en) Executing first instructions for smaller set of SIMD threads diverging upon conditional branch instruction
US20130185725A1 (en) Scheduling and execution of compute tasks
US7765548B2 (en) System, method and medium for using and/or providing operating system information to acquire a hybrid user/operating system lock
US9442759B2 (en) Concurrent execution of independent streams in multi-channel time slice groups
US10152329B2 (en) Pre-scheduled replays of divergent operations
US20120221831A1 (en) Accessing Common Registers In A Multi-Core Processor
US11941440B2 (en) System and method for queuing commands in a deep learning processor
CN112114967B (en) GPU resource reservation method based on service priority
CN114564420B (en) Method for sharing parallel bus by multi-core processor
CN112559403B (en) Processor and interrupt controller therein

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956287

Country of ref document: EP

Kind code of ref document: A1