CN111638965A - Command queue inter-fence synchronization method and system based on fence replication and handshake - Google Patents
- Publication number
- CN111638965A
- Authority
- CN
- China
- Prior art keywords
- command
- fence
- barrier
- queues
- queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/522—Barrier synchronisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and system for fence (barrier) synchronization between command queues based on fence replication and handshake. The method dispatches the commands of a single command stream to a specified number N of command queues for processing. When a fence command is encountered, it is replicated into multiple copies and one copy is sent into each command queue. Each command queue executes the commands distributed to it, and when the fence command is executed, a fence handshake ensures that the fence commands of all queues are executed simultaneously. The invention reduces the command-stream stalls caused by fence commands and spreads the cost of processing a fence command across the command queues, thereby improving overall command-stream processing efficiency.
Description
Technical Field
The invention relates to the field of integrated circuit chip design, in particular to a command inter-queue fence synchronization method and system based on fence replication and handshake.
Background
The fence synchronization operation is one of the commands that a microprocessor design must process correctly, and it is crucial to the functional correctness of the processor. Its meaning is as follows: a command stream may contain a special command (the fence command, also called a barrier command), and when it is processed, all commands before the barrier command must finish executing before any command after the barrier command is executed. Generally, each command in the command stream enters one of several command queues according to its type, then begins execution, and is deleted from its command queue once execution completes.
Once the commands of a command stream have entered different queues, their relative order can no longer be determined from their positions in the queues. Barrier commands are therefore usually handled as follows: when a barrier command is encountered, processing of the command stream is suspended, and it resumes only after all command queues have been emptied. This approach interrupts the processing of subsequent commands, and long-latency commands (such as memory access commands) often cause a long processing stall, so it is inefficient.
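To make the cost of this conventional approach concrete, the following is a minimal Python sketch, not taken from the patent, of a dispatcher that stalls on a fence until every queue is empty. The names `FENCE`, `step`, and `dispatch_with_drain`, the tuple layout of a command, and the timing model (one command dispatched per cycle, each queue working only on its head command) are all illustrative assumptions.

```python
from collections import deque

FENCE = "FENCE"  # hypothetical marker for a fence (barrier) command


def step(queues):
    """Advance every queue by one cycle (assumed model: only the head executes)."""
    for q in queues:
        if q:
            name, remaining = q[0]
            if remaining <= 1:
                q.popleft()                  # finished: delete from its queue
            else:
                q[0] = (name, remaining - 1)


def dispatch_with_drain(stream, num_queues):
    """Conventional handling: on a fence, stall until all queues are empty.

    Each non-fence command is a tuple (name, queue_index, latency_in_cycles).
    Returns the number of cycles the dispatcher spent stalled on fences.
    """
    queues = [deque() for _ in range(num_queues)]
    stall_cycles = 0
    for cmd in stream:
        if cmd == FENCE:
            while any(queues):               # wait for every queue to drain
                step(queues)
                stall_cycles += 1
        else:
            name, idx, latency = cmd
            queues[idx].append((name, latency))
            step(queues)                     # one cycle passes per dispatch
    return stall_cycles
```

In this model, a 10-cycle memory access ahead of the fence, as in `dispatch_with_drain([("load", 0, 10), ("add", 1, 1), FENCE, ("mul", 1, 1)], 2)`, leaves the dispatcher idle for 8 cycles before it can issue `mul`.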
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a command queue inter-barrier synchronization method and system based on barrier replication and handshake.
In order to solve the technical problems, the invention adopts the technical scheme that:
a command inter-queue barrier synchronization method based on barrier replication and handshake comprises the following implementation steps:
dispatching the commands in a single command stream to a specified number N of command queues for processing, and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
each command queue executes the commands distributed to it, and when the fence command is executed, a fence handshake ensures that the fence commands of all the queues are executed simultaneously.
Optionally, the detailed steps of dispatching the commands in the single command stream to the specified number N of command queues for processing include:
A1) judging whether the command to be dispatched in the command stream is a barrier command; if not, going to step A2); if it is a barrier command, going to step A3);
A2) allocating the command normally to its corresponding command queue among the N command queues for processing; returning to step A1);
A3) replicating the barrier command into N copies and distributing one copy to each of the N command queues; returning to step A1).
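Steps A1) to A3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patented hardware: `FENCE`, `dispatch`, and the `route` callback (which maps a command to a queue index and stands in for allocation by command type) are hypothetical names, and the queues are modeled as unbounded.

```python
from collections import deque

FENCE = "FENCE"  # hypothetical marker for a fence (barrier) command


def dispatch(stream, num_queues, route):
    """Distribute a single command stream into num_queues command queues."""
    queues = [deque() for _ in range(num_queues)]
    for cmd in stream:
        if cmd == FENCE:                      # A1: is this a fence command?
            for q in queues:                  # A3: one copy into every queue
                q.append(FENCE)
        else:
            queues[route(cmd)].append(cmd)    # A2: normal allocation
    return queues


# Example routing rule (an assumption): memory commands go to queue 0,
# everything else to queue 1.
def by_type(cmd):
    return 0 if cmd.startswith("mem") else 1


queues = dispatch(["mem_ld", "alu_add", FENCE, "alu_mul"], 2, by_type)
```

After dispatch, every queue holds its own copy of the fence at the position that separates its earlier commands from its later ones, so the dispatcher never has to wait for the queues to drain.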
Optionally, the detailed steps by which each command queue executes the commands distributed to it include:
B1) taking a command out of the command queue as the current command;
B2) judging whether the current command is a barrier command; if not, going to step B3); if it is a barrier command, going to step B4);
B3) processing the current command, and returning to step B1) after it has been processed;
B4) setting the queue's own fence start flag to 1 and its own fence end flag to 0;
B5) monitoring the fence start flags of all command queues, and setting the queue's own fence end flag to 1 when the fence start flags of all command queues are observed to be 1; the other command queues likewise monitor the fence start flags of all command queues and set their own fence end flags to 1 when all fence start flags are observed to be 1;
B6) after the queue's own fence end flag has been set to 1, beginning to monitor the fence end flags of the other command queues, and setting the queue's own fence start flag to 0 when the fence end flags of all command queues are 1;
B7) after the queue's own fence start flag has been set to 0, deleting the current command from the command queue.
Optionally, the barrier start flag and the barrier end flag are each implemented with a register, and the values of the two registers of each command queue are routed to the queue controllers of the other command queues through hardware connection lines.
The invention also provides a command queue inter-fence synchronization system based on fence replication and handshake, which comprises:
a command scheduling and distribution program module, used for dispatching the commands in a single command stream to a specified number N of command queues for processing and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
and a queue instruction processing program module, used for having each command queue execute the commands distributed to it and for ensuring, through a fence handshake, that the fence commands of all the queues are executed simultaneously when the fence command is executed.
The present invention also provides a microprocessor programmed or configured to perform the steps of the above method for fence synchronization between command queues based on fence replication and handshake.
The invention also provides a microprocessor, wherein a firmware program programmed or configured to execute the above method for fence synchronization between command queues based on fence replication and handshake is stored in a memory of the microprocessor.
The invention also provides a computer device, which at least comprises a processor and a memory, wherein the processor in the computer device is the microprocessor.
The present invention also provides a computer readable storage medium having stored therein a firmware program programmed or configured to perform the above method for fence synchronization between command queues based on fence replication and handshake.
Compared with the prior art, the invention has the following advantages:
1. Through fence replication, the invention avoids the waiting caused by a fence command in the command stream: when a fence command is encountered, there is no need to wait for every command queue to be emptied; instead, the fence command is replicated, one copy is added to each queue, and processing of subsequent commands can then continue.
2. Through the fence handshake between queues, the invention completes the fence synchronization operation across multiple queues. The queues holding copies of the fence command execute independently, and the commands can be scheduled appropriately during processing, which improves the efficiency of fence command processing.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a process of copying barrier commands according to an embodiment of the present invention.
Fig. 3 is a diagram illustrating a barrier handshake process according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the implementation steps of the command inter-queue barrier synchronization method based on barrier replication and handshake of the present embodiment include:
dispatching the commands in a single command stream to a specified number N of command queues for processing, and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
each command queue executes the commands distributed to it, and when the fence command is executed, a fence handshake ensures that the fence commands of all the queues are executed simultaneously.
In this embodiment, the detailed steps of dispatching the commands in the single command stream to the specified number N of command queues include:
A1) judging whether the command to be dispatched in the command stream is a barrier command; if not, going to step A2); if it is a barrier command, going to step A3);
A2) allocating the command normally to its corresponding command queue among the N command queues for processing; returning to step A1);
A3) replicating the barrier command into N copies and distributing one copy to each of the N command queues; returning to step A1).
Taking the dual command queue with N = 2 shown in fig. 2 as an example: when a barrier command is encountered in the single command stream, a write request is issued to both queues at the same time. If both queues have a free entry, the barrier command is written directly into both queues and deleted from the command stream. If only one of the queues has a free entry, the barrier command is first written into that queue; once the other queue also has a free entry, the barrier command is written into it as well, and only then is the barrier command deleted from the command stream. The commands in each command queue are executed separately. All commands before the barrier command can be executed in parallel, and any command can be deleted from its command queue once it has been executed. Until the commands before the barrier command have been executed, the commands after the barrier command wait in the command queue. After all commands preceding the barrier command have been executed (at which point they may not yet have been removed from the command queue), processing of the barrier command can begin and the barrier handshake process starts.
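The enqueue behavior described for fig. 2, where a fence copy is written into each queue as soon as that queue has a free entry and the fence leaves the command stream only after every queue holds a copy, can be sketched as follows. This is an illustrative Python model, not the patented circuit: the class `FenceReplicator`, its `try_enqueue` method, and the `capacity` parameter are hypothetical names introduced here.

```python
from collections import deque

FENCE = "FENCE"  # hypothetical marker for a fence (barrier) command


class FenceReplicator:
    """Tracks which queues still need a copy of the fence at the stream head."""

    def __init__(self, num_queues):
        self.pending = set(range(num_queues))   # queues without a copy yet

    def try_enqueue(self, queues, capacity):
        """Write the fence into every pending queue that has a free entry.

        Returns True once all queues hold a copy, which is the point at
        which the fence may be deleted from the command stream.
        """
        for i in sorted(self.pending):           # sorted() copies the set
            if len(queues[i]) < capacity:
                queues[i].append(FENCE)
                self.pending.discard(i)
        return not self.pending


# Two queues of capacity 2; queue 1 is full when the fence arrives.
queues = [deque(["a"]), deque(["b", "c"])]
rep = FenceReplicator(2)
first_try = rep.try_enqueue(queues, capacity=2)   # only queue 0 accepts a copy
queues[1].popleft()                               # "b" completes, freeing an entry
second_try = rep.try_enqueue(queues, capacity=2)  # queue 1 now accepts its copy
```

Keeping a per-queue pending set guarantees that a queue which already accepted its copy never receives a second one while the dispatcher waits for the remaining queue.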
In this embodiment, the detailed steps by which each command queue executes the commands distributed to it include:
B1) taking a command out of the command queue as the current command;
B2) judging whether the current command is a barrier command; if not, going to step B3); if it is a barrier command, going to step B4);
B3) processing the current command, and returning to step B1) after it has been processed;
B4) setting the queue's own fence start flag to 1 and its own fence end flag to 0;
B5) monitoring the fence start flags of all command queues, and setting the queue's own fence end flag to 1 when the fence start flags of all command queues are observed to be 1; the other command queues likewise monitor the fence start flags of all command queues and set their own fence end flags to 1 when all fence start flags are observed to be 1;
B6) after the queue's own fence end flag has been set to 1, beginning to monitor the fence end flags of the other command queues, and setting the queue's own fence start flag to 0 when the fence end flags of all command queues are 1;
B7) after the queue's own fence start flag has been set to 0, deleting the current command from the command queue.
In this embodiment, the barrier start flag and the barrier end flag are each implemented with a register, and the values of the two registers of each command queue are routed to the queue controllers of the other command queues through hardware connection lines. Each command queue thus has two flag signals, a fence start flag and a fence end flag, and both flag bits are initially 0.
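The handshake of steps B4) to B7) can be illustrated with a small cycle-based simulation in Python. It is a sketch under assumptions, not the patented circuit: `QueueController`, `handshake_cycle`, and `run` are hypothetical names, the model handles a single fence, and it assumes synchronous register semantics in which every controller reads the flag values from the previous cycle and writes only its own flags.

```python
class QueueController:
    """Holds the two flag registers of one command queue (both initially 0)."""

    def __init__(self):
        self.start = 0           # fence start flag
        self.end = 0             # fence end flag
        self.fence_done = False  # True once the fence copy has been deleted


def handshake_cycle(controllers, at_fence):
    """Advance all controllers by one clock cycle.

    at_fence[i] is True when queue i has its fence copy at the head and all
    earlier commands in that queue have been executed.
    """
    starts = [c.start for c in controllers]   # values visible this cycle
    ends = [c.end for c in controllers]
    for c, ready in zip(controllers, at_fence):
        if c.fence_done or not ready:
            continue
        if c.start == 0 and c.end == 0:
            c.start = 1                       # B4: announce arrival at the fence
        elif c.end == 0 and all(starts):
            c.end = 1                         # B5: every queue has arrived
        elif c.end == 1 and all(ends):
            c.start = 0                       # B6: every queue acknowledged
            c.end = 0                         # end flag cleared with the delete
            c.fence_done = True               # B7: delete the fence copy


def run(arrival_cycles, max_cycles=100):
    """Return the cycle in which each queue deletes its fence copy."""
    controllers = [QueueController() for _ in arrival_cycles]
    done_at = [None] * len(controllers)
    for cycle in range(max_cycles):
        at_fence = [cycle >= a for a in arrival_cycles]
        handshake_cycle(controllers, at_fence)
        for i, c in enumerate(controllers):
            if c.fence_done and done_at[i] is None:
                done_at[i] = cycle
        if all(d is not None for d in done_at):
            break
    return done_at
```

In this model, `run([0, 3])` returns `[5, 5]`: the queue that reaches its fence at cycle 0 waits with its start flag raised until the other queue arrives at cycle 3, and both delete their fence copies in the same cycle.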
Taking the dual command queue with N = 2 shown in fig. 2 as an example, the barrier start flag and the barrier end flag used in the barrier handshake process are each implemented with a register. As shown in fig. 3, the two registers are connected to the controller of the other queue through hard wires, so that when the value of a flag changes, the controller of the other queue observes its current value over the hard wire. Once a controller detects that the barrier start flag bits of both queues have become 1, it sets its own barrier end flag to 1 and starts monitoring the barrier end flag of the other queue. When the other queue's barrier end flag has also been set to 1, the controller sets its own barrier start flag to 0, deletes the barrier command from its queue, and at the same time clears its own barrier end flag to 0.
In summary, the main application scenario of the method of this embodiment is as follows: when a barrier command exists in the command stream, the commands before the barrier command must be executed before the commands after it. When a fence command is encountered in the command stream, the method first replicates it into N copies and sends one copy into each of the N command queues. Each command queue then begins executing its fence command only after all commands before it have been executed, and during execution the queues handshake with one another through two flag bits, fence start and fence end, which ensures that the fence commands of all queues are executed simultaneously. Finally, all queues delete the fence command and start executing the subsequent commands. When processing a barrier command in a single command stream, the method does not need to wait for all of the independent command queues to drain, and can therefore improve the execution efficiency of the whole command stream.
In addition, this embodiment further provides a command inter-queue barrier synchronization system based on barrier replication and handshake, including:
a command scheduling and distribution program module, used for dispatching the commands in a single command stream to a specified number N of command queues for processing and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
and a queue instruction processing program module, used for having each command queue execute the commands distributed to it and for ensuring, through a fence handshake, that the fence commands of all the queues are executed simultaneously when the fence command is executed.
Furthermore, the present embodiment also provides a microprocessor programmed or configured to perform the steps of the aforementioned method for fence synchronization between command queues based on fence replication and handshake.
In addition, the present embodiment also provides a microprocessor, wherein a memory in the microprocessor stores a firmware program programmed or configured to perform the aforementioned method for fence synchronization between command queues based on fence replication and handshake.
In addition, this embodiment further provides a computer device, which at least includes a processor and a memory, where the processor in the computer device is the foregoing microprocessor.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a firmware program programmed or configured to perform the aforementioned method for fence synchronization between command queues based on fence replication and handshake.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application, wherein instructions, which execute via a processor of a computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions that fall under the idea of the present invention belong to its protection scope. It should be noted that improvements and refinements that those skilled in the art may make without departing from the principle of the invention are also considered to be within the protection scope of the invention.
Claims (9)
1. A command inter-queue barrier synchronization method based on barrier replication and handshake is characterized by comprising the following implementation steps:
dispatching the commands in a single command stream to a specified number N of command queues for processing, and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
each command queue executes the commands distributed to it, and when the fence command is executed, a fence handshake ensures that the fence commands of all the queues are executed simultaneously.
2. The method of claim 1, wherein the detailed steps of dispatching the commands in the single command stream to the specified number N of command queues for processing comprise:
A1) judging whether the command to be dispatched in the command stream is a barrier command; if not, going to step A2); if it is a barrier command, going to step A3);
A2) allocating the command normally to its corresponding command queue among the N command queues for processing; returning to step A1);
A3) replicating the barrier command into N copies and distributing one copy to each of the N command queues; returning to step A1).
3. The method for fence synchronization between command queues based on fence replication and handshake as claimed in claim 1, wherein the detailed steps by which each command queue executes the commands distributed to it comprise:
B1) taking a command out of the command queue as the current command;
B2) judging whether the current command is a barrier command; if not, going to step B3); if it is a barrier command, going to step B4);
B3) processing the current command, and returning to step B1) after it has been processed;
B4) setting the queue's own fence start flag to 1 and its own fence end flag to 0;
B5) monitoring the fence start flags of all command queues, and setting the queue's own fence end flag to 1 when the fence start flags of all command queues are observed to be 1; the other command queues likewise monitor the fence start flags of all command queues and set their own fence end flags to 1 when all fence start flags are observed to be 1;
B6) after the queue's own fence end flag has been set to 1, beginning to monitor the fence end flags of the other command queues, and setting the queue's own fence start flag to 0 when the fence end flags of all command queues are 1;
B7) after the queue's own fence start flag has been set to 0, deleting the current command from the command queue.
4. The method for fence synchronization between command queues based on fence replication and handshake as claimed in claim 3, wherein the fence start flag and the fence end flag are each implemented with a register, and the values of the two registers of each command queue are routed to the queue controllers of the other command queues through hardware connection lines.
5. An inter-command queue barrier synchronization system based on barrier replication and handshake, comprising:
a command scheduling and distribution program module, used for dispatching the commands in a single command stream to a specified number N of command queues for processing and, when a barrier command is encountered, replicating the barrier command in the command stream into multiple copies and sending one copy into each command queue;
and a queue instruction processing program module, used for having each command queue execute the commands distributed to it and for ensuring, through a fence handshake, that the fence commands of all the queues are executed simultaneously when the fence command is executed.
6. A microprocessor, characterized in that the microprocessor is programmed or configured to perform the steps of the fence synchronization method between command queues based on fence replication and handshake as claimed in any one of claims 1 to 4.
7. A microprocessor, wherein a memory in the microprocessor stores a firmware program programmed or configured to perform the fence synchronization method between command queues based on fence replication and handshake as claimed in any one of claims 1 to 4.
8. A computer device comprising at least a processor and a memory, characterized in that the processor in the computer device is a microprocessor according to claim 6 or 7.
9. A computer readable storage medium having stored therein a firmware program programmed or configured to perform the fence synchronization method between command queues based on fence replication and handshake of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010441746.1A CN111638965B (en) | 2020-05-22 | 2020-05-22 | Command queue inter-fence synchronization method and system based on fence replication and handshake |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010441746.1A CN111638965B (en) | 2020-05-22 | 2020-05-22 | Command queue inter-fence synchronization method and system based on fence replication and handshake |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111638965A (en) | 2020-09-08 |
CN111638965B (en) | 2023-04-11 |
Family
ID=72329297
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010441746.1A Active CN111638965B (en) | 2020-05-22 | 2020-05-22 | Command queue inter-fence synchronization method and system based on fence replication and handshake |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111638965B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114116015A (en) * | 2022-01-21 | 2022-03-01 | 上海登临科技有限公司 | Method and system for managing hardware command queue |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6047334A (en) * | 1997-06-17 | 2000-04-04 | Intel Corporation | System for delaying dequeue of commands received prior to fence command until commands received before fence command are ordered for execution in a fixed sequence |
Also Published As
Publication number | Publication date |
---|---|
CN111638965B (en) | 2023-04-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11003489B2 (en) | Cause exception message broadcast between processing cores of a GPU in response to indication of exception event | |
US7689809B2 (en) | Transparent return to parallel mode by rampoline instruction subsequent to interrupt processing to accommodate slave processor not supported by operating system | |
CN100568190C (en) | The method of operating and the system that regularly have various durations | |
US20120096205A1 (en) | Inter-virtual machine profiling | |
CN100382033C (en) | Computer system | |
US10861125B2 (en) | Preparing and executing command streams in data processing systems | |
JP2009025939A (en) | Task control method and semiconductor integrated circuit | |
TWI498820B (en) | Processor with second jump execution unit for branch misprediction | |
WO2015131542A1 (en) | Data processing method, device and system | |
CN111638965B (en) | Command queue inter-fence synchronization method and system based on fence replication and handshake | |
CN107368324A (en) | A kind of component upgrade methods, devices and systems | |
US7607133B2 (en) | Interrupt processing control | |
US20130117757A1 (en) | Method and apparatus for scheduling application programs | |
CN111831408A (en) | Asynchronous task processing method and device, electronic equipment and medium | |
CN113342512B (en) | IO task silencing and driving method and device and related equipment | |
CN105487847B (en) | A kind of function dissemination method and equipment | |
JP3524961B2 (en) | Multiprocessor system and task scheduling method in the system | |
CN110955507B (en) | Method for multitask access to same IIC bus based on vxWorks system | |
JPH0845805A (en) | Production control system | |
JPS63636A (en) | Task control system | |
CN115840622A (en) | Task scheduling method and device | |
JP6940283B2 (en) | DMA transfer control device, DMA transfer control method, and DMA transfer control program | |
JP5998851B2 (en) | Data processing system, data processing method, and data processing program | |
CN116719883A (en) | Data synchronization method, device, server and storage medium | |
CN103942096B (en) | A kind of multithreading of data fault-tolerant speculates method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||