WO2017201693A1 - Scheduling method and apparatus for memory access instructions, and computer system - Google Patents

Scheduling method and apparatus for memory access instructions, and computer system

Info

Publication number
WO2017201693A1
WO2017201693A1 (PCT/CN2016/083339)
Authority
WO
WIPO (PCT)
Prior art keywords
memory
memory access
instruction
access instruction
packet
Prior art date
Application number
PCT/CN2016/083339
Other languages
English (en)
French (fr)
Inventor
胡杏
方运潭
肖世海
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2016/083339 priority Critical patent/WO2017201693A1/zh
Priority to CN201680004199.2A priority patent/CN108027727B/zh
Publication of WO2017201693A1 publication Critical patent/WO2017201693A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, and a computer system for scheduling memory access instructions.
  • the memory system usually runs a multi-version mechanism to update the memory data; that is, it does not directly update the original memory data, but creates a new version of the original memory data and performs the update on the new version.
  • the memory system accesses the memory data according to the received memory access instruction.
  • the execution order of multiple memory access instructions needs to be strictly controlled, to ensure that the memory data can be restored in the event of a system failure.
  • the multi-version mechanism that the memory system runs can be of many types, such as shadow update, redo-logging, and undo-logging. These multi-version mechanisms can use memory barrier instructions to control the execution order of multiple memory access instructions.
  • for example, a memory barrier instruction may be inserted after a first memory access instruction and before a second memory access instruction, so that the second memory access instruction begins to execute only after the execution of the first memory access instruction is completed. This ensures that the second memory access instruction is executed only after the first memory access instruction, and is not executed in parallel with it or ahead of it.
  • the memory barrier instruction can control the execution order of multiple memory access instructions, it also causes some memory access instructions that can be executed in parallel to be executed only serially, which increases the memory access time and affects the memory access performance.
  • the memory write requests in the redo-logging multi-version mechanism are classified into Data (persistent data write requests), Log (log data write requests), and Commit (log control write requests), and based on this classification the following scheduling rule is proposed:
  • each Commit carries a memory barrier instruction, and a Log that belongs to the same transaction as the Commit may, before the corresponding memory barrier instruction is dispatched, be executed in parallel with requests of other transactions.
  • the above scheduling method optimizes memory scheduling only according to the semantics of the redo-logging multi-version mechanism; it is applicable only to the redo-logging multi-version mechanism and not to other multi-version mechanisms. Therefore, there is a need for a memory access instruction scheduling method suitable for multiple multi-version mechanisms, so that memory ordering guarantees and improved memory access performance can still be provided when the multi-version mechanism of the memory system differs.
  • the present invention provides a method, an apparatus, and a computer system for scheduling memory access instructions.
  • the technical solution is as follows:
  • a scheduling method for a memory access instruction is provided, the scheduling method being applied to a computer system, the computer system comprising a memory controller, a scheduler, and a plurality of processor cores; the scheduler is connected to the memory controller and to the plurality of processor cores, a plurality of scheduling queues are cached in the scheduler, and each scheduling queue is used to cache memory access instructions to be scheduled.
  • two types of memory barrier instructions are provided: a first type memory barrier instruction and a second type memory barrier instruction. The first type memory barrier instruction is used to control the order of multiple memory access instructions of a processor core, and its scope is a processor core; the second type memory barrier instruction is used to control the order of multiple memory access instructions of the entire processor, and its scope is a processor. The scheduler can schedule the received memory access instructions according to the type of the memory barrier instruction.
  • when the scheduler receives the first memory access instruction sent by the first processor core and the first memory barrier instruction after the first memory access instruction, the scheduler first determines whether the first memory barrier instruction is a first type or a second type memory barrier instruction. When it determines that the first memory barrier instruction is a first type memory barrier instruction, indicating that the scope of the first memory barrier instruction is the first processor core, the scheduler may dispatch the first memory access instruction and the first memory barrier instruction to a first scheduling queue in the multiple scheduling queues, where the first scheduling queue is the scheduling queue corresponding to the first processor core, used to cache memory access instructions sent by the first processor core. Similarly, memory access instructions and first type memory barrier instructions sent by other processor cores can be scheduled in the same manner.
  • each scheduling queue may therefore cache first type memory barrier instructions.
  • when scheduling, the scheduler may first determine, in the plurality of scheduling queues, the at least one memory access instruction located before the first first-type memory barrier instruction, and send the determined memory access instructions to the memory controller together, thereby reducing the memory scheduling time.
  • in this way, the ordering of one processor core's memory access instructions does not constrain the memory access instructions of other processor cores, which reduces the impact of memory barrier instructions on memory performance and improves parallelism; sending the memory access instructions before the first first-type memory barrier instruction in each scheduling queue to the memory controller together further improves the degree of parallelism. Moreover, because the method does not optimize memory scheduling according to the semantics of any particular multi-version mechanism, it can be applied to different types of multi-version mechanisms while still providing memory ordering guarantees and improving memory access performance.
  • when scheduling memory access instructions, the scheduler may first determine the priority of each memory access instruction in the at least one memory access instruction. The priority is represented by the minimum, taken over all banks, of the number of memory access instructions that remain to access each bank after the memory access instruction is sent to the memory controller, and reflects the degree of parallelism obtained after scheduling that memory access instruction.
  • the memory access instruction with the highest priority for each bank is then selected and sent to the memory controller, after which a second type memory barrier instruction is sent to the memory controller.
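The per-bank selection just described can be sketched as a small helper; this is an illustrative reconstruction, not code from the patent, and the `(name, bank, priority)` candidate representation is an assumption.

```python
def select_per_bank(candidates):
    """For each bank, pick the candidate instruction with the highest priority.

    candidates: iterable of (name, bank, priority) tuples.
    Returns a mapping bank -> name of the winning instruction.
    """
    best = {}  # bank -> (name, priority)
    for name, bank, prio in candidates:
        if bank not in best or prio > best[bank][1]:
            best[bank] = (name, prio)
    return {bank: name for bank, (name, _) in best.items()}

picked = select_per_bank([('i1', 0, 2), ('i2', 0, 5), ('i3', 1, 1)])
# picked == {0: 'i2', 1: 'i3'}: i2 wins bank 0, i3 is the only bank-1 candidate
```

The selected winners would then be sent to the memory controller, followed by the second type memory barrier instruction.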
  • the process of determining the priority of the at least one memory access instruction may include: dividing each scheduling queue at the locations of the first-type memory barrier instructions to obtain a plurality of packets, each packet including at least one memory access instruction; obtaining, from the first packet in each scheduling queue, a memory access instruction set T to be scheduled; counting the number of memory access instructions in T that access each bank, and the number of memory access instructions in each packet of T that access each bank; and then, from these counts, calculating the priority of each packet and taking the priority of each packet as the priority of the memory access instructions within that packet.
  • the priority of the packet s in the memory access instruction set T may be calculated by using the following formula:
  • W_b_s = min{(X_0 - Y_s_0 + Y_s+1_0), (X_1 - Y_s_1 + Y_s+1_1), ..., (X_n-1 - Y_s_n-1 + Y_s+1_n-1)};
  • b denotes the sequence number of the scheduling queue
  • s denotes the sequence number of the currently scheduled packet in the corresponding scheduling queue
  • n denotes the number of banks, the banks being numbered from 0 to n-1
  • W_b_s denotes the priority of the packet s
  • X_n-1 represents the number of memory access instructions accessing bank n-1 in the T;
  • Y_s_n-1 represents the number of memory access instructions accessing bank n-1 in the packet s in T, and Y_s+1_n-1 represents the corresponding number in the packet s+1;
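The formula above can be written as a short function; this is a sketch assumed from the symbol definitions (the list arguments `X`, `Y_s`, `Y_next`, indexed by bank number, are illustrative names, not from the patent).

```python
def packet_priority(X, Y_s, Y_next):
    """W_b_s = min over banks k of (X_k - Y_s_k + Y_s+1_k).

    X[k]      -- instructions in the whole set T that access bank k
    Y_s[k]    -- instructions in packet s that access bank k
    Y_next[k] -- instructions in packet s+1 that access bank k
    """
    return min(x - ys + yn for x, ys, yn in zip(X, Y_s, Y_next))

# Example with 3 banks: T accesses bank0 4 times, bank1 twice, bank2 3 times;
# packet s contributes [2, 1, 0] and packet s+1 contributes [1, 0, 2].
w = packet_priority([4, 2, 3], [2, 1, 0], [1, 0, 2])
# min(4-2+1, 2-1+0, 3-0+2) = min(3, 1, 5) = 1
```

The minimum over banks captures the worst-case remaining parallelism if packet s is scheduled now and packet s+1 becomes schedulable next.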
  • calculating the priority of each packet in this way makes the priority calculation more accurate.
  • the scheduler may further receive a second memory access instruction sent by a second processor core of the multiple processor cores and a second memory barrier instruction after the second memory access instruction, the second memory barrier instruction being a first type memory barrier instruction; in the subsequent process, the second memory access instruction is scheduled according to the association between the thread to which the second memory access instruction belongs and the thread to which the first memory access instruction belongs. Specifically, when it is determined that the second thread to which the second memory access instruction belongs is associated with the first thread to which the first memory access instruction belongs, the scheduler may wait for the memory access instructions in the first scheduling queue to finish executing, and when it determines that no memory access instruction remains in the first scheduling queue, it may dispatch the second memory access instruction and the second memory barrier instruction to the first scheduling queue.
  • scheduling the memory access instructions of associated threads in the same scheduling queue reduces the impact of the memory barrier instruction on the entire processor and improves memory access performance.
  • in a fifth possible implementation manner of the first aspect, after the at least one memory access instruction located before the first first-type memory barrier instruction in the plurality of scheduling queues has been sent, the scheduler can also send a second type memory barrier instruction to the memory controller, so that a second type memory barrier instruction is inserted after the at least one memory access instruction.
  • by inserting the second type memory barrier instruction after the scheduled memory access instructions, only second type memory barrier instructions exist in the memory controller, so the memory controller only needs to schedule according to the memory barrier mechanism corresponding to the second type memory barrier instruction, which simplifies its operation.
  • when the memory barrier instruction following the memory access instruction sent by any processor core is determined to be a second type memory barrier instruction, the scheduler can schedule according to the memory barrier mechanism corresponding to the second type memory barrier instruction. Specifically, the scheduler may wait for all memory access instructions in the scheduling queues to be sent to the memory controller, and when it determines that no memory access instruction remains in any scheduling queue, it may dispatch the memory access instruction received this time and the second type memory barrier instruction following it to the scheduling queue corresponding to that processor core.
  • the scheduler may also send a stop sending notification to the first processor core to notify the first processor core to stop transmitting the memory access instruction.
  • this reduces the impact of memory barrier instructions on the entire processor, improves memory access performance, and saves memory access time.
  • the scheduler may also, when it determines that the second memory access instruction has been scheduled to the first scheduling queue, send an allow-send notification to the first processor core, to notify the first processor core that it may send memory access instructions.
  • after the scheduler sends the at least one memory access instruction located before the first first-type memory barrier instruction in the plurality of scheduling queues to the memory controller, the scheduler can also delete the first-type memory barrier instruction at the front of each scheduling queue.
  • in this way, the scheduler can continue with the next round of scheduling, improving scheduling efficiency.
  • when the scheduler receives a third memory access instruction sent by a third processor core and determines that the third memory barrier instruction is a second type memory barrier instruction, the scheduler may also send a stop-send notification to the processor cores other than the third processor core, to notify those processor cores to stop sending memory access instructions.
  • the scheduler may also, when it determines that the scheduling of the third memory access instruction is completed, send an allow-send notification to the processor cores other than the third processor core, to notify them that they may send memory access instructions.
  • by sending the allow-send notification to the processor cores other than the third processor core, the restriction on their memory access instructions is lifted, so that they can send memory access instructions normally, ensuring the orderly execution of memory access instructions.
  • a scheduler is provided, the scheduler being applied to a computer system, the computer system comprising a memory controller, the scheduler, and a plurality of processor cores; wherein the scheduler caches a plurality of schedule queues, Each scheduling queue is configured to cache a memory access instruction to be scheduled, and the scheduler includes a module for executing a scheduling method of the memory access instruction provided by the first aspect above.
  • a computer system comprising a processor and a memory controller, the processor comprising a scheduler and a plurality of processor cores, wherein a plurality of scheduling queues are cached in the scheduler, each scheduling The queue is used to cache a memory access instruction to be scheduled; the scheduler is configured to execute the scheduling method of the memory access instruction provided by the first aspect above.
  • the present application provides a computer program product comprising a computer readable storage medium storing program code, the program code comprising instructions for performing any of the methods of scheduling memory access instructions described in the first aspect above.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for scheduling a memory access instruction according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of scheduling results of a set of memory access instructions provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a scheduling queue provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of scheduling a memory access instruction according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for scheduling a memory access instruction according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of scheduling results of a memory access instruction according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a scheduler according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of a scheduler according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a computer system according to an embodiment of the present invention.
  • the computer system includes a memory controller 101, a scheduler 102, and a plurality of processor cores 103.
  • the scheduler 102 is respectively connected to the memory controller 101 and the plurality of processor cores 103, and a plurality of scheduling queues are cached in the scheduler 102, and each scheduling queue is used to cache memory accesses to be scheduled. instruction.
  • the scheduler 102 is configured to maintain the multiple scheduling queues, that is, the scheduler 102 can receive a memory access instruction sent by any processor core 103, and dispatch the received memory access instruction to a corresponding scheduling queue, and The memory access instruction in the dispatch queue is sent to the memory controller 101 according to the corresponding rules.
  • the memory controller 101 can also cache a memory scheduling queue. After receiving the memory access instruction sent by the scheduler 102, the memory controller 101 can also schedule the memory access instruction to the memory scheduling queue.
  • the computer system may further include a memory 104 connected to the memory controller 101, and the memory controller 101 may send a memory access instruction to the memory 104 to implement access to the memory 104.
  • the memory access instruction sent by any processor core 103 first enters the scheduler 102; the scheduler 102 dispatches the memory access instruction to the corresponding scheduling queue, then sends memory access instructions in the scheduling queue to the memory controller 101 according to the corresponding scheduling rule, and the memory controller 101 schedules them to implement access to the memory 104.
  • the memory 104 can include a plurality of banks (memory banks); each processor core 103 can access any bank in the memory 104, and memory access instructions that access different banks can be executed in parallel.
  • the scheduler 102 is configured to execute the scheduling method shown in the following embodiments.
  • an embodiment of the present invention provides a scheduling method for a memory access instruction applied to the computer system.
  • 2 is a flowchart of a method for scheduling a memory access instruction according to an embodiment of the present invention.
  • the execution body of the method is a scheduler as shown in FIG. 1.
  • the method for scheduling the memory access instruction includes the following steps:
  • the scheduler receives a first memory access instruction sent by a first processor core of the plurality of processor cores and a first memory barrier instruction after the first memory access instruction.
  • memory barrier instructions can be used to control the execution order of multiple memory access instructions to ensure correct program semantics, but at the same time, memory barrier instructions also affect memory access performance.
  • multiple banks can be configured in memory, and memory access instructions that access different banks can be executed in parallel, and memory barrier instructions can cause memory access instructions that can be executed in parallel to be executed only serially, extending memory access time.
  • Figure 3 includes memory access instructions A, B, C, D, E, F, and G, where A, C, F, and G access bank0 and B, D, and E access bank1; the memory barrier instruction b1 is inserted after C, and the memory barrier instruction b2 is inserted after F. Although C and D access different banks and could execute in parallel, because the memory barrier instruction b1 lies between C and D, D can only start executing after C completes, and C and D cannot be executed in parallel.
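The serializing effect of b1 in this example can be shown with a small batching sketch. This is a deliberate simplification assumed for illustration (instructions between barriers run in parallel only when they target different banks; the patent's actual controller logic is richer), not code from the patent.

```python
def parallel_batches(stream):
    """Split an instruction stream into groups that may execute in parallel.

    stream items are (name, bank) tuples or the string 'BARRIER'.
    Instructions in one batch target distinct banks; a barrier always
    closes the current batch.
    """
    batches, current, used_banks = [], [], set()
    for item in stream:
        if item == 'BARRIER':
            if current:
                batches.append(current)
            current, used_banks = [], set()
        else:
            name, bank = item
            if bank in used_banks:  # same-bank conflict: start a new batch
                batches.append(current)
                current, used_banks = [], set()
            current.append(name)
            used_banks.add(bank)
    if current:
        batches.append(current)
    return batches

stream = [('A', 0), ('B', 1), ('C', 0), 'BARRIER',
          ('D', 1), ('E', 1), ('F', 0), 'BARRIER', ('G', 0)]
# b1 after C forces D into a batch of its own even though C (bank0) and
# D (bank1) target different banks:
# parallel_batches(stream) == [['A','B'], ['C'], ['D'], ['E','F'], ['G']]
```

Without the barrier between C and D, C and D would land in the same batch, which is exactly the lost parallelism the description points out.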
  • the embodiment of the present invention provides a scheduling method for the memory access instruction that supports the general multi-version mechanism.
  • in the embodiment of the present invention, a scheduler is added: a memory access instruction and the memory barrier instruction after it are first sent to the scheduler, and the scheduler sends the received memory access instructions to the memory controller according to the received memory barrier instructions.
  • the scheduler caches a scheduling queue, and the scheduling queue is configured to cache a memory access instruction to be scheduled, and the scheduler can schedule a memory access instruction to the memory controller through the scheduling queue.
  • the memory controller cache has a memory scheduling queue, and the memory access instruction sent by the scheduler can be scheduled through the memory scheduling queue to access the memory.
  • the scheduler can cache multiple scheduling queues, and the memory controller can also cache multiple memory scheduling queues. And each scheduling queue and each memory scheduling queue can be a first in first out queue.
  • the scheduler determines that the first memory barrier instruction is a first type of memory barrier instruction, and the first type of memory barrier instruction is used to control a sequence of a plurality of memory access instructions of the processor core.
  • the embodiment of the present invention provides two types of memory barrier instructions: a first type memory barrier instruction and a second type memory barrier instruction.
  • the first type memory barrier instruction is used to control the order of the multiple memory access instructions of a processor core, and its scope is the processor core. That is, if any processor core inserts a first type memory barrier instruction after a sent memory access instruction, this indicates that memory access instructions sent by the specified processor core after the first type memory barrier instruction can be executed only after the memory access instruction before the barrier has completed, while memory access instructions sent by processor cores other than the specified processor core can be executed in parallel with that memory access instruction.
  • the specified processor core is the same processor core as, or a processor core associated with, the processor core that sends the memory access instruction, and may include a single processor core or multiple processor cores; this embodiment does not limit it.
  • the second type of memory barrier instruction is used to control the order of multiple memory access instructions of the entire processor, and the scope is a processor, that is, if any processor core inserts a second type of memory barrier instruction after the transmitted memory access instruction, After the execution of the memory access instruction before the second type of memory barrier instruction is completed, the memory access instruction sent by any processor core in the processor after the second type of memory barrier instruction can be executed.
  • for example, suppose the processor includes mutually independent processor cores X and Y. If the processor core X sends a memory access instruction 1 to the scheduler and sends a first type memory barrier instruction after the memory access instruction 1, this indicates that memory access instructions sent by the same processor core and by associated processor cores need to be executed after the memory access instruction 1 completes. Therefore, the memory access instruction 2 subsequently sent by the processor core X cannot be executed in parallel with the memory access instruction 1, while the memory access instruction 3 sent by the processor core Y can be executed in parallel with the memory access instruction 1.
  • if the processor core X sends the memory access instruction 1 to the scheduler and sends a second type memory barrier instruction after the memory access instruction 1, this indicates that memory access instructions sent afterwards by any processor core in the processor need to wait until the memory access instruction 1 completes. In this case, the memory access instruction 2 sent later by the processor core X and the memory access instruction 3 sent by the processor core Y cannot be executed in parallel with the memory access instruction 1, and can only be executed after its execution is completed.
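The difference in scope between the two barrier types in this X/Y example can be summarized with a tiny predicate; this is an illustrative sketch (function name and the `associated` pair set are assumptions, not from the patent).

```python
def must_wait(barrier_type, issuing_core, later_core, associated):
    """Does a later instruction have to wait for the barrier to drain?

    Type 1 (core scope) orders only the issuing core plus cores running
    associated threads; type 2 (processor scope) orders every core.
    """
    if barrier_type == 2:
        return True
    return later_core == issuing_core or (issuing_core, later_core) in associated

# Core X issues instruction 1 followed by a type-1 barrier; X and Y independent:
assert must_wait(1, 'X', 'X', set())        # instruction 2 from X waits
assert not must_wait(1, 'X', 'Y', set())    # instruction 3 from Y runs in parallel
# With a type-2 barrier instead, instruction 3 must also wait:
assert must_wait(2, 'X', 'Y', set())
```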
  • primitives can be added to the system library to provide programmers with an API (Application Programming Interface), so that programmers can use the API to write and control the programs running on the processor, writing different types of memory barrier instructions into the program and thereby implementing support for multiple multi-version mechanisms.
  • programmers can choose different types of memory barrier instructions based on the semantics and memory access performance of the multi-version mechanism. Specifically, when a thread run by a certain processor core is exclusive, meaning that it cannot be executed in parallel with threads run by other processor cores, the second type memory barrier instruction may be selected to control the read and write order of the thread; when a thread run by a processor core is not exclusive, the first type memory barrier instruction can be selected to control the read and write order of the thread.
  • when the scheduler receives a memory access instruction sent by any processor core and the memory barrier instruction after that memory access instruction, the scheduler may first determine the type of the memory barrier instruction, that is, whether it is a first type or a second type memory barrier instruction, and then perform different steps according to the type.
  • the first processor core may be any one of the multiple processor cores, which is not limited in this embodiment of the present invention.
  • in this embodiment, the first memory barrier instruction sent by the first processor core is a first type memory barrier instruction, and the scheduler determines that the first memory barrier instruction is a first type memory barrier instruction.
  • the scheduler may also receive a second type memory barrier instruction, and the memory access instruction corresponding to the second type memory barrier instruction may be scheduled according to the method in the embodiment shown in FIG. This embodiment does not limit this.
  • the scheduler dispatches the first memory access instruction and the first memory barrier instruction to a first scheduling queue in the plurality of scheduling queues, the first scheduling queue being configured to cache memory access instructions sent by the first processor core.
  • the scheduler may schedule the first memory access instruction and the first memory barrier instruction to the first scheduling queue corresponding to the first processor core, without restricting other processor cores from sending memory access instructions.
  • when the thread run by a processor core is not exclusive, the following two situations exist:
  • in the first situation, the thread run by a certain processor core is an independent, conflict-free thread, that is, it is not associated with the thread run by any other processor core, and the processor core corresponds to an independent scheduling queue that is only used to cache memory access instructions sent by that processor core. When the processor core, while running the thread, sends a memory access instruction to the scheduler, the scheduler directly dispatches the memory access instruction to the scheduling queue corresponding to that processor core.
  • in the second situation, the threads run by two processor cores are associated threads, and the scheduler dispatches the memory access instructions sent by the two processor cores to the same scheduling queue, that is, that scheduling queue is used to cache the memory access instructions sent by both processor cores.
  • two threads are associated threads, which means that the two threads have shared data.
  • Whether the two threads are associated threads can be judged by the compiler of the computer system. For example, the compiler can pre-determine whether the threads are associated with each other, and store the threads that are associated with each other in the associated list. Then, the scheduler can determine whether any two threads are related threads by looking up the association list.
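The compiler-produced association list and the scheduler-side lookup described above might look like the following; the list-of-sets layout and the thread names are assumptions made for illustration.

```python
# Groups of threads that share data, as precomputed by the compiler.
ASSOCIATION_LIST = [{'thread1', 'thread2'}, {'thread5', 'thread7'}]

def are_associated(a, b):
    """Two threads are associated if some group in the list contains both."""
    return any(a in group and b in group for group in ASSOCIATION_LIST)

# thread1 and thread2 share data; thread1 and thread5 do not
assert are_associated('thread1', 'thread2')
assert not are_associated('thread1', 'thread5')
```

Keeping the association decision in a precomputed list keeps the scheduler's lookup cheap at scheduling time.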
  • the two processor cores correspond to the same scheduling queue. If a memory access instruction sent by one of the processor cores is scheduled to the scheduling queue, then in order to ensure the orderly execution of the memory access instructions, the other processor core cannot send memory access instructions for the time being.
  • therefore, when the scheduler receives the first memory access instruction and the first memory barrier instruction, it sends a stop-send notification to the other associated processor cores, the stop-send notification being used to notify the associated processor cores to stop sending memory access instructions.
  • the scheduler may then wait for the memory access instructions already in the first scheduling queue to finish being scheduled; after determining that they have been sent to the memory controller, it dispatches the first memory access instruction and the first memory barrier instruction to the first scheduling queue in the plurality of scheduling queues.
  • after the scheduling is completed, the scheduler may send an allow-send notification to the associated processor core, the allow-send notification being used to notify the associated processor core that it may send memory access instructions.
  • for example, the thread run by the first processor core and the thread run by the second processor core are associated threads, and the second processor core may be any processor core among the plurality of processor cores other than the first processor core, which is not limited in this embodiment of the present invention.
  • when the scheduler determines that the memory access instructions in the first scheduling queue have been sent to the memory controller, it dispatches the first memory access instruction and the first memory barrier instruction to the first scheduling queue in the plurality of scheduling queues, and sends a stop-send notification to the second processor core, the stop-send notification being used to notify the second processor core to stop sending memory access instructions.
  • an allow-send notification may be sent to the second processor core, where the allow-send notification is used to notify the second processor core to send memory access instructions.
  • the scheduler sends, to the memory controller, at least one memory access instruction that is located before the first first-type memory barrier instruction in the plurality of scheduling queues.
  • each scheduling queue may cache one or more memory access instructions, and the threads to which the memory access instructions in any two scheduling queues belong are not associated threads; that is, memory access instructions in different scheduling queues can be executed in parallel. The scheduler can therefore send, to the memory controller, at least one memory access instruction preceding the first first-type memory barrier instruction in all of the scheduling queues.
  • the scheduler can combine the memory access instructions that precede the first first-type memory barrier instruction in all the scheduling queues and send them to the memory controller together, and then send a second-type memory barrier instruction to the memory controller, so that the memory access instruction sequence and the second-type memory barrier instruction are scheduled to the memory scheduling queue. Thereafter, before the next round of scheduling, the scheduler can delete the first-type memory barrier instruction located at the front of each scheduling queue.
  • in this way, the parallelism of the threads is used to schedule the memory access instructions before the first first-type memory barrier instruction together, and the plurality of first-type memory barrier instructions are replaced with a single second-type memory barrier instruction.
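The combined scheduling step described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the queue contents and the `B1`/`B2` barrier markers are hypothetical stand-ins for first-type and second-type memory barrier instructions.

```python
B1, B2 = "B1", "B2"  # first-type / second-type memory barrier markers

def schedule_round(queues):
    """Pop every instruction that precedes the first first-type barrier
    (B1) in each scheduling queue, emit them as one parallel batch
    terminated by a single second-type barrier (B2), then delete the B1
    now at the front of each queue, ready for the next round."""
    batch = []
    for q in queues:
        while q and q[0] != B1:
            batch.append(q.pop(0))   # instruction can run in parallel
        if q and q[0] == B1:
            q.pop(0)                 # consume the per-core barrier
    return batch + [B2] if batch else []

# Queues loosely modeled on BROI1..BROI4 of FIG. 4 (contents illustrative)
queues = [["1.1", "1.2", B1, "1.3"],
          ["2.1", B1, "2.2"],
          [B1, "3.1"],               # 3.x blocked until its B1 is cleared
          ["4.1", B1, "4.2"]]
print(schedule_round(queues))        # ['1.1', '1.2', '2.1', '4.1', 'B2']
```

Note how one `B2` replaces four per-core `B1` barriers, and the instruction behind `B1` in the third queue only becomes schedulable in the following round.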
  • the memory access instruction may include different types of requests, such as a read request and a write request.
  • different types of memory access instructions may be scheduled together, or may be scheduled separately, in which case one type of memory access instruction is fully scheduled before another type is dispatched; this embodiment of the present invention does not limit this.
  • each processor core runs one thread, the four threads use different logging multi-version mechanisms, and they are not associated threads with each other. The logging multi-version mechanisms used by the four threads are redo-logging, locking, locking, and undo-logging, respectively, and the scheduler can separately allocate the memory access instructions sent by the four processor cores to the scheduling queue corresponding to each processor core, that is, BROI1, BROI2, BROI3, and BROI4 shown in diagram (a) of FIG. 4.
  • in diagram (a) of FIG. 4, the first row represents the scheduling queues BROI1, BROI2, BROI3, and BROI4, and the dotted lines between memory access instructions represent the memory barrier instructions of the prior art; the second row represents the memory access instruction classes to be scheduled in each scheduling queue after classification according to the semantics of the corresponding multi-version mechanism; the third row indicates the bank serial number to be accessed by the corresponding memory access instruction in each scheduling queue, and the broken lines indicate the first-type memory barrier instructions of this embodiment of the present invention; the fourth row indicates the sequence numbers of the memory access instructions in each scheduling queue; for example, 1.1 indicates the first memory access instruction in BROI1, 2.2 indicates the second memory access instruction in BROI2, and so on.
  • among the above memory access instructions, 3.1, 3.2, and 3.3 are write requests, and the others are read requests.
  • scheduling rules based on the different logging multi-version mechanisms may be used to dispatch the memory access instructions to the memory scheduling queue of the memory controller according to the memory access instruction types shown in the second row of diagram (a) of FIG. 4.
  • the scheduling result is shown in diagram (b) of FIG. 4. It can be seen from diagram (b) of FIG. 4 that, because the semantics of the different multi-version mechanisms are mixed together, the scheduling is difficult to optimize; the scheduled memory access instructions have a low degree of parallel access across the banks and take a long time.
  • the scheduler may dispatch the memory access instructions preceding the first first-type memory barrier instruction in BROI1, BROI2, BROI3, and BROI4, namely 1.1, 1.2, 2.1, and 4.1, to the memory controller together, insert a second-type memory barrier instruction after these memory access instructions during the scheduling process, and send them to the memory scheduling queue of the memory controller. Thereafter, the scheduler can clear the first-type memory barrier instruction at the front of every scheduling queue and perform the next round of scheduling, until the scheduling ends.
  • step 204 includes steps 2041-2044:
  • the process of determining the priority of the at least one memory access instruction includes:
  • the first packet in each scheduling queue is the current to-be-scheduled packet of each scheduling queue, and the scheduler may add the first packet in each scheduling queue to a set to obtain a memory access instruction set to be scheduled.
  • in this embodiment of the present invention, the set of memory access instructions to be scheduled is denoted by T.
  • after the first packet in each scheduling queue is added to T, T is {1.1, 1.2, 2.1, 4.1}.
  • count the number of memory access instructions in T accessing each bank, that is, the number of memory access instructions in T corresponding to each bank. For example, the numbers of memory access instructions in T accessing bank0, bank1, ..., bank n-1 are X_0, X_1, ..., X_n-1, respectively.
  • the number of memory access instructions accessing bank0 in Seg0, Y_0_0, is 0, and the number of memory access instructions accessing bank1 in Seg0, Y_0_1, is 2.
  • the priority of the packet s in T can be calculated by the following formula:
  • W_b_s = min{(X_0 - Y_s_0 + Y_s+1_0), (X_1 - Y_s_1 + Y_s+1_1), ..., (X_n-1 - Y_s_n-1 + Y_s+1_n-1)};
  • b represents the sequence number of the scheduling queue
  • s represents the sequence number of the currently scheduled packet in the corresponding scheduling queue
  • n represents the number of banks (the banks are numbered 0 to n-1)
  • W_b_s represents the priority of the packet s
  • the packet s can be any packet in T, that is, the current to-be-scheduled packet of any one of the scheduling queues;
  • X_n-1 represents the number of memory access instructions accessing bank n-1 in T;
  • Y_s_n-1 represents the number of memory requests accessing bank n-1 in the packet s in T;
  • Y_s+1_n-1 represents the number of memory requests accessing bank n-1 in the packet s+1.
  • X_n-1 - Y_s_n-1 + Y_s+1_n-1 means the following: suppose the memory access instructions in the packet s are dispatched to the memory scheduling queue and the memory access instructions in the packet s+1 are added to T, forming a new memory access instruction set T'; the expression is then the number of memory access instructions in T' accessing bank n-1. For example, if the memory includes bank0 and bank1, the number of memory access instructions in T' accessing bank0 is a large value j, and the number accessing bank1 is a small value k, then the number of memory access instructions in T' that can access bank0 and bank1 in parallel is k.
  • therefore, the degree of parallelism when scheduling T' depends on the minimum of the numbers of memory access instructions accessing each bank in T': min{(X_0 - Y_s_0 + Y_s+1_0), (X_1 - Y_s_1 + Y_s+1_1), ..., (X_n-1 - Y_s_n-1 + Y_s+1_n-1)}.
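The priority formula above can be computed directly from the per-bank counts, as the following sketch shows. Function and variable names are illustrative, not from the patent.

```python
def packet_priority(bank_counts_T, seg_s, seg_s1):
    """Compute W_b_s = min over banks k of (X_k - Y_s_k + Y_{s+1}_k):
    the per-bank instruction counts of the candidate set T' obtained by
    scheduling packet s out of T and activating packet s+1; the minimum
    bounds how many banks T' can access in parallel.

    bank_counts_T : X_k        -- accesses to bank k in the current set T
    seg_s         : Y_s_k      -- accesses to bank k in packet s
    seg_s1        : Y_{s+1}_k  -- accesses to bank k in packet s+1
    """
    return min(x - ys + ys1
               for x, ys, ys1 in zip(bank_counts_T, seg_s, seg_s1))

# Two banks: T has 3 accesses to bank0 and 2 to bank1.
# Packet s touches bank0 once; packet s+1 touches bank1 once.
print(packet_priority([3, 2], [1, 0], [0, 1]))  # min(3-1+0, 2-0+1) = 2
```

A higher returned value means the set remaining after scheduling packet s stays better balanced across banks, so the packet is scheduled more preferentially.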
  • the priority W_b_s of each packet can be calculated, and the priority W_b_s of each packet can be taken as the priority of the memory access instruction in each packet. For example, if the priority W_b_s of Seg0: ⁇ 1.1, 1.2 ⁇ is 4, the priority of the memory access instructions 1.1 and 1.2 in Seg0 is 4. The higher the value of W_b_s is, the higher the priority of the memory access instruction in the packet is, and the scheduler can preferentially schedule the memory access instruction in the packet.
  • the calculated priority may also represent the parallelism of the new set of memory access instructions T to be scheduled after the memory access instruction is scheduled to the memory scheduling queue.
  • This degree of parallelism refers to the number of banks that can access in parallel.
  • the T formed after scheduling is different, and the degree of parallelism is also different. The greater the number of banks that can access the bank in parallel, the greater the degree of parallelism and the higher the priority. Conversely, the smaller the number of banks that can access in parallel, the smaller the degree of parallelism and the lower the priority.
  • the memory access instructions include read requests and write requests, and read requests are generally scheduled first, followed by write requests. Therefore, the priority of a read request can be set to a higher value and the priority of a write request to a lower value, so that read requests are scheduled preferentially and write requests are scheduled in a subsequent process; this embodiment of the present invention does not limit this.
  • after the scheduler obtains the priority of each memory access instruction in T, the requests accessing each bank in T can be scheduled according to the priorities.
  • the scheduler can determine the bank accessed by each memory access instruction and, according to the priority of each memory access instruction and the bank it accesses, select the highest-priority memory access instruction for each bank. For example, if the memory includes bank0-bank3, the scheduler can respectively select the highest-priority memory access instruction among those accessing bank0, the highest-priority memory access instruction among those accessing bank1, the highest-priority memory access instruction among those accessing bank2, and the highest-priority memory access instruction among those accessing bank3.
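A minimal sketch of this per-bank selection follows, assuming each candidate is represented as an (instruction, bank, priority) tuple; the names and tuple layout are assumptions for illustration.

```python
def select_per_bank(candidates):
    """For each bank, keep the highest-priority memory access
    instruction among the candidates accessing that bank.

    candidates: list of (instr_id, bank, priority) tuples
    returns:    {bank: instr_id} with one winner per bank
    """
    best = {}
    for instr, bank, prio in candidates:
        if bank not in best or prio > best[bank][1]:
            best[bank] = (instr, prio)
    return {bank: instr for bank, (instr, _) in best.items()}

cands = [("1.1", 0, 4), ("2.1", 0, 3), ("1.2", 1, 4), ("4.1", 1, 2)]
print(select_per_bank(cands))   # {0: '1.1', 1: '1.2'}
```

The selected instructions, one per bank, can then be sent to the memory controller together so that the banks are accessed in parallel.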
  • the scheduler may randomly select any one or more of the multiple memory access instructions, which is not limited in this embodiment of the present invention.
  • the scheduler can send the selected memory access instruction to the memory controller together, and after receiving the memory access instruction, the memory controller can schedule the memory access instruction to the memory schedule in the memory controller. queue.
  • the scheduler can directly send the one or more memory access instructions located before the first first-type memory barrier instruction to the memory controller without calculating priorities; this is not limited in this embodiment of the present invention.
  • the second-type memory barrier instruction may be located after the selected memory access instructions in the memory scheduling queue, or the second-type memory barrier instruction may enter the memory scheduling queue later than the selected memory access instructions; this embodiment of the present invention does not limit this.
  • the scheduler may insert a second-type memory barrier instruction after the selected memory access instructions and send the selected memory access instructions together with the second-type memory barrier instruction to the memory controller, or may send the second-type memory barrier instruction to the memory controller after the selected memory access instructions have been sent to the memory controller; this is not limited in this embodiment of the present invention.
  • the memory controller may schedule the memory access instructions and the second-type memory barrier instruction to a memory scheduling queue, and then send the memory access instructions in the memory scheduling queue to the memory according to the scheduling rule corresponding to the second-type memory barrier instruction, thereby implementing access to the memory.
  • the scheduler may also add the next packet in the same scheduling queue as any scheduled packet to T, and that next packet becomes the current to-be-scheduled packet of the scheduling queue. That is, when all the memory access instructions before a first-type memory barrier instruction in a scheduling queue have been scheduled, the scheduler can activate the memory access instructions after that first-type memory barrier instruction and add them to the memory access instruction set, so that during the scheduling process the memory access instruction set is continuously updated until the scheduling ends.
  • for example, after the scheduler dispatches the memory access instructions 3.1, 3.2, and 3.3 to the memory scheduling queue, the memory access instruction 3.4 can be activated, that is, the memory access instruction 3.4 is added to T, and T is updated to {1.1, 1.2, 2.1, 3.4, 4.1}.
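The update of T described above can be sketched as follows. This is a hypothetical illustration; the packet and instruction labels follow the FIG. 4 example, and the function name is an assumption.

```python
def update_T(T, queue_packets, scheduled_packet):
    """After every instruction of `scheduled_packet` has been sent to
    the memory scheduling queue, activate the next packet of the same
    scheduling queue (the one behind the first-type barrier) and add
    its instructions to the to-be-scheduled set T."""
    T = [i for i in T if i not in scheduled_packet]      # drop scheduled
    idx = queue_packets.index(scheduled_packet)
    if idx + 1 < len(queue_packets):                     # a next packet exists
        T.extend(queue_packets[idx + 1])                 # activate it
    return T

# BROI3's packets: Seg0 = {3.1, 3.2, 3.3}, Seg1 = {3.4}
T = ["1.1", "1.2", "2.1", "3.1", "3.2", "3.3", "4.1"]
broi3 = [["3.1", "3.2", "3.3"], ["3.4"]]
print(update_T(T, broi3, broi3[0]))  # ['1.1', '1.2', '2.1', '4.1', '3.4']
```

Repeating this update after each dispatch keeps T current until every queue is drained and the scheduling ends.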
  • the scheduler may repeatedly perform steps 2041-2044 to continue scheduling the memory access instructions in the memory scheduling queue until the scheduling ends.
  • if the scheduling queues shown in diagram (a) of FIG. 4 are combined and scheduled using the second possible implementation described above, the scheduling result is as shown in FIG. 4.
  • the first possible implementation is simple and easy to implement but may not fully exploit bank parallelism, while the second possible implementation can make full use of bank parallelism and further save memory access time.
  • the scheduler receives a second memory access instruction sent by a second processor core of the plurality of processor cores and a second memory barrier instruction after the second memory access instruction, where the second memory barrier instruction is a first-type memory barrier instruction.
  • the scheduler determines that the second thread to which the second memory access instruction belongs is associated with the first thread to which the first memory access instruction belongs, determines that the memory access instructions in the first scheduling queue have been sent to the memory controller, and dispatches the second memory access instruction and the second memory barrier instruction to the first scheduling queue.
  • the first scheduling queue is used to cache the memory access instructions sent by the first processor core and the second processor core, and the second memory access instruction and the first memory access instruction cannot be executed in parallel. Therefore, upon receiving the second memory access instruction sent by the second processor core and the second memory barrier instruction after it, and determining that the memory access instructions in the first scheduling queue have been sent to the memory controller, the scheduler dispatches the second memory access instruction and the second memory barrier instruction to the first scheduling queue, which caches the memory access instructions sent by the first processor core and the second processor core.
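The routing rule for associated cores can be sketched as follows. All names are illustrative; `assoc` maps each core to its scheduling-queue index, shared between cores whose threads are associated.

```python
def route_instruction(queues, assoc, core, instr):
    """Route a core's memory access instruction to a scheduling queue.
    Cores whose threads are associated share one queue, so their
    instructions stay ordered with respect to each other; unrelated
    cores use separate queues and can be scheduled in parallel."""
    queues[assoc[core]].append(instr)
    return queues

queues = [[], []]
assoc = {"core1": 0, "core2": 0, "core3": 1}   # core1 and core2 associated
route_instruction(queues, assoc, "core1", "1.1")
route_instruction(queues, assoc, "core2", "2.1")
route_instruction(queues, assoc, "core3", "3.1")
print(queues)   # [['1.1', '2.1'], ['3.1']]
```

Because 1.1 and 2.1 land in the same queue, the first-type barrier mechanism orders them without touching core3's independent queue.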
  • the scheduler may send a stop-send notification to the first processor core, where the stop-send notification is used to notify the first processor core to stop sending memory access instructions.
  • the scheduler can wait for the memory access instructions in the first scheduling queue to finish being scheduled, and when it is determined that the second memory access instruction has been sent to the memory controller, the scheduler can send an allow-send notification to the first processor core, where the allow-send notification is used to notify the first processor core to send memory access instructions.
  • in this way, the impact of the prior-art memory barrier instructions on the overall performance of the processor can be reduced; for example, when a processor core issues a first-type memory barrier instruction, it affects only the operation of the associated processor cores and does not affect the operation of the entire processor.
  • in this way, the order of a processor core's memory access instructions can be controlled according to the first-type memory barrier instruction without imposing restrictions on the memory access instructions of other processor cores, which reduces the impact of memory barrier instructions on memory performance and increases parallelism; sending the memory access instructions preceding the first first-type memory barrier instruction in the multiple scheduling queues to the memory controller together can further improve the degree of parallelism. Moreover, the method does not optimize memory scheduling according to the semantics of any particular multi-version mechanism, so it can be applied to multiple multi-version mechanisms and can provide a memory ordering guarantee and improve memory access performance under different types of multi-version mechanisms.
  • FIG. 6 is a flowchart of a method for scheduling a memory access instruction according to an embodiment of the present invention.
  • the execution body of the method is a scheduler as shown in FIG. 1. Referring to FIG. 6, the method includes:
  • the scheduler receives a third memory access instruction sent by a third processor core of the plurality of processor cores and a third memory barrier instruction after the third memory access instruction.
  • the third processor core may be any processor core of the multiple processor cores, which is not limited in this embodiment of the present invention.
  • the scheduler determines that the third memory barrier instruction is a second type of memory barrier instruction.
  • the scheduler determines whether the third memory barrier instruction is a first-type memory barrier instruction or a second-type memory barrier instruction, and then performs different steps according to the determination result. This embodiment of the present invention is described by taking the case in which the third memory barrier instruction is a second-type memory barrier instruction as an example.
  • the scheduler determines that all memory access instructions in the multiple scheduling queues have been sent to the memory controller, and schedules the third memory access instruction and the third memory barrier instruction to a third scheduling queue, where the third scheduling queue is used to cache memory access instructions sent by the third processor core.
  • the third memory barrier instruction being a second-type memory barrier instruction indicates that the scope of the third memory barrier instruction is the entire processor, and the third memory access instruction cannot be executed in parallel with other memory access instructions. Therefore, when determining that all memory access instructions in the plurality of scheduling queues have been sent to the memory controller, that is, when the memory access instructions in all scheduling queues have finished being scheduled, the scheduler schedules the third memory access instruction and the third memory barrier instruction to the third scheduling queue.
  • the third scheduling queue that caches the memory access instruction sent by the third processor core may be pre-allocated, which is not limited in this embodiment of the present invention.
  • for example, the scheduler includes four scheduling queues: BROI1, BROI2, BROI3, and BROI4. When all memory access instructions in the multiple scheduling queues have been sent to the memory controller, the scheduler can schedule the memory access instruction 4.1 and the second-type memory barrier instruction B after 4.1 to the scheduling queue BROI4 corresponding to the third processor core.
  • the scheduler may further send a stop-send notification to the processor cores other than the third processor core and wait until the memory access instructions already scheduled to the scheduling queues have finished being dispatched. The stop-send notification is used to notify the other processor cores to stop sending memory access instructions; when the other processor cores receive the stop-send notification, they stop sending memory access instructions and memory barrier instructions to the scheduler.
  • the scheduler can then perform scheduling for the multiple scheduling queues, that is, send the third memory access instruction to the memory controller and send the third memory barrier instruction to the memory controller.
  • the scheduler can send an allow-send notification to the processor cores other than the third processor core, where the allow-send notification is used to notify the other processor cores to send memory access instructions.
  • so that memory access instructions can again be sent to the scheduler normally.
  • scheduling according to the memory barrier mechanism corresponding to the second-type memory barrier instruction ensures that the scope of the second-type memory barrier instruction is the processor, and effectively controls the execution order of exclusive memory access instructions.
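The flow for a second-type barrier described above (stop notifications, drain, dispatch, allow notifications) can be sketched as an ordered action log. This is a hypothetical illustration; all function, queue, and marker names are assumptions.

```python
from collections import deque

def handle_second_type_barrier(queues, third_q, other_cores, instr, barrier):
    """Sketch of the second-type barrier flow: notify the other cores
    to stop, drain every scheduling queue to the memory controller,
    enqueue and forward the instruction and its barrier, then allow
    the other cores to resume. Returns the ordered action log."""
    log = [("stop", c) for c in other_cores]   # stop-send notifications
    for q in queues:                           # drain: send everything queued
        while q:
            log.append(("send", q.popleft()))
    third_q.extend([instr, barrier])           # schedule to the third queue
    log += [("send", instr), ("send", barrier)]
    log += [("allow", c) for c in other_cores] # allow-send notifications
    return log

queues = [deque(["1.1"]), deque()]
third_q = deque()
log = handle_second_type_barrier(queues, third_q, ["core1", "core2"],
                                 "3.1", "B2")
print(log[0], log[-1])   # ('stop', 'core1') ('allow', 'core2')
```

The log makes the ordering guarantee visible: no core is allowed to resume before the barrier itself has been forwarded to the memory controller.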
  • the embodiment of the present invention uses a scheduler to perform memory barrier order control and a combined scheduling of memory access instructions, and implements a two-level memory barrier mechanism to control the execution order of memory access instructions.
  • the scheduler includes a control logic and a scheduling queue, and the control logic can schedule memory access instructions to enter and exit the scheduling queue based on a memory barrier mechanism corresponding to the first type of memory barrier instruction and a memory barrier mechanism corresponding to the second type of memory barrier instruction.
  • the scheduler can also schedule the memory access instructions together so that only second-type memory barrier instructions exist after the scheduled memory access instruction sequence in the memory controller. As a result, only second-type memory barrier instructions exist in the memory scheduling queue of the memory controller, and the memory controller can schedule the memory access instructions in the memory scheduling queue according to the memory barrier mechanism for the second-type memory barrier instruction, which is relatively simple to operate.
  • embodiments of the present invention provide two types of memory barrier instructions: a first type of memory barrier instruction and a second type of memory barrier instruction.
  • the first type of memory barrier instruction can be used to control the execution order of the memory access instruction; for the independent conflict-free thread, the second type of memory barrier instruction can be used to control the execution order of the memory access instruction.
  • embodiments of the present invention reduce the impact of memory barrier instructions on memory access performance by providing two types of memory barrier instructions and by using the parallel characteristics of threads to schedule the memory access instructions corresponding to first-type memory barrier instructions, and with the two-level memory barrier mechanism the software can support multiple multi-version mechanisms. Compared with prior-art technical solutions that perform semantic analysis on a specific multi-version mechanism and schedule according to transactions, the embodiments of the present invention can support various multi-version mechanisms and have the advantage of generality.
  • in addition, this embodiment of the present invention reduces the number of memory barrier instructions sent to the memory controller by scheduling the memory access instructions before the first-type memory barrier instructions together, makes use of bank parallelism, and saves memory access time.
  • in summary, the method provided by this embodiment of the present invention provides two types of memory barrier instructions: a first-type memory barrier instruction whose scope is a processor core, and a second-type memory barrier instruction whose scope is the entire processor. According to the parallelism of the threads, the memory access instructions before the first-type memory barrier instructions are scheduled together, which reduces the impact of memory barrier instructions on memory performance, improves parallelism, and reduces memory access time. Moreover, the method does not optimize memory scheduling according to the semantics of any particular multi-version mechanism; it can be applied to multiple multi-version mechanisms and can provide a memory ordering guarantee and improve memory access performance under different types of multi-version mechanisms.
  • FIG. 8 is a schematic structural diagram of a scheduler according to an embodiment of the present invention. The scheduler is applied to a computer system, where the computer system includes a memory controller, the scheduler, and multiple processor cores; multiple scheduling queues are cached in the scheduler, and each scheduling queue is used to cache memory access instructions to be scheduled.
  • the scheduler includes:
  • the receiving module 801 is configured to receive a first memory access instruction sent by a first processor core of the plurality of processor cores and a first memory barrier instruction after the first memory access instruction;
  • a determining module 802, configured to determine that the first memory barrier instruction is a first-type memory barrier instruction, where the first-type memory barrier instruction is used to control the order of multiple memory access instructions of a processor core;
  • the scheduling module 803 is configured to schedule the first memory access instruction and the first memory barrier instruction to a first scheduling queue of the plurality of scheduling queues, where the first scheduling queue is used to buffer the sending by the first processor core Memory access instruction;
  • a sending module 804, configured to send, to the memory controller, at least one memory access instruction located before the first first-type memory barrier instruction in the plurality of scheduling queues.
  • the scheduler provided by this embodiment of the present invention provides two types of memory barrier instructions: a first-type memory barrier instruction whose scope is a processor core, and a second-type memory barrier instruction whose scope is the entire processor. According to the parallelism of the threads, the memory access instructions before the first-type memory barrier instructions are scheduled together, which reduces the impact of memory barrier instructions on memory performance, improves parallelism, and reduces memory access time. Moreover, the scheduler does not optimize memory scheduling according to the semantics of any particular multi-version mechanism; it can be applied to multiple multi-version mechanisms and can provide a memory ordering guarantee and improve memory access performance under different types of multi-version mechanisms.
  • the scheduler further includes:
  • a priority determining module 805, configured to determine a priority of the at least one memory access instruction, where the priority is represented by the minimum, over the memory banks, of the number of to-be-scheduled memory access instructions accessing each bank after the memory access instruction is sent to the memory controller;
  • the selecting module 806 is configured to select, according to the priority of each memory access instruction and the bank accessed by each memory access instruction, the memory access instruction with the highest priority corresponding to each bank from the at least one memory access instruction;
  • the sending module 804 is further configured to send the selected memory access instruction to the memory controller, and send a second type memory barrier instruction to the memory controller, where the second type memory barrier instruction is used to control multiple The order of memory access instructions.
  • the priority determining module 805 is further configured to:
  • the priority determining module 805 is specifically configured to calculate a priority of the packet s in the T by using the following formula:
  • W_b_s = min{(X_0 - Y_s_0 + Y_s+1_0), (X_1 - Y_s_1 + Y_s+1_1), ..., (X_n-1 - Y_s_n-1 + Y_s+1_n-1)};
  • b denotes the sequence number of the scheduling queue
  • s denotes the sequence number of the currently scheduled packet in the corresponding scheduling queue
  • n denotes the number of banks (the banks are numbered 0 to n-1)
  • W_b_s denotes the priority of the packet s
  • X_n-1 represents the number of memory access instructions accessing bank n-1 in T;
  • Y_s_n-1 represents the number of memory requests accessing bank n-1 in the packet s in T;
  • the receiving module 801 is further configured to receive a second memory access instruction sent by a second processor core of the plurality of processor cores and a second memory barrier instruction after the second memory access instruction, the second memory barrier instruction For the first type of memory barrier instruction;
  • the determining module 802 is further configured to:
  • the scheduling module 803 is further configured to schedule the second memory access instruction and the second memory barrier instruction to the first scheduling queue.
  • the sending module 804 is further configured to send a second-type memory barrier instruction to the memory controller, where the second-type memory barrier instruction is used to control the order of multiple memory access instructions of the entire processor.
  • the receiving module 801 is further configured to receive a third memory access instruction sent by a third processor core of the plurality of processor cores and a third memory barrier instruction after the third memory access instruction, the third memory barrier instruction a second type of memory barrier instruction, the second type of memory barrier instruction is used to control the order of the plurality of memory access instructions of the entire processor;
  • the determining module 802 is further configured to determine that all memory access instructions in the multiple scheduling queues have been sent to the memory controller;
  • the scheduling module 803 is further configured to schedule the third memory access instruction and the third memory barrier instruction to a third scheduling queue, where the third scheduling queue is configured to cache a memory access instruction sent by the third processor core.
  • for the scheduler provided in FIG. 8 and FIG. 9, reference may be made to the scheduling method for memory access instructions described in the foregoing embodiments; for details, refer to the related description of the scheduler in the foregoing embodiments. Details are not described herein again.
  • the embodiment of the present invention further provides a computer program product for the scheduling method for memory access instructions, including a computer readable storage medium storing program code, where the program code includes instructions for performing the method procedure described in any one of the foregoing method embodiments.
  • the foregoing storage medium includes various non-transitory machine readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (RAM), a solid state drive (SSD), or other non-volatile memory.

Abstract

本申请揭示了一种内存访问指令的调度方法、装置及计算机系统。该方法包括:调度器接收第一处理器核发送的第一内存访问指令以及第一内存屏障指令;如果该第一内存屏障指令为第一类型内存屏障指令,将该第一内存访问指令和该第一内存屏障指令调度至用于缓存第一处理器核所发送内存访问指令的第一调度队列;将多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给内存控制器。本发明减小了内存屏障指令对内存性能的影响,提高了并行度,且该方法未根据任一多版本机制的语义来优化内存调度,能够适用于多种多版本机制,在不同类型的多版本机制中均能提供内存顺序保证并提高内存访问性能。

Description

内存访问指令的调度方法、装置及计算机系统 技术领域
本发明涉及计算机技术领域,特别涉及一种内存访问指令的调度方法、装置及计算机系统。
背景技术
现有技术中内存系统通常会运行多版本机制对内存数据进行更新,也即是,不直接对原有内存数据进行更新操作,而是为原有内存数据创建一个新的版本,对新版本的数据进行更新操作。另外,内存系统根据接收到的内存访问指令对内存数据进行访问,而在持久保存数据的场景下,需要严格控制多个内存访问指令的执行先后顺序,才能保证在系统故障时能够对内存数据进行恢复。
内存系统所运行的多版本机制可以有多种类型,如shadow updates(影更新),redo-logging(重做日志),undo-logging(撤销日志)等,这些多版本机制均可采用内存屏障指令来控制多个内存访问指令的执行先后顺序。
以控制第一内存访问指令和第二内存访问指令的执行先后顺序为例,在第一内存访问指令之后、第二内存访问指令之前可以插入内存屏障指令,使得该第一内存访问指令执行完成之后,第二内存访问指令才开始执行,从而保证了第二内存访问指令只能在第一内存访问指令执行之后执行,而不会与第一内存访问指令并行执行或提前执行。虽然内存屏障指令能够控制多个内存访问指令的执行先后顺序,但是也会导致一些原本可以并行执行的内存访问指令只能够串行执行,增加了内存访问时间,影响了内存访问性能。
为此,针对于常用的redo-logging多版本机制,提供了一种提高内存访问性能的方法,将redo logging多版本机制中的内存写请求分类为:Data(持久数据写请求)、Log(日志数据写请求)和Commit(日志控制项写请求),并基于此分类提出以下调度规则:
1、每个Commit之前有一个内存屏障指令,允许与Commit属于同一事务的Log调度至相应的内存屏障指令之前,与其他事务并行执行;
2、允许内存屏障指令之后的Commit调度至内存屏障指令之前,与其他事务的Data或Log并行执行。
然而,上述调度方法仅是根据redo logging多版本机制的语义来优化内存调度,仅适用于redo-logging多版本机制,而不适用于其它多版本机制。因此,亟需一种适用于多种多版本机制的内存访问指令调度方法,使得在内存系统的多版本机制不同时,仍能为其提供内存顺序保证且提高内存访问性能。
发明内容
为了克服现有技术中存在的问题,本发明提供一种内存访问指令的调度方法、装置及计算机系统。所述技术方案如下:
第一方面,提供了一种内存访问指令的调度方法,该调度方法应用于计算机系统中,该计算机系统包括内存控制器、调度器以及多个处理器核;该调度器分别与该内存控制器以及该多个处理器核连接,在该调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令。
为了优化内存调度，提供两种类型的内存屏障指令：第一类型内存屏障指令和第二类型内存屏障指令，第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序，作用域是处理器核，第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序，作用域是处理器，则该调度器可以根据内存屏障指令的类型，对接收到的内存访问指令进行调度。
具体为，以第一处理器核为例，当调度器接收到第一处理器核发送的第一内存访问指令以及在第一内存访问指令之后的第一内存屏障指令时，可以先判断该第一内存屏障指令是第一类型内存屏障指令还是第二类型内存屏障指令；当确定该第一内存屏障指令是第一类型内存屏障指令时，表示该第一内存屏障指令的作用域是该第一处理器核，则该调度器即可将该第一内存访问指令和该第一内存屏障指令调度至该多个调度队列中的第一调度队列，该第一调度队列是指与该第一处理器核对应的、用于缓存该第一处理器核发送的内存访问指令的调度队列。同理地，可以采用相同的方式对其他处理器核发送的内存访问指令和第一类型内存屏障指令进行调度。
之后，每个调度队列可能都缓存有第一类型内存屏障指令，为了优化调度，该调度器可以先确定该多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令，并将确定出的内存访问指令一起发送给该内存控制器，从而减小内存调度时间。
通过提供第一类型内存屏障指令，可以根据第一类型内存屏障指令控制处理器核的内存访问指令的先后顺序，而不会对其他处理器核的内存访问指令造成限制，从而可以减小内存屏障指令对内存性能的影响，提高了并行度，而且，通过将多个调度队列中位于第一个第一类型内存屏障指令之前的内存访问指令一起发送至内存控制器，可以进一步提高并行度，且该方法未根据任一多版本机制的语义来优化内存调度，能够适用于多种多版本机制，在不同类型的多版本机制中均能提供内存顺序保证并提高内存访问性能。
结合第一方面,在第一方面的第一种可能实现方式中,在对内存访问指令进行调度时,该调度器可以先确定该至少一个内存访问指令中每个内存访问指令的优先级,该优先级由在内存访问指令被发送给该内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示,能够体现调度该内存访问指令之后的并行度。后续过程中,可以根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,选取每个bank对应的优先级最高的内存访问指令,并向该内存控制器发送所选取的内存访问指令,之后再向该内存控制器发送第二类型内存屏障指令。
通过根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,选取每个bank对应的优先级最高的内存访问指令,并向该内存控制器发送所选取的内存访问指令,使得调度器可以根据内存访问指令的优先级对各个bank的内存访问指令进行优化调度,进一步提高了并行度,节省了内存访问时间。
结合第一方面的上述任一种可能实现方式,在第一方面的第二种可能实现方式中,确定该至少一个内存访问指令的优先级的过程具体可以包括:按照每个调度队列中第一类型内存屏障指令的位置进行划分,得到多个分组,每个分组包括至少一个内存访问指令,之后,根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T,统计T中访问每个bank的内存访问指令个数,并统计该T中每个分组中访问每个bank的内存访问指令个数,从而根据该T中访问每个bank的内存访问指令个数以及该T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的优先级作为每个分组内的内存访问指令的优先级。
通过根据内存访问指令集合中访问每个bank的内存访问指令个数以及每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,作为每个分组内的内存访问指令的优先级,从而可以利用内存访问指令被发送至内存控制器后新的调度队列的并行度来表示内存访问指令的优先级,保证了按照 内存访问指令的优先级进行调度时,新的调度队列的并行度更高,进一步提高了整体的并行度,提高了整体的内存访问性能。
结合第一方面的上述任一种可能实现方式,在第一方面的第三种可能实现方式中,可以采用以下公式,计算内存访问指令集合T中分组s的优先级:
W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级;
Xn-1表示该T中访问bank n-1的内存访问指令个数;
Ys_n-1表示该T中分组s中访问bank n-1的内存请求个数；
Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数，其中分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组，若分组s为调度队列中的最后一个分组，则Ys+1_n-1=0。
通过采用上述公式,计算每个分组的优先级,使得每个分组的优先级的计算更为精确,提高了精确度。
结合第一方面的上述任一种可能实现方式,在第一方面的第四种可能实现方式中,该调度器还可以接收该多个处理器核中的第二处理器核发送的第二内存访问指令以及该第二内存访问指令之后的第二内存屏障指令,且该第二内存屏障指令为该第一类型内存屏障指令,后续过程中,可以根据该第二内存访问指令和该第一内存访问指令所属线程的关联性,对该第二内存访问指令进行调度。具体为,当确定该第二内存访问指令所属的第二线程与该第一内存访问指令所属的第一线程是关联线程时,该调度器可以等待该第一调度队列中的内存访问指令执行完成,当确定该第一调度队列中已不存在内存访问指令时,即可将该第二内存访问指令以及该第二内存屏障指令调度至该第一调度队列。
通过判断不同处理器核发送的第一类型内存屏障指令对应的内存访问指令所属的线程是否为关联线程,并将所属的线程为关联线程的内存访问指令和对应的第一类型内存屏障指令调度至同一调度队列,能够将关联线程的内存访问指令在同一调度队列中进行调度,减小了内存屏障指令对整个处理器的影响,提高了内存访问性能。
结合第一方面的上述任一种可能实现方式,在第一方面的第五种可能实现方式中,在将该多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给该内存控制器之后,该调度器还可以将第二类型内存 屏障指令发送给该内存控制器,从而在该至少一个内存访问指令之后插入第二类型内存屏障指令。
通过在该至少一个内存访问指令之后插入第二类型内存屏障指令，使得内存控制器中只存在第二类型内存屏障指令，则内存控制器仅需按照第二类型内存屏障指令对应的内存屏障机制进行调度，操作简便。
结合第一方面的上述任一种可能实现方式,在第一方面的第六种可能实现方式中,当确定任一处理器核发送的内存访问指令之后的内存屏障指令是第二类型内存屏障指令时,由于该第二类型内存屏障指令的作用域为整个处理器,则该调度器可以按照第二类型内存屏障指令对应的内存屏障机制进行调度。具体为,该调度器可以先等待所有调度队列中的内存访问指令均发送给内存控制器,当确定所有调度队列中均不存在内存访问指令时,即可将本次接收到的内存访问指令和该内存访问指令之后的第二类型内存屏障指令调度至与该处理器核对应的调度队列。
结合第一方面的上述任一种可能实现方式,在第一方面的第七种可能实现方式中,在确定第二内存访问指令所属的第二线程与第一内存访问指令所属的第一线程为关联线程之后,该调度器还可以向该第一处理器核发送停止发送通知,以通知该第一处理器核停止发送内存访问指令。
通过向该第一处理器核发送停止发送通知,对第一处理器核的内存访问指令进行控制,保证了第一类型内存屏障指令的作用域为运行关联线程的处理器核,减小了内存屏障指令对整个处理器的影响,提高了内存访问性能,节省了内存访问时间。
结合第一方面的上述任一种可能实现方式，在第一方面的第八种可能实现方式中，该调度器还可以在确定第二内存访问指令已调度至第一调度队列时，向第一处理器核发送允许发送通知，以通知第一处理器核发送内存访问指令。
通过在第二内存访问指令已调度至该第一调度队列时，向第一处理器核发送允许发送通知，解除了对第一处理器核发送内存访问指令的限制，保证了在第一调度队列中，当第一类型内存屏障指令对应的内存访问指令调度完成时，可以开始执行其他处理器核发送的内存访问指令。
结合第一方面的上述任一种可能实现方式，在第一方面的第九种可能实现方式中，在调度器将多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给内存控制器之后，该调度器还可以将位于任一调度队列最前端的第一类型内存屏障指令删除。
通过将位于任一第一调度队列最前端的第一类型内存屏障指令删除,使得调度器可以继续执行下一次调度,提高了调度效率。
结合第一方面的上述任一种可能实现方式，在第一方面的第十种可能实现方式中，在调度器接收到第三处理器核发送的第三内存屏障指令，且确定该第三内存屏障指令为第二类型内存屏障指令时，表示第三内存屏障指令的作用域为整个处理器，则该调度器还可以向除第三处理器核以外的其他处理器核发送停止发送通知，以通知其他处理器核停止发送内存访问指令。
通过在确定第三内存屏障指令为第二类型内存屏障指令时,向除第三处理器核以外的其他处理器核发送停止发送通知,保证了第二类型内存屏障指令的作用域为处理器,有效控制了排他性内存访问指令的执行。
结合第一方面的上述任一种可能实现方式,在第一方面的第十一种可能实现方式中,在将第三内存访问指令和第三内存屏障指令调度至第三调度队列之后,该调度器还可以在确定第三内存访问指令调度完成时,向除第三处理器核以外的其他处理器核发送允许发送通知,以通知其他处理器核发送内存访问指令。
通过当确定第三内存访问指令调度完成时,向除第三处理器核以外的其他处理器核发送允许发送通知,解除了对其他处理器核的内存访问指令的限制,使得其他处理器核可以正常发送内存访问指令,保证了内存访问指令的有序执行。
第二方面,提供了一种调度器,该调度器应用于计算机系统中,该计算机系统包括内存控制器、该调度器以及多个处理器核;在该调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令,该调度器包括用于执行上述第一方面提供的内存访问指令的调度方法的模块。
第三方面,提供了一种计算机系统,该计算机系统包括处理器和内存控制器,该处理器包括调度器和多个处理器核,在该调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令;该调度器用于执行上述第一方面提供的内存访问指令的调度方法。
第四方面,本申请提供了一种计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令用于执行上述第一方面中描述的 任意一种内存访问指令的调度方法。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例的附图。
图1是本发明实施例提供的一种计算机系统的结构示意图;
图2是本发明实施例提供的一种内存访问指令的调度方法流程图;
图3是本发明实施例提供的一组内存访问指令的调度结果示意图;
图4是本发明实施例提供的调度队列示意图;
图5是本发明实施例提供的对内存访问指令的调度示意图;
图6是本发明实施例提供的一种内存访问指令的调度方法流程图;
图7是本发明实施例提供的内存访问指令的调度结果示意图;
图8是本发明实施例提供的一种调度器的结构示意图;
图9是本发明实施例提供的一种调度器的结构示意图。
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
图1是本发明实施例提供的一种计算机系统的结构示意图,参见图1,该计算机系统包括内存控制器101、调度器102和多个处理器核103。
其中,该调度器102分别与该内存控制器101以及该多个处理器核103连接,且,在该调度器102中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令。
该调度器102用于维护该多个调度队列,也即是,该调度器102可以接收任一处理器核103发送的内存访问指令,将接收到的内存访问指令调度至对应的调度队列,并根据相应的规则将调度队列中的内存访问指令发送至内存控制器101。
另外,该内存控制器101还可以缓存有内存调度队列,该内存控制器101接收到调度器102发送的内存访问指令后,还可以将该内存访问指令调度至该内存调度队列。
另外,该计算机系统还可以包括内存104,该内存104与该内存控制器101连接,该内存控制器101可以向内存104发送内存访问指令,实现对内存104的访问。
具体地,当要访问内存104时,任一处理器核103发送的内存访问指令,将先进入该调度器102,调度器102会将该内存访问指令调度至任一调度队列,再根据相应的调度规则,将该调度队列中的内存访问指令发送至内存控制器101,之后再由该内存控制器101进行调度,实现对内存104的访问。
进一步地，该内存104可以包括多个bank（内存库），每个内存访问指令可以访问内存104中的任一bank，且访问不同bank的内存访问指令可以并行执行。
该调度器102用于执行下述实施例所示的调度方法。
在图1所示的计算机系统的基础上,本发明实施例提供了一种应用于该计算机系统的内存访问指令的调度方法。图2是本发明实施例提供的一种内存访问指令的调度方法流程图,该方法的执行主体为如图1所示的调度器,参见图2,该内存访问指令的调度方法包括以下步骤:
201、该调度器接收该多个处理器核中的第一处理器核发送的第一内存访问指令以及该第一内存访问指令之后的第一内存屏障指令。
现有技术中可以采用内存屏障指令来控制多个内存访问指令的执行先后顺序,以保证正确的程序语义,但同时,内存屏障指令也会影响内存访问性能。例如,内存中可以配置多个bank,访问不同bank的内存访问指令可以并行执行,而内存屏障指令可能导致本来可以并行执行的内存访问指令只能串行执行,延长了内存访问时间。
以图3为例,图3中包括内存访问指令:A、B、C、D、E、F和G,且A、C、F和G访问bank0,B、D和E访问bank1,C后插入了内存屏障指令b1,F后插入了内存屏障指令b2。则虽然C和D访问不同的bank,能并行执行,但由于C和D之间存在内存屏障指令b1,所以只能在C执行完成之后,D才能开始执行,而不能并行执行C和D。
为了在采用内存屏障指令控制内存访问指令的执行先后顺序的同时,提高内存访问性能,本发明实施例提供了一种支持通用多版本机制的内存访问指令的调度方法。
本发明实施例中添加了调度器,在任一处理器核向内存控制器发送内存访问指令的过程中,先将内存访问指令和内存访问指令之后的内存屏障指令发送至调度器,由调度器根据接收的内存屏障指令将接收到的内存访问指令发送至内存控制器。其中,调度器缓存有调度队列,该调度队列用于缓存待调度的内存访问指令,该调度器可以将内存访问指令通过该调度队列调度至内存控制器。内存控制器缓存有内存调度队列,可以将调度器发送的内存访问指令通过内存调度队列进行调度,以访问内存。
其中,调度器可以缓存有多个调度队列,内存控制器也可以缓存有多个内存调度队列。且每个调度队列和每个内存调度队列可以为先进先出队列。
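上述调度器与调度队列的组织方式可以用如下Python伪码示意（仅为便于理解的假设性示例，并非本发明的实际硬件实现，其中类名、方法名以及指令的表示形式均为本文假设）：

```python
from collections import deque

class Scheduler:
    """示意性调度器：为每个处理器核维护一个先进先出（FIFO）调度队列。"""

    def __init__(self, num_cores):
        # 每个处理器核对应一个调度队列，用于缓存待调度的内存访问指令
        self.queues = [deque() for _ in range(num_cores)]

    def receive(self, core_id, item):
        # item可以是内存访问指令，也可以是内存屏障指令，按到达顺序入队
        self.queues[core_id].append(item)

    def front(self, core_id):
        # 返回指定调度队列队首的指令（队列为空时返回None）
        q = self.queues[core_id]
        return q[0] if q else None
```

例如，处理器核0先后发送一条内存访问指令和一条内存屏障指令后，队首仍是先到达的内存访问指令，体现调度队列的先进先出特性。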
202、该调度器确定该第一内存屏障指令为第一类型内存屏障指令,该第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序。
为了进一步解决现有技术中内存屏障指令导致的内存访问性能较低的问题,本发明实施例提供了两种类型的内存屏障指令:第一类型内存屏障指令和第二类型内存屏障指令。
第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序,作用域为处理器核,即如果任一处理器核在发送的内存访问指令之后插入了第一类型内存屏障指令,表示该第一类型内存屏障指令之前的内存访问指令执行完成以后,才能执行该第一类型内存屏障指令之后的、指定处理器核发送的内存访问指令,此时,除指定处理器核之外的其他处理器核发送的内存访问指令可以与上述内存访问指令并行执行。其中,该指定处理器核为与发送该内存访问指令的处理器核相同的处理器核或者关联的处理器核,可以包括单个处理器核或者多个处理器核,本发明实施例对此不做限定。
第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序,作用域为处理器,即如果任一处理器核在发送的内存访问指令之后插入了第二类型内存屏障指令,表示该第二类型内存屏障指令之前的内存访问指令执行完成以后,才能执行该第二类型内存屏障指令之后的、处理器中任一处理器核发送的内存访问指令。
例如,处理器中包括相互独立的处理器核X和Y,若处理器核X向调度器发送了内存访问指令1,并在内存访问指令1之后发送第一类型内存屏障指令,表示相同处理器核和关联处理器核发送的内存访问指令需要在内存访问指令1执行完成之后才能执行,因此,处理器核X之后发送的内存访问指令2 将不能与内存访问指令1并行执行,而处理器核Y发送的内存访问指令3可以与内存访问指令1并行执行。
而如果处理器核X向调度器发送了内存访问指令1,并在内存访问指令1之后发送第二类型内存屏障指令,表示处理器中的任一处理器核发送的内存访问指令均需要在内存访问指令1执行完成之后才能执行,因此,处理器核X之后发送的内存访问指令2以及处理器核Y发送的内存访问指令3均不能与内存访问指令1并行执行,只能等到内存访问指令1执行完成之后才能执行。
基于上述两种类型的内存屏障指令,在实际应用中,可以在系统库中增加原语,为程序员提供内存屏障指令的API(Application Programming Interface,应用程序编程接口),使得程序员可以利用该API对处理器运行的程序进行写控制,为程序写入不同类型的内存屏障指令,实现对多种多版本机制的支持。
另外,程序员还可以在平衡多版本机制的语义和内存访问性能的基础上,选择不同类型的内存屏障指令。具体地,当某一处理器核运行的线程具有排它性(Exclusive)时,表示该处理器核运行的线程不能与其他处理器核运行的线程并行执行,则可以选用第二类型内存屏障指令来控制该线程的读写顺序;而当某一处理器核运行的线程不具有排他性(Non-exclusive)时,则可以选用第一类型内存屏障指令来控制该线程的读写顺序。
则为了根据不同类型的内存屏障指令来控制线程的读写顺序,当该调度器接收到任一处理器核发送的内存访问指令以及该内存访问指令之后的内存屏障指令时,可以先确定该内存屏障指令的类型,也即是,判断该内存屏障指令是第一类型内存屏障指令还是第二类型内存屏障指令,之后,再根据不同类型的内存屏障指令执行不同的步骤。
需要说明的一点是,该第一处理器核可以为该多个处理器核中的任一处理器核,本发明实施例对此也不做限定。
需要说明的另一点是,本发明实施例仅以该第一处理器核发送的第一内存屏障指令为第一类型内存屏障指令为例,当确定该第一内存屏障指令为第一类型内存屏障指令时,即可执行下述步骤203-206。另外,该调度器还可能接收到第二类型内存屏障指令,此时可根据下述图6所示实施例中的方法,对该第二类型内存屏障指令对应的内存访问指令进行调度,本发明实施例对此不做限定。
203、该调度器将该第一内存访问指令和该第一内存屏障指令调度至该多个调度队列中的第一调度队列，该第一调度队列用于缓存该第一处理器核发送的内存访问指令。
当确定该第一内存屏障指令为第一类型内存屏障指令时,表示该第一处理器核运行的线程不具有排他性,则该调度器可以将该第一内存访问指令和该第一内存屏障指令调度至与该第一处理器核对应的第一调度队列,而无需限制其他处理器核发送内存访问指令。本发明实施例中,当处理器核运行的线程不具有排他性时,还存在以下两种情况:
1)如果某一处理器运行的线程是独立无冲突线程,也即是,该处理器核运行的线程与其他任一处理器核运行的线程均不是关联线程,该处理器核对应一个独立的调度队列,该调度队列只用于缓存该处理器核发送的内存访问指令。则当该处理器核在运行该线程的过程中,如果向该调度器发送了内存访问指令,该调度器会直接将该内存访问指令调度至该处理器核对应的调度队列中。
2)如果任两个处理器核运行的线程互为关联线程,该任两个处理器核对应于同一调度队列,则当该两个处理器核在运行互为关联线程的两个线程的过程中,如果向该调度器发送了内存访问指令,该调度器会将该两个处理器核发送的内存访问指令调度至同一调度队列,也即是,该调度队列用于缓存该两个处理器核发送的内存访问指令。
其中,两个线程为关联线程是指这两个线程具有共享数据。两个线程是否为关联线程可以由计算机系统的编译器判断,例如,该编译器可以预先判断多个线程之间是否互为关联线程,并将确定出互为关联线程的线程储存在关联列表中,则该调度器即可通过查找该关联列表,判断任两个线程是否互为关联线程。
两个处理器核对应于同一调度队列,如果其中一个处理器核发送的内存访问指令被调度至该调度队列,则为了保证内存访问指令的有序执行,另一处理器核将不能再发送内存访问指令。
因此,当该第一处理器核发送的第一内存访问指令所属的线程与其他处理器核运行的线程为关联线程时,该调度器在接收到该第一内存访问指令和该第一内存屏障指令时,向其他关联的处理器核发送停止发送通知,该停止发送通知用于通知关联的处理器核停止发送内存访问指令。该调度器可以等待该第一调度队列中的内存访问指令调度完成,确定该第一调度队列中的内存访问指令已经被发送给该内存控制器时,将该第一内存访问指令和该第一内存屏障指令 调度至该多个调度队列中的第一调度队列。
在后续过程中,当确定该第一内存访问指令和该第一内存屏障指令已发送至该内存控制器,该调度器即可向该关联的处理器核发送允许发送通知,该允许发送通知用于通知该关联的处理器核发送内存访问指令。
本发明实施例中,假设该第一处理器核运行的线程与第二处理器核运行的线程为关联线程,该第二处理器核可以为该多个处理器核中除该第一处理器核之外的任一处理器核,本发明实施例对此不做限定。该调度器在接收到该第一内存访问指令和该第一内存屏障指令时,确定该第一调度队列中的内存访问指令已经被发送给该内存控制器,将该第一内存访问指令和该第一内存屏障指令调度至该多个调度队列中的第一调度队列。并向该第二处理器核发送停止发送通知,该停止发送通知用于通知该第二处理器核停止发送内存访问指令。
之后,当确定该第一内存访问指令和该第一内存屏障指令已发送至该内存控制器,即可向该第二处理器核发送允许发送通知,该允许发送通知用于通知该第二处理器核发送内存访问指令。
204、该调度器将该多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给该内存控制器。
通过执行上述步骤,每个调度队列可能都缓存有一个或多个内存访问指令,且任两个调度队列中的内存访问指令所属的线程均不是关联线程,也即是不同调度队列中的内存访问指令可以并行执行,因此,该调度器可以将所有调度队列中的第一类型内存屏障指令之前的至少一个内存访问指令一起发送给该内存控制器。
在第一种可能实现方式中,该调度器可以将所有调度队列中第一个第一类型内存屏障指令之前的内存访问指令进行组合,一起发送给该内存控制器,并在之后将第二类型内存屏障指令发送给该内存控制器,以将该内存访问指令序列和该第二类型内存屏障指令调度至该内存调度队列。之后,为了进行下一次的调度,该调度器可以将位于任一调度队列最前端的第一类型内存屏障指令删除。
本发明实施例中,通过利用线程的并行性将第一个第一类型内存屏障指令之前的内存访问指令一起调度,并将多个第一类型内存屏障指令替换为一个第二类型内存屏障指令,从而减小了总体内存屏障指令的数目,增加了内存系统中调度序列的bank并行度,减小了内存访问时间。
需要说明的是,内存访问指令可以包括读请求和写请求等不同类型的请求,在进行调度时,可以将不同类型的内存访问指令一起进行调度,也可以将不同类型的内存访问指令分别进行调度,在一种类型的内存访问指令全部调度完成后,再调度另一种类型的内存访问指令。本发明实施例对此不做限定。
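上述第一种可能实现方式可以用如下Python伪码示意（假设性示例：用字符串"s_barrier"表示第一类型内存屏障指令，"p_barrier"表示第二类型内存屏障指令，调度队列用列表表示，均为本文为便于说明而作的假设）：

```python
S_BARRIER = "s_barrier"  # 第一类型内存屏障指令，作用域为处理器核
P_BARRIER = "p_barrier"  # 第二类型内存屏障指令，作用域为整个处理器

def dispatch_once(queues):
    """将所有调度队列中第一个第一类型内存屏障指令之前的内存访问指令
    组合为一个批次，并在批次之后附加一个第二类型内存屏障指令。"""
    batch = []
    for q in queues:
        # 取出该队列中位于第一个第一类型内存屏障指令之前的内存访问指令
        while q and q[0] != S_BARRIER:
            batch.append(q.pop(0))
        # 将位于队列最前端的第一类型内存屏障指令删除，以便进行下一次调度
        if q and q[0] == S_BARRIER:
            q.pop(0)
    if batch:
        # 多个第一类型内存屏障指令被替换为批次之后的一个第二类型内存屏障指令
        batch.append(P_BARRIER)
    return batch
```

例如，对调度队列[["1.1","1.2","s_barrier","1.3"],["2.1"]]调用一次，可得到批次["1.1","1.2","2.1","p_barrier"]，且第一个调度队列中剩余["1.3"]，等待下一次调度。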
例如，若处理器包括4个处理器核，每个处理器核运行1个线程，且该4个线程使用不同的logging多版本机制，互相均不是关联线程，其中，该4个线程使用的logging多版本机制分别为redo-logging、locking、locking以及undo-logging，则该调度器可以将该4个处理器核发送的内存访问指令分别调度至每个处理器核对应的调度队列中，也即是，BROI1、BROI2、BROI3和BROI4中，如图4中的a图所示。
其中,图4中的a图中的第一行表示调度队列BROI 1、BROI 2、BROI3和BROI4,内存访问指令之间的虚线表示现有技术中的内存屏障指令;第二行表示按照对应的多版本机制的语义进行分类后,每个调度队列中需要调度的内存访问指令类别;第三行表示每个调度队列中对应的内存访问指令所要访问的bank序号,虚线表示本发明实施例中的第一类型内存屏障指令;第四行表示每个调度队列中内存访问指令的序号,例如,1.1表示BROI 1中的第1个内存访问指令,2.2表示BROI 2中的第2个内存访问指令等。假设上述内存访问指令中的3.1、3.2、3.3为写请求,其他的为读请求。
则先对写请求3.1、3.2、3.3进行调度之后,针对于其他的读请求,现有技术中,可以基于不同的logging多版本机制的调度规则,根据如图4中的a图中第二行所示的内存访问指令的类别,将内存访问指令调度至内存控制器的内存调度队列,调度结果如图4中的b图所示。由图4中的b图可以看出,由于不同多版本机制的语义混杂,难以优化调度,调度后的内存访问指令在各bank中的并行度较小,花费时间较长。
而本发明实施例中,若采用步骤204的第一种可能实现方式,如图5所示,该调度器可以将BROI 1、BROI 2、BROI3和BROI4中第一个第一类型内存屏障指令之前的内存访问指令:1.1、1.2、2.1、4.1一起调度至内存控制器,并在调度的过程中在这些内存访问指令之后插入第二类型内存屏障指令,发送至内存控制器的内存调度队列,之后,该调度器可以清除所有调度队列最前端的第一类型内存屏障指令,进行下一次调度,直至调度结束。最后的调度结果如图4中的c图所示,将图4中的c图与图4中的b图进行比较可以明显看出,与 现有技术相比,采用第一种可能实现方式提高了并行度,减少了内存屏障指令的数目,节省了内存访问时间。
在第二种可能实现方式中,步骤204包括步骤2041-2044:
2041、确定该至少一个内存访问指令的优先级,该优先级由在内存访问指令被发送给该内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示。
具体地,确定该至少一个内存访问指令的优先级的过程包括:
1)、按照每个调度队列中第一类型内存屏障指令的位置进行划分,得到多个分组,每个分组包括至少一个内存访问指令。
2)、根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T。其中,每个调度队列中的第一个分组为每个调度队列的当前待调度分组,该调度器可以将每个调度队列中第一个分组添加至一个集合,得到待调度的内存访问指令集合。为了便于说明,本发明实施例将待调度的内存访问指令集合用T表示。
例如,参见图4中的a图和图5,以图4中的BROI1为例,按照s_barrier的位置可以划分为3个Seg:Seg0{1.1、1.2}、Seg1{1.3}、Seg2{1.4}。将每个调度队列中第一个分组添加至T后,T即为{1.1,1.2,2.1,4.1}。
3)统计T中访问每个bank的内存访问指令个数。
其中,统计T中访问每个bank的内存访问指令个数,也即是统计每个bank在T中对应的内存访问指令个数。
例如，统计T中访问bank0、bank1…bank n-1的内存访问指令个数，分别为X0、X1…Xn-1。
4)统计T中的每个分组中访问每个bank的内存访问指令个数。
例如,统计每个分组Seg中访问bank0、bank1…bankn-1的内存访问指令个数。参见图5,以Seg0{1.1、1.2}为例,Seg0中访问bank0的内存访问指令个数Y0_0为0,Seg0中访问bank1的内存访问指令个数Y0_1为2。
5)根据T中访问每个bank的内存访问指令个数以及T中的每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的优先级作为每个分组内的内存访问指令的优先级。
具体地,可以采用以下公式进行计算T中分组s的优先级:
W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级,分组s可以为T中的任一分组,也即是任一个调度队列的当前待调度分组;
Xn-1表示T中访问bank n-1的内存访问指令个数;
Ys_n-1表示T中的分组s中访问bank n-1的内存请求个数;
Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数,其中本发明实施例对每个调度队列单独进行编号,同一调度队列中的分组依次编号,分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组,若分组s为调度队列中的最后一个分组,则Ys+1_n-1=0。其中,s为大于或等于0的整数,n为正整数。
其中,Xn-1-Ys_n-1+Ys+1_n-1表示:假设将分组s中的内存访问指令调度至内存调度队列,并将分组s+1中的内存访问指令添加至T后,形成新的内存访问指令集合T’后,T’中访问bank n-1的内存访问指令的个数。若内存包括bank0和bank1,且T’中访问bank0的内存访问指令的个数为较大值j,但是访问bank1的内存访问指令的个数为较小值k,则T’中能够并行访问bank0和bank1的内存访问指令即为k。因此,对T’进行调度时的并行度大小取决于T’中访问bank n-1的内存访问指令的个数的最小值min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)}。
则根据上述公式可以计算出每个分组的优先级W_b_s,而该每个分组的优先级W_b_s即可作为该每个分组内的内存访问指令的优先级。例如,若Seg0:{1.1、1.2}的优先级W_b_s为4,则Seg0中的内存访问指令1.1和1.2的优先级均为4。其中,W_b_s值越大,表示该分组内的内存访问指令的优先级越高,该调度器即可优先调度该分组内的内存访问指令。
通过上述计算方法,计算的优先级还可以表示内存访问指令调度至该内存调度队列后新的待调度的内存访问指令集合T的并行度。该并行度是指能够并行访问bank的数量。针对于每个待调度的内存访问指令,调度后所形成的T不同,并行度也不同。能够并行访问bank的数量越大,表示并行度越大,优先级越高;反之,能够并行访问bank的数量越小,表示并行度越小,优先级越低。
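上述优先级公式可以用如下Python伪码示意（假设性示例：每条内存访问指令用其所访问的bank序号表示，T用分组列表表示，queue_groups表示分组s所在调度队列中的分组列表，函数名为本文假设）：

```python
def group_priority(T_groups, queue_groups, s, num_banks):
    """按公式 W_b_s = min{ X_n - Ys_n + Ys+1_n } 计算分组s的优先级。"""
    # X[n]：集合T中访问bank n的内存访问指令个数
    X = [0] * num_banks
    for group in T_groups:
        for bank in group:
            X[bank] += 1
    # Ys[n]：分组s中访问bank n的内存访问指令个数
    Ys = [0] * num_banks
    for bank in queue_groups[s]:
        Ys[bank] += 1
    # Ys1[n]：分组s+1中访问bank n的个数；若s为队列中最后一个分组则为0
    Ys1 = [0] * num_banks
    if s + 1 < len(queue_groups):
        for bank in queue_groups[s + 1]:
            Ys1[bank] += 1
    # 优先级取各bank上“调度后新集合T'中指令个数”的最小值，刻画并行度
    return min(X[n] - Ys[n] + Ys1[n] for n in range(num_banks))
```

W值越大，表示该分组被调度后新的待调度集合在各bank上的并行度越高，该分组内的内存访问指令应被优先调度。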
另外,由于内存访问指令包括读请求和写请求,且读请求和写请求之间存在总线turn around的时延,所以一般优先调度读请求,再调度写请求,因此, 可以将读请求的优先级设为较高的值,将写请求的优先级设为较低的值,优先对读请求进行调度,后续过程中再对写请求进行调度,本发明实施例对此不做限定。
2042、根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,选取每个bank对应的优先级最高的内存访问指令。
该调度器得到T中每个内存访问指令的优先级后,即可按照优先级调度T中访问各个bank的请求。
具体地,该调度器得到T中每个内存访问指令的优先级后,可以确定每个内存访问指令访问的bank,并根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,选取每个bank对应的优先级最高的内存访问指令。例如,若该内存中包括bank0-bank3,则该调度器可以分别选取访问bank0的内存访问指令中优先级最高的内存访问指令、访问bank1的内存访问指令中优先级最高的内存访问指令、访问bank2的内存访问指令中优先级最高的内存访问指令以及访问bank3的内存访问指令中优先级最高的内存访问指令。
其中,若任一bank对应的优先级最高的内存访问指令有多个,则该调度器可以随机选取该多个内存访问指令中的任一个或多个内存访问指令,本发明实施例对此不做限定。
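步骤2042中“选取每个bank对应的优先级最高的内存访问指令”可以用如下Python伪码示意（假设性示例：每条指令用（指令标识，所访问bank，优先级）三元组表示，函数名为本文假设）：

```python
def select_per_bank(instrs):
    """从待调度指令中为每个bank选取优先级最高的一条内存访问指令。"""
    best = {}
    for name, bank, prio in instrs:
        # 同一bank只保留优先级最高的指令；优先级相同时保留先遇到的一条
        if bank not in best or prio > best[bank][1]:
            best[bank] = (name, prio)
    # 按bank序号返回选取结果，选出的指令可一起发送给内存控制器
    return [best[b][0] for b in sorted(best)]
```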
2043、向该内存控制器发送选取的内存访问指令。
也即是,该调度器可以将选取的内存访问指令一起发送至该内存控制器,且该内存控制器接收到该内存访问指令后,可以将该内存访问指令调度至内存控制器中的内存调度队列。
需要说明的是，本发明实施例仅是以该多个调度队列中位于第一个第一类型内存屏障指令之前的内存访问指令有多个为例进行说明，而实际应用中，当该多个调度队列中位于第一个第一类型内存屏障指令之前的内存访问指令只有一个时，该调度器还可以无需计算优先级，直接将该一个内存访问指令发送至内存控制器。本发明实施例对位于第一个第一类型内存屏障指令之前的内存访问指令有一个还是多个不做限定。
2044、向该内存控制器发送第二类型内存屏障指令。
其中,该第二类型内存屏障指令可以在该内存调度队列中位于该选取的内存访问指令之后,或者,该第二类型内存屏障指令进入内存调度队列的时间晚于选取的内存访问指令,本发明实施例对此不做限定。
也即是,该调度器可以在选取的内存访问指令之后插入第二类型内存屏障指令,并将选取的内存访问指令和该选取的内存访问指令之后的第二类型内存屏障指令一起发送至该内存控制器;或者,该调度器也可以在向该内存控制器发送选取的内存访问指令之后,向该内存控制器发送第二类型内存屏障指令,本发明实施例对此不做限定。
当该内存控制器接收到该调度器发送的内存访问指令和第二类型内存屏障指令之后,该内存控制器可以将该内存访问指令和该第二类型内存屏障指令调度至内存调度队列,之后,可以根据该第二类型内存屏障指令对应的调度规则将内存调度队列中的内存访问指令发送至内存,从而实现对内存的访问。
通过在发送选取的内存访问指令之后,向该内存控制器发送第二类型内存屏障指令,可以使得内存控制器中只存在第二类型内存屏障指令,则内存控制器仅需按照第二类型内存屏障指令对应的内存屏障机制进行调度,操作简便。
另外，当T中的任一分组已调度至该内存调度队列时，该调度器还可以将与该任一分组位于同一调度队列的下一分组添加至T，下一分组即成为调度队列的当前待调度分组。也即是，当某个调度队列中的第一类型内存屏障指令前的所有内存访问指令都调度完毕时，该调度器即可将该第一类型内存屏障指令后的内存访问指令激活，并将之后的内存访问指令添加至内存访问指令集合，从而在调度的过程中，不断对内存访问指令集合进行更新，直至调度结束。
例如,以图4为例,若初始阶段内存访问指令集合T包括{1.1,1.2,2.1,3.1,3.2,3.3,4.1},则当该调度器将该T中的内存访问指令3.1、3.2和3.3调度至内存调度队列后,即可将内存访问指令3.4激活,也即是,将内存访问指令3.4调度至该T中,此时该T更新为{1.1,1.2,2.1,3.4,4.1}。之后,该调度器可以重复执行步骤2041-2044,继续对该内存调度队列中的内存访问指令进行调度,直至调度结束。
例如,若采用上述第二种可能实现方式对图4中的a图所示的调度队列进行合并调度,调度结果如图4中的d图所示。将图4中的c图和d图进行比较可以看出,第一种可能实现方式操作简单,容易实现,但可能导致bank并行度利用不完全,而第二种可能实现方式则可以充分利用bank并行度,进一步节省内存访问时间。
205、该调度器接收该多个处理器核中的第二处理器核发送的第二内存访问指令以及该第二内存访问指令之后的第二内存屏障指令，该第二内存屏障指令为该第一类型内存屏障指令。
206、该调度器确定该第二内存访问指令所属的第二线程与该第一内存访问指令所属的第一线程为关联线程,确定该第一调度队列中的内存访问指令已经被发送给该内存控制器,将该第二内存访问指令以及该第二内存屏障指令调度至该第一调度队列。
本发明实施例中,当确定该第二内存访问指令所属的第二线程与该第一内存访问指令所属的第一线程为关联线程时,表示该第一调度队列用于缓存该第一处理器和该第二处理器核发送的内存访问指令,而且,该第二内存访问指令和该第一内存访问指令不能并行执行。因此,当接收到第二处理器核发送的第二内存访问指令以及该第二内存访问指令之后的第二内存屏障指令,而且确定该第一调度队列中的内存访问指令已经被发送给该内存控制器时,该调度器再将该第二内存访问指令和第二内存屏障指令发送至用于缓存该第一处理器核和该第二处理器核发送的内存访问指令的第一调度队列。
另外,为了保证该第一调度队列中内存访问指令的执行顺序,该调度器将该第二内存访问指令调度至该第一调度队列后,可以向该第一处理器核发送停止发送通知,该停止发送通知用于通知该第一处理器核停止发送内存访问指令。
之后,该调度器可以等待该第一调度队列中的内存访问指令调度完成,当确定该第二内存访问指令已发送至该内存控制器,该调度器即可向该第一处理器核发送允许发送通知,该允许发送通知用于通知该第一处理器核发送内存访问指令。
本发明实施例中,通过将内存屏障指令分成两种类型的内存屏障指令,并利用两种不同的内存屏障机制进行控制,可以减少现有技术中内存屏障指令对处理器整体性能的影响,例如,当一个处理器核发出第一类型内存屏障指令时,最多仅会影响到相关联的处理器核的运行,而不会影响到整个处理器的运行。
综上所述，本发明实施例中，通过提供第一类型内存屏障指令，可以根据第一类型内存屏障指令控制处理器核的内存访问指令的先后顺序，而不会对其他处理器核的内存访问指令造成限制，从而可以减小内存屏障指令对内存性能的影响，提高了并行度，而且，通过将多个调度队列中位于第一个第一类型内存屏障指令之前的内存访问指令一起发送至内存控制器，可以进一步提高并行度，且该方法未根据任一多版本机制的语义来优化内存调度，能够适用于多种多版本机制，在不同类型的多版本机制中均能提供内存顺序保证并提高内存访问性能。
需要说明的是,上述实施例仅以接收到的内存屏障指令为第一类型内存屏障指令,并根据第一类型内存屏障指令对应的内存屏障机制对内存访问指令进行调度为例进行说明,而实际应用中,接收到的内存屏障指令也可能是第二类型内存屏障指令,接下来将以接收到第二类型内存屏障指令,并根据第二类型内存屏障指令对应的内存屏障机制对内存访问指令进行调度为例进行说明。图6是本发明实施例提供的一种内存访问指令的调度方法流程图,该方法的执行主体为如图1所示的调度器,参见图6,该方法包括:
601、该调度器接收该多个处理器核中的第三处理器核发送的第三内存访问指令以及该第三内存访问指令之后的第三内存屏障指令。
其中,该第三处理器核可以为该多个处理器核的任一处理器核,本发明实施例对此不做限定。
602、该调度器确定该第三内存屏障指令为第二类型内存屏障指令。
在接收到第三处理器核发送的第三内存访问指令以及该第三内存访问指令之后的第三内存屏障指令之后,需要先对该第三内存屏障指令的类型进行判断,也即是,判断该第三内存屏障指令是第一类型内存屏障指令还是第二类型内存屏障指令,再根据判断结果执行不同的调动步骤。本发明实施例仅以该第三内存屏障指令为第二类型内存屏障指令为例进行说明。
603、该调度器确定该多个调度队列中的所有内存访问指令都已经被发送给该内存控制器,将该第三内存访问指令和该第三内存屏障指令调度至第三调度队列,该第三调度队列用于缓存该第三处理器核发送的内存访问指令。
当确定该第三内存屏障指令为第二类型内存屏障指令时,表示该第三内存屏障指令的作用域为处理器,该第三内存访问指令不能与其他的内存访问指令并行执行,因此需要在确定该多个调度队列中的所有内存访问指令都已经被发送给该内存控制器,也即是所有调度队列中的内存访问指令调度完成时,才能将该第三内存访问指令和该第三内存屏障指令调度至该调度器的调度队列。
在确定该多个调度队列中的所有内存访问指令都已经被发送给该内存控制器之后,才对第三内存访问指令和该第三内存屏障指令进行调度,还可以保证该调度队列中仅有第二类型内存屏障指令,避免该调度器中同时包括第一类型内存屏障指令和第二类型内存屏障指令而导致调度混乱。
其中,缓存该第三处理器核发送的内存访问指令的第三调度队列可以预先分配,本发明实施例对此不做限定。
例如,参见图7,该调度器中包括4个调度队列,分别为:BROI1、BROI2、BROI3和BROI4,则当该多个调度队列中的所有内存访问指令都已经被发送给该内存控制器时,该调度器即可将内存访问指令4.1和4.1之后的第二类型内存屏障指令B调度至该第三处理器核对应的调度队列BROI4。
另外,当确定该第三内存屏障指令为第二类型内存屏障指令时,该调度器还可以向除该第三处理器核以外的其他处理器核发送停止发送通知,并等待该调度器中已经调度至调度队列的内存访问指令调度完成。
其中,该停止发送通知用于通知其他处理器核停止发送内存访问指令。则当其他处理器核接收到该停止发送通知时,即可停止向该调度器发送内存访问指令以及内存屏障指令。
之后,该调度器即可针对多个调度队列进行调度,即将第三内存访问指令发送给内存控制器,并将该第三内存屏障指令也发送给内存控制器。当确定该第三内存访问指令已经被发送给内存控制器之后,该调度队列中已没有第二类型内存屏障指令,此时,该调度器可以向除该第三处理器核以外的其他处理器核发送允许发送通知,该允许发送通知用于通知其他处理器核发送内存访问指令。则当其他处理器核接收到该调度器发送的允许发送通知时,即可正常向该调度器发送内存访问指令。
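第二类型内存屏障指令的调度条件可以用如下Python伪码示意（假设性示例：只有当多个调度队列中的所有内存访问指令都已经被发送给内存控制器，即所有调度队列均为空时，才调度第二类型内存屏障指令对应的内存访问指令，函数名为本文假设）：

```python
def can_dispatch_exclusive(queues):
    """判断是否满足第二类型内存屏障指令的调度条件：所有调度队列均为空。"""
    return all(len(q) == 0 for q in queues)
```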
本发明实施例中,通过在确定内存屏障指令为第二类型内存屏障指令时,按照第二类型内存屏障指令对应的内存屏障机制进行调度,保证了第二类型内存屏障指令的作用域为处理器,有效控制了排他性内存访问指令的执行先后顺序。
结合上述两个实施例，在硬件上，本发明实施例采用调度器进行内存屏障保序控制以及内存访问指令的合并调度，实现了基于两级内存屏障机制来控制内存访问指令的执行先后顺序。具体地，该调度器包括控制逻辑和调度队列，该控制逻辑可以基于第一类型内存屏障指令对应的内存屏障机制和第二类型内存屏障指令对应的内存屏障机制，调度内存访问指令进出该调度队列，另外，该调度器还可以将内存访问指令一起调度，且调度至内存控制器中的内存访问指令序列之后仅存在第二类型内存屏障指令，从而使得该内存控制器的内存调度队列中仅存在第二类型内存屏障指令，则该内存控制器即可仅根据针对第二类型内存屏障指令的内存屏障机制对内存调度队列中的内存访问指令进行调度，操作较为简便。
在软件上,本发明实施例提供了两种类型的内存屏障指令:第一类型内存屏障指令和第二类型内存屏障指令。对于关联线程,可以采用第一类型内存屏障指令来控制内存访问指令的执行顺序;对于独立无冲突线程,可以采用第二类型内存屏障指令来控制内存访问指令的执行顺序。
本发明实施例通过提供两种类型的内存屏障指令,并利用线程的并行特性将第一类型内存屏障指令对应的内存访问指令一起调度,减少了内存屏障指令对内存访问性能的影响,而且,基于两级内存屏障机制,软件可支持多种多版本机制,相比于现有技术中对某种具体的多版本机制进行语义分析、根据事务进行调度的技术方案,本发明实施例可支持各种多版本机制、具有通用化的优点。另外,由于Non-exclusive线程对应的内存访问指令可以一起调度,本发明实施例通过对第一类型内存屏障指令之前的内存访问指令一起调度,减少了发送给内存控制器的内存屏障指令数目,提高了bank并行度,节省了内存访问时间。
综上所述,本发明实施例提供的方法,通过提供两种类型内存屏障指令:第一类型内存屏障指令和第二类型内存屏障指令,第一类型内存屏障指令的作用域为处理器核,第二类型内存屏障指令的作用域为整个处理器,并根据线程的并行性,将第一类型内存屏障指令之前的内存访问指令一起调动,减小了内存屏障指令对内存性能的影响,提高了并行度,减小了内存访问时间,且该方法未根据任一多版本机制的语义来优化内存调度,能够适用于多种多版本机制,在不同类型的多版本机制中均能提供内存顺序保证并提高内存访问性能。
图8是本发明实施例提供的一种调度器的结构示意图,该调度器应用于计算机系统中,该计算机系统包括内存控制器、该调度器以及多个处理器核;在该调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令,该调度器包括:
接收模块801,用于接收该多个处理器核中的第一处理器核发送的第一内存访问指令以及该第一内存访问指令之后的第一内存屏障指令;
确定模块802,用于确定该第一内存屏障指令为第一类型内存屏障指令, 该第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序;
调度模块803,用于将该第一内存访问指令和该第一内存屏障指令调度至该多个调度队列中的第一调度队列,该第一调度队列用于缓存该第一处理器核发送的内存访问指令;
发送模块804,用于将该多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给该内存控制器。
本发明实施例提供的调度器,通过提供两种类型内存屏障指令:第一类型内存屏障指令和第二类型内存屏障指令,第一类型内存屏障指令的作用域为处理器核,第二类型内存屏障指令的作用域为整个处理器,并根据线程的并行性,将第一类型内存屏障指令之前的内存访问指令一起调动,减小了内存屏障指令对内存性能的影响,提高了并行度,减小了内存访问时间,且该调度器未根据任一多版本机制的语义来优化内存调度,能够适用于多种多版本机制,在不同类型的多版本机制中均能提供内存顺序保证并提高内存访问性能。
可选地,参见图9,该调度器还包括:
优先级确定模块805,用于确定该至少一个内存访问指令的优先级,该优先级由在内存访问指令被发送给该内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示;
选择模块806,用于根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,从该至少一个内存访问指令中选取每个bank对应的优先级最高的内存访问指令;
该发送模块804,还用于向该内存控制器发送选取的内存访问指令,并向该内存控制器发送第二类型内存屏障指令,该第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
可选地,该优先级确定模块805还用于:
按照每个调度队列中第一类型内存屏障指令的位置进行划分,得到多个分组,每个分组包括至少一个内存访问指令;
根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T;统计该T中访问每个bank的内存访问指令个数;
统计该T中每个分组中访问每个bank的内存访问指令个数;
根据该T中访问每个bank的内存访问指令个数以及该T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的 优先级作为每个分组内的内存访问指令的优先级。
可选地,该优先级确定模块805具体用于采用以下公式,计算该T中分组s的优先级:
W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级;
Xn-1表示该T中访问bank n-1的内存访问指令个数；
Ys_n-1表示该T中分组s中访问bank n-1的内存请求个数；
Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数，其中分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组，若分组s为调度队列中的最后一个分组，则Ys+1_n-1=0。
可选地:
该接收模块801,还用于接收该多个处理器核中的第二处理器核发送的第二内存访问指令以及该第二内存访问指令之后的第二内存屏障指令,该第二内存屏障指令为该第一类型内存屏障指令;
该确定模块802还用于:
确定该第二内存访问指令所属的第二线程与该第一内存访问指令所属的第一线程为关联线程;
确定该第一调度队列中的内存访问指令已经被发送给该内存控制器;
该调度模块803,还用于将该第二内存访问指令以及该第二内存屏障指令调度至该第一调度队列。
可选地,该发送模块804还用于:
在将该多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给该内存控制器之后,将第二类型内存屏障指令发送给该内存控制器,该第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
可选地:
该接收模块801,还用于接收该多个处理器核中的第三处理器核发送的第三内存访问指令以及该第三内存访问指令之后的第三内存屏障指令,该第三内存屏障指令为第二类型内存屏障指令,该第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序;
该确定模块802,还用于确定该多个调度队列中的所有内存访问指令都已经被发送给该内存控制器;
该调度模块803,还用于将该第三内存访问指令和该第三内存屏障指令调度至第三调度队列,该第三调度队列用于缓存该第三处理器核发送的内存访问指令。
本发明实施例图8和图9所提供的调度器可以参见前述实施例描述的内存访问指令的调度方法,具体的,各个模块功能的详细描述可参见前述实施例中对调度器的相关描述,在此不再赘述。
本发明实施例还提供一种内存访问指令的调度方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令用于执行前述任意一个方法实施例所述的方法流程。本领域普通技术人员可以理解,前述的存储介质包括:U盘、移动硬盘、磁碟、光盘、随机存储器(Random-Access Memory,RAM)、固态硬盘(Solid State Disk,SSD)或者其他非易失性存储器(non-volatile memory)等各种可以存储程序代码的非短暂性的(non-transitory)机器可读介质。
需要说明的是,本申请所提供的实施例仅仅是示意性的。所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。在本发明实施例、权利要求以及附图中揭示的特征可以独立存在也可以组合存在。在本发明实施例中以硬件形式描述的特征可以通过软件来执行,反之亦然。在此不做限定。

Claims (21)

  1. 一种内存访问指令的调度方法,其特征在于,应用于计算机系统中,所述计算机系统包括内存控制器、调度器以及多个处理器核;所述调度器分别与所述内存控制器以及所述多个处理器核连接,在所述调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令,所述方法包括:
    所述调度器接收所述多个处理器核中的第一处理器核发送的第一内存访问指令以及所述第一内存访问指令之后的第一内存屏障指令;
    所述调度器确定所述第一内存屏障指令为第一类型内存屏障指令,所述第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序;
    所述调度器将所述第一内存访问指令和所述第一内存屏障指令调度至所述多个调度队列中的第一调度队列,所述第一调度队列用于缓存所述第一处理器核发送的内存访问指令;
    所述调度器将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器。
  2. 根据权利要求1所述的方法,其特征在于,所述调度器将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给内存控制器包括:
    确定所述至少一个内存访问指令的优先级,所述优先级由在内存访问指令被发送给所述内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示;
    根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,从所述至少一个内存访问指令中选取每个bank对应的优先级最高的内存访问指令;
    向所述内存控制器发送选取的内存访问指令;
    向所述内存控制器发送第二类型内存屏障指令,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述至少一个内存访问指令的优先级,包括:
    按照每个调度队列中第一类型内存屏障指令的位置进行划分，得到多个分组，每个分组包括至少一个内存访问指令；
    根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T;
    统计所述T中访问每个bank的内存访问指令个数;
    统计所述T中每个分组中访问每个bank的内存访问指令个数;
    根据所述T中访问每个bank的内存访问指令个数以及所述T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的优先级作为每个分组内的内存访问指令的优先级。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述T中访问每个bank的内存访问指令个数以及所述T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的优先级作为每个分组内的内存访问指令的优先级,包括:
    采用以下公式,计算所述T中分组s的优先级:
    W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
    其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级;
    Xn-1表示所述T中访问bank n-1的内存访问指令个数；
    Ys_n-1表示所述T中分组s中访问bank n-1的内存请求个数；
    Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数，其中分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组，若分组s为调度队列中的最后一个分组，则Ys+1_n-1=0。
  5. 根据权利要求1所述的方法,其特征在于,还包括:
    所述调度器接收所述多个处理器核中的第二处理器核发送的第二内存访问指令以及所述第二内存访问指令之后的第二内存屏障指令,所述第二内存屏障指令为所述第一类型内存屏障指令;
    确定所述第二内存访问指令所属的第二线程与所述第一内存访问指令所属的第一线程为关联线程;
    确定所述第一调度队列中的内存访问指令已经被发送给所述内存控制器;
    将所述第二内存访问指令以及所述第二内存屏障指令调度至所述第一调度队列。
  6. 根据权利要求1所述的方法,其特征在于,所述调度器将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器之后,所述方法还包括:
    所述调度器将第二类型内存屏障指令发送给所述内存控制器,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  7. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    所述调度器接收所述多个处理器核中的第三处理器核发送的第三内存访问指令以及所述第三内存访问指令之后的第三内存屏障指令,所述第三内存屏障指令为第二类型内存屏障指令,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序;
    确定所述多个调度队列中的所有内存访问指令都已经被发送给所述内存控制器;
    将所述第三内存访问指令和所述第三内存屏障指令调度至第三调度队列,所述第三调度队列用于缓存所述第三处理器核发送的内存访问指令。
  8. 一种调度器,其特征在于,所述调度器应用于计算机系统中,所述计算机系统包括内存控制器、所述调度器以及多个处理器核;在所述调度器中缓存有多个调度队列,每个调度队列用于缓存待调度的内存访问指令,所述调度器包括:
    接收模块,用于接收所述多个处理器核中的第一处理器核发送的第一内存访问指令以及所述第一内存访问指令之后的第一内存屏障指令;
    确定模块,用于确定所述第一内存屏障指令为第一类型内存屏障指令,所述第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序;
    调度模块,用于将所述第一内存访问指令和所述第一内存屏障指令调度至所述多个调度队列中的第一调度队列,所述第一调度队列用于缓存所述第一处理器核发送的内存访问指令;
    发送模块,用于将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器。
  9. 根据权利要求8所述的调度器,其特征在于,所述调度器还包括:
    优先级确定模块,用于确定所述至少一个内存访问指令的优先级,所述优先级由在内存访问指令被发送给所述内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示;
    选择模块,用于根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,从所述至少一个内存访问指令中选取每个bank对应的优先级最高的内存访问指令;
    所述发送模块,还用于向所述内存控制器发送选取的内存访问指令,并向所述内存控制器发送第二类型内存屏障指令,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  10. 根据权利要求9所述的调度器,其特征在于,所述优先级确定模块还用于:
    按照每个调度队列中第一类型内存屏障指令的位置进行划分,得到多个分组,每个分组包括至少一个内存访问指令;
    根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T;统计所述T中访问每个bank的内存访问指令个数;
    统计所述T中每个分组中访问每个bank的内存访问指令个数;
    根据所述T中访问每个bank的内存访问指令个数以及所述T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的优先级作为每个分组内的内存访问指令的优先级。
  11. 根据权利要求10所述的调度器,其特征在于,所述优先级确定模块具体用于采用以下公式,计算所述T中分组s的优先级:
    W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
    其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级;
    Xn-1表示所述T中访问bank n-1的内存访问指令个数；
    Ys_n-1表示所述T中分组s中访问bank n-1的内存请求个数；
    Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数，其中分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组，若分组s为调度队列中的最后一个分组，则Ys+1_n-1=0。
  12. 根据权利要求8所述的调度器,其特征在于:
    所述接收模块,还用于接收所述多个处理器核中的第二处理器核发送的第二内存访问指令以及所述第二内存访问指令之后的第二内存屏障指令,所述第二内存屏障指令为所述第一类型内存屏障指令;
    所述确定模块还用于:
    确定所述第二内存访问指令所属的第二线程与所述第一内存访问指令所属的第一线程为关联线程;
    确定所述第一调度队列中的内存访问指令已经被发送给所述内存控制器;
    所述调度模块,还用于将所述第二内存访问指令以及所述第二内存屏障指令调度至所述第一调度队列。
  13. 根据权利要求8所述的调度器,其特征在于,所述发送模块还用于:
    在将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器之后,将第二类型内存屏障指令发送给所述内存控制器,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  14. 根据权利要求8所述的调度器,其特征在于:
    所述接收模块,还用于接收所述多个处理器核中的第三处理器核发送的第三内存访问指令以及所述第三内存访问指令之后的第三内存屏障指令,所述第三内存屏障指令为第二类型内存屏障指令,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序;
    所述确定模块,还用于确定所述多个调度队列中的所有内存访问指令都已经被发送给所述内存控制器;
    所述调度模块,还用于将所述第三内存访问指令和所述第三内存屏障指令调度至第三调度队列,所述第三调度队列用于缓存所述第三处理器核发送的内存访问指令。
  15. 一种计算机系统，其特征在于，所述计算机系统包括处理器和内存控制器，所述处理器包括调度器和多个处理器核，在所述调度器中缓存有多个调度队列，每个调度队列用于缓存待调度的内存访问指令；
    所述调度器用于:
    接收所述多个处理器核中的第一处理器核发送的第一内存访问指令以及所述第一内存访问指令之后的第一内存屏障指令;
    确定所述第一内存屏障指令为第一类型内存屏障指令,所述第一类型内存屏障指令用于控制处理器核的多个内存访问指令的顺序;
    将所述第一内存访问指令和所述第一内存屏障指令调度至所述多个调度队列中的第一调度队列,所述第一调度队列用于缓存所述第一处理器核发送的内存访问指令;
    将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器。
  16. 根据权利要求15所述的系统,其特征在于,所述调度器具体用于:
    确定所述至少一个内存访问指令的优先级,所述优先级由在内存访问指令被发送给所述内存控制器之后待访问每个内存库bank的内存访问指令个数的最小值表示;
    根据每个内存访问指令的优先级以及每个内存访问指令访问的bank,从所述至少一个内存访问指令中选取每个bank对应的优先级最高的内存访问指令;
    向所述内存控制器发送选取的内存访问指令;
    向所述内存控制器发送第二类型内存屏障指令,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  17. 根据权利要求16所述的系统,其特征在于,所述调度器具体用于:
    按照每个调度队列中第一类型内存屏障指令的位置进行划分,得到多个分组,每个分组包括至少一个内存访问指令;
    根据每个调度队列中的第一个分组获得待调度的内存访问指令集合T;
    统计所述T中访问每个bank的内存访问指令个数;
    统计所述T中每个分组中访问每个bank的内存访问指令个数;
    根据所述T中访问每个bank的内存访问指令个数以及所述T中每个分组中访问每个bank的内存访问指令个数,计算每个分组的优先级,并将每个分组的 优先级作为每个分组内的内存访问指令的优先级。
  18. 根据权利要求17所述的系统,其特征在于,所述调度器具体用于:
    采用以下公式,计算所述T中分组s的优先级:
    W_b_s=min{(X0-Ys_0+Ys+1_0),(X1-Ys_1+Ys+1_1),…(Xn-1-Ys_n-1+Ys+1_n-1)};
    其中,b表示调度队列的序号,s表示当前所调度的分组在对应的调度队列中的序号,n表示bank的序号,W_b_s表示分组s的优先级;
    Xn-1表示所述T中访问bank n-1的内存访问指令个数；
    Ys_n-1表示所述T中分组s中访问bank n-1的内存请求个数；
    Ys+1_n-1表示分组s+1中访问bank n-1的内存请求个数，其中分组s+1是指与分组s位于同一调度队列且位于分组s之后的分组，若分组s为调度队列中的最后一个分组，则Ys+1_n-1=0。
  19. 根据权利要求15所述的系统,其特征在于,所述调度器还用于:
    接收所述多个处理器核中的第二处理器核发送的第二内存访问指令以及所述第二内存访问指令之后的第二内存屏障指令,所述第二内存屏障指令为所述第一类型内存屏障指令;
    确定所述第二内存访问指令所属的第二线程与所述第一内存访问指令所属的第一线程为关联线程;
    确定所述第一调度队列中的内存访问指令已经被发送给所述内存控制器;
    将所述第二内存访问指令以及所述第二内存屏障指令调度至所述第一调度队列。
  20. 根据权利要求15所述的系统,其特征在于,所述调度器还用于:
    在将所述多个调度队列中位于第一个第一类型内存屏障指令之前的至少一个内存访问指令发送给所述内存控制器之后,将第二类型内存屏障指令发送给所述内存控制器,所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序。
  21. 根据权利要求15所述的系统,其特征在于,所述调度器还用于:
    接收所述多个处理器核中的第三处理器核发送的第三内存访问指令以及所述第三内存访问指令之后的第三内存屏障指令，所述第三内存屏障指令为第二类型内存屏障指令，所述第二类型内存屏障指令用于控制整个处理器的多个内存访问指令的顺序；
    确定所述多个调度队列中的所有内存访问指令都已经被发送给所述内存控制器;
    将所述第三内存访问指令和所述第三内存屏障指令调度至第三调度队列,所述第三调度队列用于缓存所述第三处理器核发送的内存访问指令。
PCT/CN2016/083339 2016-05-25 2016-05-25 内存访问指令的调度方法、装置及计算机系统 WO2017201693A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/083339 WO2017201693A1 (zh) 2016-05-25 2016-05-25 内存访问指令的调度方法、装置及计算机系统
CN201680004199.2A CN108027727B (zh) 2016-05-25 2016-05-25 内存访问指令的调度方法、装置及计算机系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/083339 WO2017201693A1 (zh) 2016-05-25 2016-05-25 内存访问指令的调度方法、装置及计算机系统

Publications (1)

Publication Number Publication Date
WO2017201693A1 true WO2017201693A1 (zh) 2017-11-30

Family

ID=60410956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/083339 WO2017201693A1 (zh) 2016-05-25 2016-05-25 内存访问指令的调度方法、装置及计算机系统

Country Status (2)

Country Link
CN (1) CN108027727B (zh)
WO (1) WO2017201693A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399219B (zh) * 2019-07-18 2022-05-17 深圳云天励飞技术有限公司 内存访问方法、dmc及存储介质
CN112783613B (zh) * 2019-11-07 2024-03-01 北京沃东天骏信息技术有限公司 一种单元调度的方法和装置

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706715A (zh) * 2009-12-04 2010-05-12 北京龙芯中科技术服务中心有限公司 指令调度装置和方法
CN104407997A (zh) * 2014-12-18 2015-03-11 中国人民解放军国防科学技术大学 带有指令动态调度功能的与非型闪存单通道同步控制器

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6795878B2 (en) * 2000-12-11 2004-09-21 International Business Machines Corporation Verifying cumulative ordering of memory instructions
US9223578B2 (en) * 2009-09-25 2015-12-29 Nvidia Corporation Coalescing memory barrier operations across multiple parallel threads
US8997103B2 (en) * 2009-09-25 2015-03-31 Nvidia Corporation N-way memory barrier operation coalescing
CN101950282B (zh) * 2010-08-30 2012-05-23 中国科学院计算技术研究所 一种多处理器系统及其同步引擎

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706715A (zh) * 2009-12-04 2010-05-12 北京龙芯中科技术服务中心有限公司 指令调度装置和方法
CN104407997A (zh) * 2014-12-18 2015-03-11 中国人民解放军国防科学技术大学 带有指令动态调度功能的与非型闪存单通道同步控制器

Also Published As

Publication number Publication date
CN108027727B (zh) 2020-09-08
CN108027727A (zh) 2018-05-11

Similar Documents

Publication Publication Date Title
US9588810B2 (en) Parallelism-aware memory request scheduling in shared memory controllers
US8458721B2 (en) System and method for implementing hierarchical queue-based locks using flat combining
US8850131B2 (en) Memory request scheduling based on thread criticality
US8689221B2 (en) Speculative thread execution and asynchronous conflict events
US9830189B2 (en) Multi-threaded queuing system for pattern matching
US7861042B2 (en) Processor acquisition of ownership of access coordinator for shared resource
JP2017526996A5 (zh)
US9411757B2 (en) Memory interface
US20130014120A1 (en) Fair Software Locking Across a Non-Coherent Interconnect
US10678588B2 (en) Reducing synchronization of tasks in latency-tolerant task-parallel systems
US9047138B2 (en) Apparatus and method for thread scheduling and lock acquisition order control based on deterministic progress index
US20210303375A1 (en) Multithreaded lossy queue protocol
US10019283B2 (en) Predicting a context portion to move between a context buffer and registers based on context portions previously used by at least one other thread
US20170168727A1 (en) Single-stage arbiter/scheduler for a memory system comprising a volatile memory and a shared cache
US8806168B2 (en) Producer-consumer data transfer using piecewise circular queue
CN104978321A (zh) 构造数据队列的方法、装置及从其插入和消费对象的方法
US8566532B2 (en) Management of multipurpose command queues in a multilevel cache hierarchy
WO2017201693A1 (zh) 内存访问指令的调度方法、装置及计算机系统
US11386007B1 (en) Methods and systems for fast allocation of fragmented caches
US20220317926A1 (en) Approach for enforcing ordering between memory-centric and core-centric memory operations
EP2707793B1 (en) Request to own chaining in multi-socketed systems
US9483502B2 (en) Computational processing device including request holding units each provided for each type of commands, information processing device including request holding units each provided for each type of commands, and method of controlling information processing device
US8930628B2 (en) Managing in-line store throughput reduction
Solaiman et al. A read-write-validate approach to optimistic concurrency control for energy efficiency of resource-constrained systems
JP2018205918A (ja) 演算処理装置及び演算処理装置の制御方法

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16902683

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16902683

Country of ref document: EP

Kind code of ref document: A1