CN110716797A - DDR4 performance balance scheduling structure and method for multiple request sources - Google Patents


Info

Publication number
CN110716797A
Authority
CN
China
Prior art keywords
source
access request
memory access
request
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910852485.XA
Other languages
Chinese (zh)
Inventor
吕晖
石嵩
刘骁
吴铁彬
赵冠一
王迪
王吉军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN201910852485.XA priority Critical patent/CN110716797A/en
Publication of CN110716797A publication Critical patent/CN110716797A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention relates to the technical field of computer architecture and processor microarchitecture, and in particular to a DDR4 performance-balanced scheduling structure and method for multiple request sources. The scheduling structure comprises a plurality of memory access request scheduling buffers, used to improve the memory access bandwidth of the corresponding request sources; a multi-source continuous arbitration component, used to select one memory access request to issue; and a DDR4 memory device, which receives the requests issued by the multi-source continuous arbitration component. The scheduling method comprises L1, setting up a memory access request scheduling buffer for the requests of each memory access request source; and L2, the multi-source continuous arbitration component selecting one memory access request to issue according to an arbitration policy. By providing a separate scheduling buffer for each request source, the invention improves memory access bandwidth while reducing the impact on memory access delay, improving the overall memory access performance of the system.

Description

DDR4 performance balance scheduling structure and method for multiple request sources
Technical Field
The invention relates to the technical field of computer system structures and processor microstructures, in particular to a DDR4 performance balance scheduling structure and method for multiple request sources.
Background
With continued progress in processor fabrication technology and the demands of practical applications, multi-core architectures have become the development trend of high-performance microprocessors. The resulting "memory wall" — the difficulty of matching the memory access bandwidth and delay of a many-core processor system to its computing performance — is a hot topic in current computer architecture research.
To improve memory access bandwidth, many-core processors employ large-scale memory access request scheduling buffers. However, large scheduling buffers greatly increase memory access delay. In a multi-source request sequence, some sources need higher access bandwidth while others need shorter access delay; that is, some sources are delay-sensitive and others are bandwidth-sensitive. Traditional scheduling mechanisms do not take these per-source characteristics into account: they can maximize bandwidth utilization, but this is not conducive to the overall performance of the chip.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a DDR4 performance balance scheduling structure and method facing multiple request sources.
The technical solution adopted by the invention to solve the above problems is as follows. A DDR4 performance-balanced scheduling structure for multiple request sources comprises:
a plurality of memory access request scheduling buffers, used to improve the memory access bandwidth of the corresponding memory access request sources;
a multi-source continuous arbitration component, used to select one memory access request to issue;
a DDR4 memory device, used to receive the memory access requests issued by the multi-source continuous arbitration component.
Preferably, the memory access request scheduling buffers comprise bandwidth-sensitive memory access scheduling buffers and delay-sensitive memory access scheduling buffers.
Preferably, the bandwidth-sensitive memory access scheduling buffer comprises:
storage entries, used to record the information of memory access requests;
an empty-entry queue, on which free storage entries are mounted in queue form;
a scheduling binary tree, used to organize occupied storage entries in the form of a binary tree.
Preferably, the information of a memory access request comprises the request information itself, a left child pointer of the entry, and a right child pointer of the entry.
A DDR4 performance-balanced scheduling method for multiple request sources comprises:
L1, setting up a memory access request scheduling buffer for the requests of each memory access request source;
L2, the multi-source continuous arbitration component selecting one memory access request to issue according to an arbitration policy;
L3, the DDR4 memory device receiving the memory access requests issued by the multi-source continuous arbitration component.
Preferably, the arbitration policy in L2 is specifically:
1) the highest priority rotates among the arbitration sources;
2) the arbitration source holding the highest priority releases it only after N of its memory access requests have been arbitrated consecutively; upon release, its own priority is set to the lowest and the priorities of all other arbitration sources are incremented by one.
Preferably, in L1 a bandwidth-sensitive memory access scheduling buffer is set up for the requests of each bandwidth-sensitive memory access request source,
and a delay-sensitive memory access scheduling buffer is set up for the requests of each delay-sensitive memory access request source.
Preferably, the bandwidth-sensitive memory access scheduling buffer comprises:
storage entries, used to record the information of memory access requests;
an empty-entry queue, on which free storage entries are mounted in queue form;
a scheduling binary tree, used to organize occupied storage entries in the form of a binary tree.
Preferably, the information of a memory access request comprises the request information itself, a left child pointer of the entry, and a right child pointer of the entry.
The invention has the advantage that, by providing a separate memory access request scheduling buffer for each of the multiple request sources, it improves memory access bandwidth while reducing the impact on memory access delay, improving the overall memory access performance of the system.
Drawings
Fig. 1 is a schematic structural diagram of a multi-request-source-oriented DDR4 performance balancing scheduling structure according to the present application.
Detailed Description
The technical solution of the invention is further explained below through specific embodiments in conjunction with the drawings.
As shown in FIG. 1, in a first embodiment, a DDR4 performance-balanced scheduling structure for multiple request sources comprises:
a plurality of memory access request scheduling buffers, used to improve the memory access bandwidth of the corresponding memory access request sources;
a multi-source continuous arbitration component, used to select one memory access request to issue;
a DDR4 memory device, used to receive the memory access requests issued by the multi-source continuous arbitration component.
By providing a separate memory access request scheduling buffer for each of the multiple request sources, the method and device reduce the mutual influence of memory access delay among the sources, yielding a scheduling structure that balances memory access delay and memory access bandwidth.
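As an illustration only (the patent describes the structure, not an implementation), the flow above can be sketched as one scheduling step in Python: each request source owns a buffer, an arbitration function picks one source with a pending request, and the winning request is handed to the memory device. The names `schedule_cycle`, `arbiter_pick`, and `ddr4_issue` are hypothetical placeholders.

```python
from collections import deque

def schedule_cycle(buffers, arbiter_pick, ddr4_issue):
    """One step of the multi-source scheduling structure (sketch).

    buffers      -- dict: source id -> deque of pending requests
    arbiter_pick -- stands in for the multi-source arbitration component;
                    given {source: head request}, returns the winning source
    ddr4_issue   -- stands in for the DDR4 memory device receiving a request
    """
    # Each buffer exposes only its oldest pending request to the arbiter.
    ready = {src: q[0] for src, q in buffers.items() if q}
    if not ready:
        return None  # nothing to schedule this cycle
    src = arbiter_pick(ready)     # arbitration selects exactly one source
    req = buffers[src].popleft()  # request leaves its scheduling buffer
    ddr4_issue(req)               # and is transmitted to the memory device
    return src, req
```

A real scheduling buffer would reorder requests internally (for example, for row-buffer locality) rather than expose a simple FIFO head; this sketch only shows the buffer-arbiter-device flow.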
A DDR4 performance-balanced scheduling method for multiple request sources comprises:
L1, setting up a memory access request scheduling buffer for the requests of each memory access request source;
L2, the multi-source continuous arbitration component selecting one memory access request to issue according to an arbitration policy;
L3, the DDR4 memory device receiving the memory access requests issued by the multi-source continuous arbitration component.
The arbitration policy is specifically:
1) the highest priority rotates among the arbitration sources;
2) the arbitration source holding the highest priority releases it only after N of its memory access requests have been arbitrated consecutively; upon release, its own priority is set to the lowest and the priorities of all other arbitration sources are incremented by one.
First, a memory access request scheduling buffer is set up for each memory access request source. The buffer exploits locality in the memory access sequence to improve memory access bandwidth.
Second, the multiple memory access request scheduling buffers present their requests to the arbitration component, which performs a many-to-one selection and issues one memory access request to the DDR4 memory device. The arbitration policy is:
(1) The highest priority rotates among the arbitration sources.
(2) The arbitration source holding the highest priority releases it only after N of its requests have been arbitrated consecutively. Upon release, its own priority is set to the lowest, and the priorities of all other arbitration sources are incremented by one.
For example, suppose the initial priorities of arbitration sources one through four are all 1 and N = 5. The highest priority is first given to source one; after source one has had 5 requests arbitrated consecutively, its priority drops to 0 and the priorities of sources two, three, and four all become 2. The highest priority then passes to source two; after its 5 consecutive grants, its priority drops to 0, source one's priority becomes 1, and sources three and four become 3. Next, source three receives the highest priority; after its 5 grants, its priority drops to 0, source one becomes 2, source two becomes 1, and source four becomes 4. Then source four receives the highest priority; after its 5 grants, its priority drops to 0, source one becomes 3, source two becomes 2, and source three becomes 1. The highest priority then returns to source one; after its 5 grants, its priority drops to 0, source two becomes 3, source three becomes 2, source four becomes 1, and so on.
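The rotating-priority walk-through above can be reproduced with a short Python sketch. The class and method names are illustrative (the patent describes only the policy, not an implementation), sources are numbered from 0 rather than 1, and ties between equal priorities are assumed to break toward the lower source index, which matches the order in the example.

```python
class RotatingPriorityArbiter:
    """Sketch of the rotating-highest-priority policy: the source with
    the highest priority wins until N of its requests have been granted
    consecutively; it then drops to the lowest priority and all other
    sources move up by one."""

    def __init__(self, num_sources: int, n_consecutive: int):
        self.priorities = [1] * num_sources  # all sources start equal
        self.n = n_consecutive
        self.grants = 0  # consecutive grants to the current holder

    def current_holder(self) -> int:
        # Highest priority wins; ties break toward the lower index
        # (an assumption, chosen to reproduce the worked example).
        return max(range(len(self.priorities)),
                   key=lambda i: (self.priorities[i], -i))

    def grant(self) -> int:
        holder = self.current_holder()
        self.grants += 1
        if self.grants == self.n:
            # Release: the holder drops to the lowest priority
            # and every other source is incremented by one.
            self.priorities = [0 if i == holder else p + 1
                               for i, p in enumerate(self.priorities)]
            self.grants = 0
        return holder
```

With four sources and N = 5, twenty successive grants go to sources 0, 1, 2, 3 in blocks of five, and the priority vector afterwards is [3, 2, 1, 0], matching the state described in the example.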
Here, each arbitration source corresponds to the scheduling buffer of a different memory access request source.
By providing a separate memory access request scheduling buffer for each of the multiple request sources, memory access bandwidth is improved while the impact on memory access delay is reduced, improving the overall memory access performance of the system.
In a second embodiment, building on the first embodiment, the memory access request scheduling buffers comprise bandwidth-sensitive memory access scheduling buffers and delay-sensitive memory access scheduling buffers.
The bandwidth-sensitive memory access scheduling buffer comprises:
storage entries, used to record the information of memory access requests; this information comprises the request information itself, a left child pointer of the entry, and a right child pointer of the entry;
an empty-entry queue, on which free storage entries are mounted in queue form;
a scheduling binary tree, used to organize occupied storage entries in the form of a binary tree.
First, each storage entry of the bandwidth-sensitive memory access scheduling buffer holds three pieces of information: the memory access request information, the entry's left child pointer, and the entry's right child pointer. The storage entries are organized into two structures: an empty-entry queue and a scheduling binary tree. In the initial state, all storage entries are on the empty-entry queue and the scheduling binary tree is empty.
Second, when a new memory access request arrives, a storage entry is taken from the empty-entry queue and filled with the request's information. The scheduling binary tree is then searched using that information: if a node with the same request information already exists in the tree, the new request is mounted on that node's left child pointer; if no such node exists, the new request is mounted on the right child pointer of the rightmost node of the tree.
Third, whenever the scheduling binary tree is not empty, its root node is selected for issue. At that point:
(1) If the root's left child pointer is not null, the left child becomes the new root of the tree, and the original root's right child is mounted on the new root's right child pointer.
(2) If the root's left child pointer is null, the root's right child becomes the new root.
Finally, the storage entry of the issued request is returned to the empty-entry queue.
Because memory access requests reaching the bandwidth-sensitive scheduling buffer are organized into a binary tree, only the root node needs to be examined at issue time, so large-scale request scheduling remains practical even when a large number of requests are pending.
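The entry pool, empty-entry queue, and scheduling binary tree described above can be sketched as follows. This is an interpretive reconstruction: the patent does not define what makes two requests "the same" (plain `==` on the request information stands in here, e.g. for matching DRAM row addresses), and the search walks only the right spine on the assumption that same-information requests always chain down left pointers. All class and method names are hypothetical.

```python
from collections import deque

class Entry:
    """One storage entry: request information plus left/right child pointers."""
    def __init__(self):
        self.info = None
        self.left = None
        self.right = None

class BandwidthSensitiveBuffer:
    """Sketch of the bandwidth-sensitive scheduling buffer: a fixed pool
    of entries (the empty-entry queue) organized into a scheduling
    binary tree whose right spine holds distinct request classes and
    whose left chains hold same-information requests in arrival order."""

    def __init__(self, capacity: int):
        self.free = deque(Entry() for _ in range(capacity))  # empty-entry queue
        self.root = None  # the scheduling binary tree starts empty

    def insert(self, info) -> bool:
        if not self.free:
            return False  # buffer full; caller must stall the request
        entry = self.free.popleft()
        entry.info, entry.left, entry.right = info, None, None
        if self.root is None:
            self.root = entry
            return True
        # Search the right spine for a node with the same request info.
        node, rightmost = self.root, self.root
        while node is not None:
            if node.info == info:
                # Same-information requests chain down the left pointers.
                while node.left is not None:
                    node = node.left
                node.left = entry
                return True
            rightmost = node
            node = node.right
        # No match: mount on the right child of the rightmost node.
        rightmost.right = entry
        return True

    def issue(self):
        if self.root is None:
            return None
        old = self.root
        if old.left is not None:
            # The left child becomes the new root and inherits the
            # old root's right subtree.
            new_root = old.left
            new_root.right = old.right
        else:
            new_root = old.right
        self.root = new_root
        info = old.info
        old.info = old.left = old.right = None
        self.free.append(old)  # return the entry to the empty-entry queue
        return info
```

For requests arriving in the order A, B, A, C (where the letter stands for the request information, e.g. a row address), the buffer issues A, A, B, C: the second A is chained under the first and drains before the tree moves on to B, which is the locality-mining behavior the description attributes to this structure.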
The above embodiments merely illustrate preferred embodiments of the present invention and do not limit its spirit and scope. Those skilled in the art may make various modifications and improvements to the technical solution of the present invention without departing from its design concept; the protection scope of the present invention is defined by the claims.

Claims (9)

1. A DDR4 performance-balanced scheduling structure for multiple request sources, characterized by comprising:
a plurality of memory access request scheduling buffers, used to improve the memory access bandwidth of the corresponding memory access request sources;
a multi-source continuous arbitration component, used to select one memory access request to issue;
a DDR4 memory device, used to receive the memory access requests issued by the multi-source continuous arbitration component.
2. The multi-request-source-oriented DDR4 performance-balanced scheduling structure of claim 1, characterized in that the memory access request scheduling buffers comprise bandwidth-sensitive memory access scheduling buffers and delay-sensitive memory access scheduling buffers.
3. The multi-request-source-oriented DDR4 performance-balanced scheduling structure of claim 2, characterized in that the bandwidth-sensitive memory access scheduling buffer comprises:
storage entries, used to record the information of memory access requests;
an empty-entry queue, on which free storage entries are mounted in queue form;
a scheduling binary tree, used to organize occupied storage entries in the form of a binary tree.
4. The multi-request-source-oriented DDR4 performance-balanced scheduling structure of claim 3, characterized in that the information of a memory access request comprises the request information itself, a left child pointer of the entry, and a right child pointer of the entry.
5. A DDR4 performance-balanced scheduling method for multiple request sources, characterized by comprising:
L1, setting up a memory access request scheduling buffer for the requests of each memory access request source;
L2, the multi-source continuous arbitration component selecting one memory access request to issue according to an arbitration policy;
L3, the DDR4 memory device receiving the memory access requests issued by the multi-source continuous arbitration component.
6. The multi-request-source-oriented DDR4 performance-balanced scheduling method of claim 5, characterized in that the arbitration policy in L2 is specifically:
1) the highest priority rotates among the arbitration sources;
2) the arbitration source holding the highest priority releases it only after N of its memory access requests have been arbitrated consecutively; upon release, its own priority is set to the lowest and the priorities of all other arbitration sources are incremented by one.
7. The multi-request-source-oriented DDR4 performance-balanced scheduling method of claim 5, characterized in that in L1 a bandwidth-sensitive memory access scheduling buffer is set up for the requests of each bandwidth-sensitive memory access request source,
and a delay-sensitive memory access scheduling buffer is set up for the requests of each delay-sensitive memory access request source.
8. The multi-request-source-oriented DDR4 performance-balanced scheduling method of claim 7, characterized in that the bandwidth-sensitive memory access scheduling buffer comprises:
storage entries, used to record the information of memory access requests;
an empty-entry queue, on which free storage entries are mounted in queue form;
a scheduling binary tree, used to organize occupied storage entries in the form of a binary tree.
9. The multi-request-source-oriented DDR4 performance-balanced scheduling method of claim 8, characterized in that the information of a memory access request comprises the request information itself, a left child pointer of the entry, and a right child pointer of the entry.
CN201910852485.XA 2019-09-10 2019-09-10 DDR4 performance balance scheduling structure and method for multiple request sources Pending CN110716797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910852485.XA CN110716797A (en) 2019-09-10 2019-09-10 DDR4 performance balance scheduling structure and method for multiple request sources

Publications (1)

Publication Number Publication Date
CN110716797A true CN110716797A (en) 2020-01-21

Family

ID=69209755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910852485.XA Pending CN110716797A (en) 2019-09-10 2019-09-10 DDR4 performance balance scheduling structure and method for multiple request sources

Country Status (1)

Country Link
CN (1) CN110716797A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103501498A (en) * 2013-08-29 2014-01-08 中国科学院声学研究所 Baseband processing resource allocation method and device thereof
CN104734991A (en) * 2013-12-19 2015-06-24 中国科学院沈阳自动化研究所 End-to-end time delay guarantee transmission scheduling method oriented to industrial backhaul network
US20170324677A1 (en) * 2016-05-04 2017-11-09 Radware, Ltd. Optimized stream management
CN107391243A (en) * 2017-06-30 2017-11-24 广东神马搜索科技有限公司 Thread task processing equipment, device and method
CN108833299A (en) * 2017-12-27 2018-11-16 北京时代民芯科技有限公司 A kind of large scale network data processing method based on restructural exchange chip framework
CN109831393A (en) * 2019-03-10 2019-05-31 西安电子科技大学 More granularity QoS control methods of network-oriented virtualization

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312323A (en) * 2021-06-03 2021-08-27 中国人民解放军国防科技大学 IO (input/output) request scheduling method and system for reducing access delay in parallel file system
CN113312323B (en) * 2021-06-03 2022-07-19 中国人民解放军国防科技大学 IO (input/output) request scheduling method and system for reducing access delay in parallel file system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200121