CN104866457B - A shared-cache-based static architecture for an on-chip multi-core processor - Google Patents
Info
- Publication number
- CN104866457B CN104866457B CN201510302580.4A CN201510302580A CN104866457B CN 104866457 B CN104866457 B CN 104866457B CN 201510302580 A CN201510302580 A CN 201510302580A CN 104866457 B CN104866457 B CN 104866457B
- Authority
- CN
- China
- Prior art keywords
- nodes
- cache
- class
- shared cache
- l2bank
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a shared-cache-based static architecture for an on-chip multi-core processor. The architecture comprises n nodes, where n is an even number greater than 0, consisting of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers, and the class-A and class-B nodes are distributed alternately (interleaved). Each shared cache bank l2bank has twice the capacity of a bank in the general-purpose architecture. The invention thereby saves the hardware overhead and area of the last-level cache and reduces the static power consumption of the cache, without increasing either the access latency or the congestion of the network on chip.
Description
Technical Field
The invention relates to a shared-cache-based static architecture for an on-chip multi-core processor.
Background Art
As shown in Figure 1, a commonly used network-on-chip multi-core processor architecture — taking the common case of 16 cores with a two-level cache hierarchy as an example — consists of 16 nodes. Each node comprises a router R for communication, a processing core, a local private cache L1, and a comparatively large shared cache bank l2bank. Because this structure performs data interaction and communication through the shared-cache mechanism, and the shared cache banks occupy a very large fraction of the area of the network on chip, their impact on power consumption is also large, particularly the share contributed by static power.
As Figure 1 shows, every processing core is connected to a shared cache bank l2bank. During program execution, a read or write may target data stored in the l2bank of a relatively distant core, so a static design must consider both the average hop count (the average distance a core traverses to reach a shared cache bank) and the congestion of the whole network during data interaction (to ensure fairness). As data-storage requirements keep growing, the area of the shared cache banks grows as well, and their static power consumption accounts for an ever-increasing share of the whole network on chip. The power-consumption problem caused by the excessive area of the shared cache banks in this general-purpose architecture therefore cannot be ignored.
Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art by providing a shared-cache-based static architecture for an on-chip multi-core processor that saves the hardware overhead and area of the last-level cache and reduces the static power consumption of the cache, without increasing either the access latency or the congestion of the network on chip.
The purpose of the present invention is achieved through the following technical solution: a shared-cache-based static architecture for an on-chip multi-core processor comprising n nodes, where n is an even number greater than 0. The nodes comprise n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are distributed alternately (interleaved); and each shared cache bank l2bank has twice the capacity of a bank in the general-purpose architecture.
The number of nodes is 16, comprising 8 of the class-A nodes and 8 of the class-B nodes.
The local private cache L1 comprises an instruction cache and a data cache.
The beneficial effects of the present invention are: (1) The average distance for accessing the shared cache banks l2bank is guaranteed to equal the average distance in the general-purpose architecture, so no extra access latency is added to the network on chip.
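Benefit (1) can be checked numerically. The sketch below is illustrative Python, assuming a 4×4 mesh with XY routing (so hop count equals Manhattan distance) and accesses spread uniformly across the address space; neither assumption is stated in the patent. It computes the average core-to-bank distance for the conventional layout, where every node holds an l2bank, and for the checkerboard layout, where only the 8 class-B nodes hold double-capacity banks. The two averages coincide:

```python
from itertools import product

def mean_hops(nodes, banks):
    """Average Manhattan (XY-routing) hop distance from every
    requesting node to the shared-cache banks, each bank weighted
    by the fraction of addresses it holds (uniform in both layouts,
    since doubling a bank's capacity halves the bank count)."""
    total = sum(abs(nx - bx) + abs(ny - by)
                for (nx, ny) in nodes for (bx, by) in banks)
    return total / (len(nodes) * len(banks))

mesh = list(product(range(4), range(4)))  # 4x4 mesh, 16 nodes
conventional = mesh                        # an l2bank at every node
checkerboard = [(x, y) for (x, y) in mesh if (x + y) % 2 == 0]  # 8 B-nodes

print(mean_hops(mesh, conventional))  # 2.5
print(mean_hops(mesh, checkerboard))  # 2.5 -- same average distance
```

Both layouts give an average of 2.5 hops, consistent with the claim that the interleaved placement adds no access latency on average.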
(2) In the hardware design of the present invention, each shared cache bank l2bank has the same structure as a conventional bank, except that its storage capacity is doubled while the number of banks is halved; the total shared capacity of the last-level cache therefore equals that of the general-purpose architecture and still meets data-storage needs. Halving the number of l2banks, however, also halves the peripheral hardware overhead — for example, the number of amplifiers and decoders is cut in half — so the hardware design area is reduced and the static power consumption of the cache drops accordingly.
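The capacity bookkeeping in (2) can be sketched as follows. The per-bank size (512 KB) and line size (64 bytes) are hypothetical — the patent specifies neither — and the striping scheme is one common choice, not necessarily the one intended. Halving the bank count while doubling each bank's capacity keeps the total last-level cache constant and simply moves one bit of the line address from the bank-select field into the within-bank index:

```python
LINE_BYTES = 64            # hypothetical cache-line size
BANK_KB_CONVENTIONAL = 512 # hypothetical per-bank size, conventional layout

def bank_and_index(addr, num_banks):
    """Stripe cache lines across banks: the low-order line-address
    bits select the bank, the remaining bits index within the bank."""
    line = addr // LINE_BYTES
    return line % num_banks, line // num_banks

total_conventional = 16 * BANK_KB_CONVENTIONAL       # 16 banks
total_interleaved  = 8 * (2 * BANK_KB_CONVENTIONAL)  # 8 double-size banks
assert total_conventional == total_interleaved       # same total LLC capacity

addr = 0x1A2C0
print(bank_and_index(addr, 16))  # (11, 104): bank out of 16, index within bank
print(bank_and_index(addr, 8))   # (3, 209):  bank out of 8, doubled index range
```

With 8 banks the within-bank index range doubles, which is exactly where the doubled per-bank capacity goes.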
(3) The nodes containing a shared cache bank l2bank and the nodes without one are distributed alternately, which guarantees fairness during data interaction and does not introduce network congestion.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a commonly used network-on-chip multi-core processor architecture;
Figure 2 is a schematic diagram of the architecture of the present invention.
Detailed Description of the Embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.
As shown in Figure 2, a shared-cache-based static architecture for an on-chip multi-core processor comprises n nodes, where n is an even number greater than 0. The nodes comprise n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are distributed alternately (interleaved); and each shared cache bank l2bank has twice the capacity of a bank in the general-purpose architecture.
The number of nodes is 16, comprising 8 of the class-A nodes and 8 of the class-B nodes.
The local private cache L1 comprises an instruction cache and a data cache.
Figure 2 shows the distribution of the class-A nodes (which have no shared cache bank l2bank) and the class-B nodes (which have one). The 16 nodes are arranged in four rows and four columns, and in both the horizontal and the vertical direction there is one class-B node between every two class-A nodes and one class-A node between every two class-B nodes. This is the interleaved distribution referred to above.
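The interleaved placement described above can be sketched as a checkerboard (illustrative Python; 'B' marks a node with a double-capacity l2bank, 'A' a node without one, and placing B-nodes where x+y is even is one of the two equivalent colourings):

```python
# Build the 4x4 placement described in the embodiment.
grid = [['B' if (x + y) % 2 == 0 else 'A' for x in range(4)]
        for y in range(4)]
for row in grid:
    print(' '.join(row))
# B A B A
# A B A B
# B A B A
# A B A B

# Every row and every column strictly alternates, so between any two
# same-class nodes there is exactly one node of the other class.
lines = grid + [list(col) for col in zip(*grid)]
assert all(line[i] != line[i + 1] for line in lines for i in range(3))
assert sum(row.count('B') for row in grid) == 8  # 8 B-nodes, 8 A-nodes
```

The alternation check mirrors the fairness argument: no core is more than one hop from a node of the other class.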
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510302580.4A CN104866457B (en) | 2015-06-04 | 2015-06-04 | A kind of chip multi-core processor static framework based on shared buffer memory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510302580.4A CN104866457B (en) | 2015-06-04 | 2015-06-04 | A kind of chip multi-core processor static framework based on shared buffer memory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104866457A CN104866457A (en) | 2015-08-26 |
CN104866457B true CN104866457B (en) | 2018-06-15 |
Family
ID=53912297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510302580.4A Expired - Fee Related CN104866457B (en) | 2015-06-04 | 2015-06-04 | A kind of chip multi-core processor static framework based on shared buffer memory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104866457B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107870871B (en) * | 2016-09-23 | 2021-08-20 | 华为技术有限公司 | Method and apparatus for allocating cache |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510191B (en) * | 2009-03-26 | 2010-10-06 | 浙江大学 | Implementing method of multi-core system structure with buffer window |
CN101706755A (en) * | 2009-11-24 | 2010-05-12 | 中国科学技术大学苏州研究院 | Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof |
CN102103568B (en) * | 2011-01-30 | 2012-10-10 | 中国科学院计算技术研究所 | Method for realizing cache coherence protocol of chip multiprocessor (CMP) system |
CN102270180B (en) * | 2011-08-09 | 2014-04-02 | 清华大学 | Multicore processor cache and management method thereof |
US9201837B2 (en) * | 2013-03-13 | 2015-12-01 | Futurewei Technologies, Inc. | Disaggregated server architecture for data centers |
- 2015-06-04: application CN201510302580.4A (CN) granted as CN104866457B; status: not active, Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
Chongmin Li et al., "Fast Hierarchical Cache Directory: A Scalable Cache Organization for Large-scale CMP," 2010 Fifth IEEE International Conference on Networking, Architecture, and Storage, 2010, pp. 367-376. * |
Du Jianjun, "Research on Key Technologies of Shared-Cache Multi-core Processors" (共享高速缓存多核处理器的关键技术研究), China Doctoral Dissertations Full-text Database (Information Science and Technology), 2011-12-15, abstract and pp. 25-28. * |
Also Published As
Publication number | Publication date |
---|---|
CN104866457A (en) | 2015-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI554883B (en) | System and method for segmenting data structures in a memory system | |
US9477409B2 (en) | Accelerating boot time zeroing of memory based on non-volatile memory (NVM) technology | |
CN105280215B (en) | Dynamic random access memory DRAM method for refreshing, equipment and system | |
US20140359225A1 (en) | Multi-core processor and multi-core processor system | |
CN103034617A (en) | Caching structure for realizing storage of configuration information of reconfigurable system and management method | |
CN104102586B (en) | A kind of method, apparatus of address of cache processing | |
CN103019974A (en) | Memory access processing method and controller | |
US20160004654A1 (en) | System for migrating stash transactions | |
CN106951390B (en) | NUMA system construction method capable of reducing cross-node memory access delay | |
CN107577614B (en) | Data writing method and memory system | |
CN104866457B (en) | A kind of chip multi-core processor static framework based on shared buffer memory | |
CN100466601C (en) | A data reading and writing device and reading and writing method thereof | |
US20160378151A1 (en) | Rack scale architecture (rsa) and shared memory controller (smc) techniques of fast zeroing | |
Li et al. | A compact low-power eDRAM-based NoC buffer | |
CN107168810A (en) | A kind of calculate node internal memory sharing system and reading and writing operation internal memory sharing method | |
CN114667509B (en) | Memory, network equipment and data access method | |
CN106293491B (en) | The processing method and Memory Controller Hub of write request | |
CN105718991B (en) | Cellular array computing system | |
CN105718990B (en) | Communication means between cellular array computing system and wherein cell | |
CN103914413A (en) | External-storage access interface for coarseness reconfigurable system and access method of external-storage access interface | |
JP4879172B2 (en) | Semiconductor memory device and semiconductor integrated circuit incorporating the same | |
CN203397353U (en) | Ultra-wide bus based chip framework | |
CN102739818B (en) | Address scheduling method, device and system | |
Foglia et al. | Analysis of performance dependencies in NUCA-based CMP systems | |
CN105843692A (en) | Heterogeneous computing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180615 |
CF01 | Termination of patent right due to non-payment of annual fee |