CN104866457B - A kind of chip multi-core processor static framework based on shared buffer memory - Google Patents

A kind of chip multi-core processor static framework based on shared buffer memory

Info

Publication number
CN104866457B
CN104866457B
Authority
CN
China
Prior art keywords
nodes
cache
class
shared cache
l2bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510302580.4A
Other languages
Chinese (zh)
Other versions
CN104866457A (en)
Inventor
李嵩
褚廷斌
黄乐天
袁正希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201510302580.4A
Publication of CN104866457A
Application granted
Publication of CN104866457B

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a shared-cache-based static architecture for an on-chip multi-core processor. The architecture comprises n nodes, where n is an even number greater than 0; the nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate with one another through their routers, and the class-A and class-B nodes are interleaved. Each shared cache bank l2bank has twice the capacity of a bank in the conventional architecture. Without increasing the access latency or the congestion of the on-chip network, the invention reduces the hardware overhead and area of the last-level cache and thereby reduces its static power consumption.

Description

A Shared-Cache-Based Static Architecture for an On-Chip Multi-Core Processor

Technical Field

The invention relates to a shared-cache-based static architecture for an on-chip multi-core processor.

Background Art

As shown in Figure 1, a common network-on-chip multi-core processor architecture — taking the typical case of 16 cores with a two-level cache hierarchy as an example — consists of 16 nodes. Each node contains a router R for communication, a processing core, a local private cache L1, and a relatively large shared cache bank l2bank. Because this structure exchanges data and communicates through the shared-cache mechanism, and the l2bank banks occupy a very large fraction of the on-chip network's area, their impact on power consumption is also large, especially the share taken by static power.

As Figure 1 shows, every processing core is attached to a shared cache bank l2bank. During program execution, a core may have to read or write data stored in an l2bank at a node relatively far away, so a static design must consider both the average hop count (the average distance a core travels to reach an l2bank) and the congestion of the whole network during data exchange (to ensure fairness). As data storage requirements grow, the area of the l2bank banks grows with them, and their static power consumption takes an ever larger share of the whole on-chip network's budget; the power cost of the oversized l2bank area in this conventional architecture has therefore become a problem that cannot be ignored.
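The average hop count mentioned above can be made concrete for the conventional 4×4 mesh of Figure 1, in which every node holds an l2bank. The sketch below is illustrative only (it is not taken from the patent): it assumes XY routing, so hop distance is the Manhattan distance, and assumes each core accesses all 16 banks uniformly.

```python
from itertools import product

def avg_hops(sources, banks):
    """Average Manhattan (XY-routing) hop distance from every source
    node to every shared-cache bank, assuming uniform bank access."""
    total = sum(abs(sr - br) + abs(sc - bc)
                for (sr, sc) in sources for (br, bc) in banks)
    return total / (len(sources) * len(banks))

# Conventional architecture: a 4x4 mesh where every node has an l2bank.
nodes = list(product(range(4), range(4)))
print(avg_hops(nodes, nodes))  # → 2.5
```

The 2.5-hop figure is the baseline the proposed architecture must match to avoid adding access latency.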

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a shared-cache-based static architecture for an on-chip multi-core processor that reduces the hardware overhead and area of the last-level cache, and thereby its static power consumption, without increasing the access latency or the congestion of the entire on-chip network.

The purpose of the present invention is achieved through the following technical solution: a shared-cache-based static architecture for an on-chip multi-core processor comprising n nodes, where n is an even number greater than 0. The nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are interleaved; and each l2bank has twice the capacity of a shared cache bank in the conventional architecture.

The number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes.

The local private cache L1 comprises an instruction cache and a data cache.

The beneficial effects of the present invention are: (1) The average distance for accessing a shared cache bank l2bank is kept equal to that of the conventional architecture, so no additional access latency is introduced into the on-chip network.

(2) In the hardware design of the present invention, each l2bank keeps the same structure as the conventional one but doubles its storage space, while the number of l2banks is halved; the total shared capacity of the last-level cache therefore matches the conventional architecture and still meets the storage demand. Halving the number of l2banks, however, also halves the peripheral hardware overhead — for example, the number of sense amplifiers and decoders is cut in half — so the hardware design area is reduced and the static power consumption of the cache drops accordingly.
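The capacity bookkeeping in point (2) is simple arithmetic; the sketch below uses an assumed per-bank size of 512 KB purely for illustration (the patent does not specify a bank size).

```python
BANK_KB = 512  # assumed per-bank capacity in the conventional design (illustrative)

# Conventional architecture: 16 banks of size S.
conv_banks, conv_bank_kb = 16, BANK_KB
# Proposed architecture: half as many banks, each twice as large.
prop_banks, prop_bank_kb = conv_banks // 2, 2 * conv_bank_kb

# Total last-level cache capacity is unchanged...
assert conv_banks * conv_bank_kb == prop_banks * prop_bank_kb
# ...but per-bank peripheral hardware (decoders, sense amplifiers)
# scales with the bank count, which is halved.
print(prop_banks / conv_banks)  # → 0.5
```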

(3) The nodes that contain a shared cache bank l2bank and the nodes that do not are interleaved, which preserves fairness during data exchange and avoids network congestion.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a common network-on-chip multi-core processor architecture;

Figure 2 is a schematic structural diagram of the present invention.

Detailed Description

The technical solution of the present invention is described in further detail below in conjunction with the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.

As shown in Figure 2, a shared-cache-based static architecture for an on-chip multi-core processor comprises n nodes, where n is an even number greater than 0. The nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are interleaved; and each l2bank has twice the capacity of a shared cache bank in the conventional architecture.

The number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes.

The local private cache L1 comprises an instruction cache and a data cache.

Figure 2 shows the distribution of the class-A nodes (without an l2bank) and the class-B nodes (with an l2bank): the 16 nodes are arranged in four rows and four columns, and in both the horizontal and the vertical direction every two class-A nodes are separated by a class-B node and every two class-B nodes by a class-A node — a checkerboard pattern. This is the interleaved distribution referred to above.
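This checkerboard interleaving can be checked numerically. The sketch below is illustrative (it is not from the patent): it assumes XY-routed Manhattan hop distances and uniform access to every bank, builds the 4×4 layout of Figure 2 by placing l2banks on nodes whose row+column index is even, and confirms that the average core-to-l2bank distance matches the conventional layout in which every node holds a bank.

```python
from itertools import product

def avg_hops(sources, banks):
    """Average Manhattan hop distance from every node to every l2bank."""
    total = sum(abs(sr - br) + abs(sc - bc)
                for (sr, sc) in sources for (br, bc) in banks)
    return total / (len(sources) * len(banks))

nodes = list(product(range(4), range(4)))                    # all 16 nodes
class_b = [(r, c) for (r, c) in nodes if (r + c) % 2 == 0]   # 8 l2bank nodes
class_a = [n for n in nodes if n not in class_b]             # 8 bankless nodes
assert len(class_a) == len(class_b) == 8

conventional = avg_hops(nodes, nodes)    # every node holds a bank
proposed = avg_hops(nodes, class_b)      # only interleaved class-B nodes do
print(conventional, proposed)  # → 2.5 2.5
```

Both layouts average 2.5 hops, consistent with beneficial effect (1): halving the bank count in a checkerboard does not lengthen the average access path.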

Claims (1)

1. A shared-cache-based on-chip multi-core processor, characterized in that its static architecture is as follows: it comprises n nodes, where n is an even number greater than 0; the nodes comprise n/2 class-A nodes and n/2 class-B nodes; a class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank; the nodes communicate through routers; the class-A and class-B nodes are interleaved; each l2bank has twice the capacity of a shared cache bank in the conventional architecture; the number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes; and the local private cache L1 comprises an instruction cache and a data cache.
CN201510302580.4A 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory Expired - Fee Related CN104866457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510302580.4A CN104866457B (en) 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory


Publications (2)

Publication Number Publication Date
CN104866457A CN104866457A (en) 2015-08-26
CN104866457B true CN104866457B (en) 2018-06-15

Family

ID=53912297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510302580.4A Expired - Fee Related CN104866457B (en) 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory

Country Status (1)

Country Link
CN (1) CN104866457B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870871B * 2016-09-23 2021-08-20 Huawei Technologies Co., Ltd. Method and apparatus for allocating cache

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101510191B * 2009-03-26 2010-10-06 Zhejiang University Implementing method of multi-core system structure with buffer window
CN101706755A * 2009-11-24 2010-05-12 Suzhou Research Institute, University of Science and Technology of China Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN102103568B * 2011-01-30 2012-10-10 Institute of Computing Technology, Chinese Academy of Sciences Method for realizing cache coherence protocol of chip multiprocessor (CMP) system
CN102270180B * 2011-08-09 2014-04-02 Tsinghua University Multicore processor cache and management method thereof
US9201837B2 * 2013-03-13 2015-12-01 Futurewei Technologies, Inc. Disaggregated server architecture for data centers

Non-Patent Citations (2)

Title
Fast Hierarchical Cache Directory: A Scalable Cache Organization for Large-scale CMP; Chongmin Li et al.; 2010 Fifth IEEE International Conference on Networking, Architecture, and Storage; 2010-12-31; pp. 367-376 *
Research on Key Technologies of Shared-Cache Multi-core Processors; Du Jianjun; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2011-12-15; abstract and pp. 25-28 *

Also Published As

Publication number Publication date
CN104866457A (en) 2015-08-26

Similar Documents

Publication Publication Date Title
TWI554883B (en) System and method for segmenting data structures in a memory system
US9477409B2 (en) Accelerating boot time zeroing of memory based on non-volatile memory (NVM) technology
CN105280215B (en) Dynamic random access memory DRAM method for refreshing, equipment and system
US20140359225A1 (en) Multi-core processor and multi-core processor system
CN103034617A (en) Caching structure for realizing storage of configuration information of reconfigurable system and management method
CN104102586B (en) A kind of method, apparatus of address of cache processing
CN103019974A (en) Memory access processing method and controller
US20160004654A1 (en) System for migrating stash transactions
CN106951390B (en) NUMA system construction method capable of reducing cross-node memory access delay
CN107577614B (en) Data writing method and memory system
CN104866457B (en) A kind of chip multi-core processor static framework based on shared buffer memory
CN100466601C (en) A data reading and writing device and reading and writing method thereof
US20160378151A1 (en) Rack scale architecture (rsa) and shared memory controller (smc) techniques of fast zeroing
Li et al. A compact low-power eDRAM-based NoC buffer
CN107168810A (en) A kind of calculate node internal memory sharing system and reading and writing operation internal memory sharing method
CN114667509B (en) Memory, network equipment and data access method
CN106293491B (en) The processing method and Memory Controller Hub of write request
CN105718991B (en) Cellular array computing system
CN105718990B (en) Communication means between cellular array computing system and wherein cell
CN103914413A (en) External-storage access interface for coarseness reconfigurable system and access method of external-storage access interface
JP4879172B2 (en) Semiconductor memory device and semiconductor integrated circuit incorporating the same
CN203397353U (en) Ultra-wide bus based chip framework
CN102739818B (en) Address scheduling method, device and system
Foglia et al. Analysis of performance dependencies in NUCA-based CMP systems
CN105843692A (en) Heterogeneous computing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2018-06-15
