CN104866457B - A kind of chip multi-core processor static framework based on shared buffer memory - Google Patents

A kind of chip multi-core processor static framework based on shared buffer memory

Info

Publication number
CN104866457B
CN104866457B
Authority
CN
China
Prior art keywords
nodes
cache
class
shared cache
l2bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510302580.4A
Other languages
Chinese (zh)
Other versions
CN104866457A (en)
Inventor
李嵩
褚廷斌
黄乐天
袁正希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201510302580.4A
Publication of CN104866457A
Application granted
Publication of CN104866457B

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a shared-cache-based static architecture for an on-chip multi-core processor. The architecture comprises n nodes, where n is an even number greater than 0; the nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate with one another through their routers, and the class-A and class-B nodes are interleaved. Each shared cache bank l2bank has twice the capacity of a bank in the conventional architecture. Without increasing the access latency or the congestion of the on-chip network, the invention reduces the hardware overhead and area of the last-level cache and thereby reduces its static power consumption.

Description

A Shared-Cache-Based Static Architecture for an On-Chip Multi-Core Processor

Technical Field

The invention relates to a shared-cache-based static architecture for an on-chip multi-core processor.

Background Art

As shown in Figure 1, a common network-on-chip multi-core processor architecture — taking the typical case of 16 cores with a two-level cache hierarchy as an example — consists of 16 nodes. Each node contains a router R for communication, a processing core, a local private cache L1, and a relatively large shared cache bank l2bank. Because this structure exchanges data and communicates through the shared-cache mechanism, and the l2bank banks occupy a very large fraction of the on-chip network's area, their impact on power consumption is also large, especially the share taken by static power.

As Figure 1 shows, every processing core is attached to a shared cache bank l2bank. During program execution, a core may have to read or write data stored in an l2bank at a node relatively far away, so a static design must consider both the average hop count (the average distance a core travels to reach an l2bank) and the congestion of the whole network during data exchange (to ensure fairness). As data storage requirements grow, the area of the l2bank banks grows with them, and their static power consumption takes an ever larger share of the whole on-chip network's budget; the power cost of the oversized l2bank area in this conventional architecture has therefore become a problem that cannot be ignored.
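The average hop count mentioned above can be made concrete for the conventional 4×4 mesh of Figure 1, in which every node holds an l2bank. The sketch below is illustrative only (it is not taken from the patent): it assumes XY routing, so hop distance is the Manhattan distance, and assumes each core accesses all 16 banks uniformly.

```python
from itertools import product

def avg_hops(sources, banks):
    """Average Manhattan (XY-routing) hop distance from every source
    node to every shared-cache bank, assuming uniform bank access."""
    total = sum(abs(sr - br) + abs(sc - bc)
                for (sr, sc) in sources for (br, bc) in banks)
    return total / (len(sources) * len(banks))

# Conventional architecture: a 4x4 mesh where every node has an l2bank.
nodes = list(product(range(4), range(4)))
print(avg_hops(nodes, nodes))  # → 2.5
```

The 2.5-hop figure is the baseline the proposed architecture must match to avoid adding access latency.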

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a shared-cache-based static architecture for an on-chip multi-core processor that reduces the hardware overhead and area of the last-level cache, and thereby its static power consumption, without increasing the access latency or the congestion of the entire on-chip network.

The purpose of the present invention is achieved through the following technical solution: a shared-cache-based static architecture for an on-chip multi-core processor comprising n nodes, where n is an even number greater than 0. The nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are interleaved; and each l2bank has twice the capacity of a shared cache bank in the conventional architecture.

The number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes.

The local private cache L1 comprises an instruction cache and a data cache.

The beneficial effects of the present invention are: (1) The average distance for accessing a shared cache bank l2bank is kept equal to that of the conventional architecture, so no additional access latency is introduced into the on-chip network.

(2) In the hardware design of the present invention, each l2bank keeps the same structure as the conventional one but doubles its storage space, while the number of l2banks is halved; the total shared capacity of the last-level cache therefore matches the conventional architecture and still meets the storage demand. Halving the number of l2banks, however, also halves the peripheral hardware overhead — for example, the number of sense amplifiers and decoders is cut in half — so the hardware design area is reduced and the static power consumption of the cache drops accordingly.
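The capacity bookkeeping in point (2) is simple arithmetic; the sketch below uses an assumed per-bank size of 512 KB purely for illustration (the patent does not specify a bank size).

```python
BANK_KB = 512  # assumed per-bank capacity in the conventional design (illustrative)

# Conventional architecture: 16 banks of size S.
conv_banks, conv_bank_kb = 16, BANK_KB
# Proposed architecture: half as many banks, each twice as large.
prop_banks, prop_bank_kb = conv_banks // 2, 2 * conv_bank_kb

# Total last-level cache capacity is unchanged...
assert conv_banks * conv_bank_kb == prop_banks * prop_bank_kb
# ...but per-bank peripheral hardware (decoders, sense amplifiers)
# scales with the bank count, which is halved.
print(prop_banks / conv_banks)  # → 0.5
```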

(3) The nodes that contain a shared cache bank l2bank and the nodes that do not are interleaved, which preserves fairness during data exchange and avoids network congestion.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a common network-on-chip multi-core processor architecture;

Figure 2 is a schematic structural diagram of the present invention.

Detailed Description

The technical solution of the present invention is described in further detail below in conjunction with the accompanying drawings, but the scope of protection of the present invention is not limited to the following description.

As shown in Figure 2, a shared-cache-based static architecture for an on-chip multi-core processor comprises n nodes, where n is an even number greater than 0. The nodes consist of n/2 class-A nodes and n/2 class-B nodes. A class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank. The nodes communicate through their routers; the class-A and class-B nodes are interleaved; and each l2bank has twice the capacity of a shared cache bank in the conventional architecture.

The number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes.

The local private cache L1 comprises an instruction cache and a data cache.

Figure 2 shows the distribution of the class-A nodes (without an l2bank) and the class-B nodes (with an l2bank): the 16 nodes are arranged in four rows and four columns, and in both the horizontal and the vertical direction every two class-A nodes are separated by a class-B node and every two class-B nodes by a class-A node — a checkerboard pattern. This is the interleaved distribution referred to above.
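This checkerboard interleaving can be checked numerically. The sketch below is illustrative (it is not from the patent): it assumes XY-routed Manhattan hop distances and uniform access to every bank, builds the 4×4 layout of Figure 2 by placing l2banks on nodes whose row+column index is even, and confirms that the average core-to-l2bank distance matches the conventional layout in which every node holds a bank.

```python
from itertools import product

def avg_hops(sources, banks):
    """Average Manhattan hop distance from every node to every l2bank."""
    total = sum(abs(sr - br) + abs(sc - bc)
                for (sr, sc) in sources for (br, bc) in banks)
    return total / (len(sources) * len(banks))

nodes = list(product(range(4), range(4)))                    # all 16 nodes
class_b = [(r, c) for (r, c) in nodes if (r + c) % 2 == 0]   # 8 l2bank nodes
class_a = [n for n in nodes if n not in class_b]             # 8 bankless nodes
assert len(class_a) == len(class_b) == 8

conventional = avg_hops(nodes, nodes)    # every node holds a bank
proposed = avg_hops(nodes, class_b)      # only interleaved class-B nodes do
print(conventional, proposed)  # → 2.5 2.5
```

Both layouts average 2.5 hops, consistent with beneficial effect (1): halving the bank count in a checkerboard does not lengthen the average access path.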

Claims (1)

1. A shared-cache-based on-chip multi-core processor, characterized in that its static architecture is as follows: it comprises n nodes, where n is an even number greater than 0; the nodes comprise n/2 class-A nodes and n/2 class-B nodes; a class-A node comprises a processing core, a local private cache L1, and a router R; a class-B node comprises a processing core, a local private cache L1, a router R, and a shared cache bank l2bank; the nodes communicate through routers; the class-A and class-B nodes are interleaved; each l2bank has twice the capacity of a shared cache bank in the conventional architecture; the number of nodes is 16, comprising 8 of said class-A nodes and 8 of said class-B nodes; and the local private cache L1 comprises an instruction cache and a data cache.
CN201510302580.4A 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory Expired - Fee Related CN104866457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510302580.4A CN104866457B (en) 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory


Publications (2)

Publication Number Publication Date
CN104866457A CN104866457A (en) 2015-08-26
CN104866457B true CN104866457B (en) 2018-06-15

Family

ID=53912297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510302580.4A Expired - Fee Related CN104866457B (en) 2015-06-04 2015-06-04 A kind of chip multi-core processor static framework based on shared buffer memory

Country Status (1)

Country Link
CN (1) CN104866457B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870871B * 2016-09-23 2021-08-20 Huawei Technologies Co., Ltd. Method and apparatus for allocating cache

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101510191B * 2009-03-26 2010-10-06 Zhejiang University Implementing method of multi-core system structure with buffer window
CN101706755A * 2009-11-24 2010-05-12 Suzhou Research Institute, University of Science and Technology of China Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN102103568B * 2011-01-30 2012-10-10 Institute of Computing Technology, Chinese Academy of Sciences Method for realizing cache coherence protocol of chip multiprocessor (CMP) system
CN102270180B * 2011-08-09 2014-04-02 Tsinghua University Multicore processor cache and management method thereof
US9201837B2 * 2013-03-13 2015-12-01 Futurewei Technologies, Inc. Disaggregated server architecture for data centers

Non-Patent Citations (2)

Title
Fast Hierarchical Cache Directory: A Scalable Cache Organization for Large-scale CMP; Chongmin Li et al.; 2010 Fifth IEEE International Conference on Networking, Architecture, and Storage; 2010-12-31; pp. 367-376 *
Research on Key Technologies of Shared-Cache Multi-core Processors; Du Jianjun; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2011-12-15; abstract and pp. 25-28 *

Also Published As

Publication number Publication date
CN104866457A (en) 2015-08-26

Similar Documents

Publication Publication Date Title
TWI554883B (en) System and method for segmenting data structures in a memory system
US9477409B2 (en) Accelerating boot time zeroing of memory based on non-volatile memory (NVM) technology
CN105280215B (en) Dynamic random access memory DRAM method for refreshing, equipment and system
US20140359225A1 (en) Multi-core processor and multi-core processor system
CN103034617A (en) Caching structure for realizing storage of configuration information of reconfigurable system and management method
CN104102586B (en) A kind of method, apparatus of address of cache processing
CN103019974A (en) Memory access processing method and controller
US20160004654A1 (en) System for migrating stash transactions
CN106951390B (en) NUMA system construction method capable of reducing cross-node memory access delay
CN107577614B (en) Data writing method and memory system
CN104866457B (en) A kind of chip multi-core processor static framework based on shared buffer memory
CN100466601C (en) A data reading and writing device and reading and writing method thereof
US20160378151A1 (en) Rack scale architecture (rsa) and shared memory controller (smc) techniques of fast zeroing
Li et al. A compact low-power eDRAM-based NoC buffer
CN107168810A (en) A kind of calculate node internal memory sharing system and reading and writing operation internal memory sharing method
CN114667509B (en) Memory, network equipment and data access method
CN106293491B (en) The processing method and Memory Controller Hub of write request
CN105718991B (en) Cellular array computing system
CN105718990B (en) Communication means between cellular array computing system and wherein cell
CN103914413A (en) External-storage access interface for coarseness reconfigurable system and access method of external-storage access interface
JP4879172B2 (en) Semiconductor memory device and semiconductor integrated circuit incorporating the same
CN203397353U (en) Ultra-wide bus based chip framework
CN102739818B (en) Address scheduling method, device and system
Foglia et al. Analysis of performance dependencies in NUCA-based CMP systems
CN105843692A (en) Heterogeneous computing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2018-06-15
