CN203706196U - Coarse-granularity reconfigurable and layered array register file structure - Google Patents

Info

Publication number
CN203706196U
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN201420060189.9U
Other languages
Chinese (zh)
Inventor
曹鹏
葛伟
徐凯
刘波
杨锦江
马俊
杨军
王超
卜爱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201420060189.9U
Application granted
Publication of CN203706196U
Anticipated expiration
Expired - Lifetime

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The utility model discloses a coarse-grained reconfigurable hierarchical array register file structure comprising a global register file, a local register file, and a distributed register file. The global register file is a shared register file connecting the system control core and the reconfigurable array; it handles parameter passing when the system invokes the reconfigurable architecture and, because every unit in the array can connect to it, has the largest fan-out in the reconfigurable array. The local register file is the private register of a reconfigurable processing unit, and its data is used only by that unit. The distributed register file serves as the data storage and transfer channel for some of the reconfigurable processing units within the reconfigurable array. Through this hierarchical register file design, the utility model solves the problem of storing and transferring array data during reconfigurable computing and improves both the storage efficiency of data variables in the array and reconfigurable computing performance.

Description

A Coarse-Grained Reconfigurable Hierarchical Array Register File Structure

Technical Field

The utility model relates to a coarse-grained reconfigurable hierarchical array register file structure and belongs to the field of embedded reconfigurable design.

Background

The emergence of field-programmable array reconfigurable technology has greatly changed traditional embedded design methods. Reconfigurable computing, a new computing model that spans the time and space domains, has broad application prospects in embedded and high-performance computing and has become a trend in embedded system development. Algorithms in media applications such as image processing and modern communications are massively parallel and require a large number of matrix operations. A register file design allows flexible reordering and shift operations, with multiplexers selecting which register is read or written. Compared with delay chains and shift registers, a register file consumes more power on writes because of the associated decode and selection logic. This power penalty is offset only when the register file holds data for a long time; a register file is not the best choice for short-lived data.

Existing reconfigurable register file architectures fall into two categories according to their effect on array computing performance: on-chip access registers outside the array, and distributed registers inside the array. Data access in a reconfigurable array can be optimized either by using on-chip access registers outside the array to reduce memory access latency, or by optimizing the access pattern of those registers. Optimizing the distributed register structure inside the array reduces the scheduling performance loss that architectural constraints impose during computation, and a hierarchical register file design and scheduling strategy improves the computing performance of the array.

The register organization, sharing mechanism, replacement policy, and partitioning mechanism of reconfigurable on-chip access registers must be studied for the specific array structure and memory access characteristics, so as to trade off low access latency against high hit rate. The data flow path outside the array consists of the on-chip memory design, the prefetch and reuse unit structures, and the vector- and scalar-based register file design.

The way the reconfigurable array accesses the global registers has an increasingly prominent effect on memory access efficiency, and for fast access to contiguous, dense data it has a large effect on bandwidth. The form of interconnection between the reconfigurable array and the global registers clearly constrains the array's memory access performance. Existing designs use crossbar and ring interconnect structures to achieve efficient access and to meet the demand for low-latency, high-bandwidth, low-power storage access, or use a two-dimensional access mode to accelerate access to multimedia data.

How to partition row vector registers and column vector registers flexibly at low cost, that is, how to design a reconfigurable hierarchical array register file, remains an active research problem in this field.

Summary of the Utility Model

Purpose: To overcome the deficiencies of the prior art, the utility model provides a coarse-grained reconfigurable hierarchical array register file structure that solves the problem of storing and transferring array data during reconfigurable computing, thereby improving the storage efficiency of data variables in the array and the performance of reconfigurable computing.

Technical solution: To achieve the above purpose, the utility model adopts the following technical solution:

A coarse-grained reconfigurable hierarchical array register file structure passes parameters between a reconfigurable array arranged as an m×n rectangular array and the system control core, and stores and transfers data on the reconfigurable array. It is implemented through hardware connections and comprises a global register file, a local register file, and a distributed register file:

The global register file: a shared register file connecting the system control core and the reconfigurable array. It holds the parameters passed when the system control core invokes the reconfigurable array and, because every reconfigurable processing unit can connect to it, has the largest fan-out in the reconfigurable array;

The local register file: each reconfigurable processing unit is connected to its own local register, which serves as that unit's private register; its data is used only by that unit;

The distributed register file: connected to the reconfigurable array, it serves as the data storage and transfer channel between some of the reconfigurable processing units in the array.

Preferably, the global register file comprises n global registers, and its data width matches that of the reconfigurable processing units. The global register file serves as a data transfer channel for input parameters and return values, and both the system control core and the reconfigurable array can read and write the global registers.

Preferably, the global register file is directly interconnected with the reconfigurable array as follows: the m global registers at the top of the global register file and the single global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop takes more than m input parameters, exceeding the m top global registers, the extra parameters must be accessed over the bus. The bottom global register carries the function return value.
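The interconnect rule above can be sketched in a few lines of Python. This is an illustrative model only, not the patented hardware: the class name `GlobalRegisterLayout`, the convention that index 0 is the "top" register and index n-1 the "bottom" one, and the assumption that parameters fill registers from the top in order are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalRegisterLayout:
    """Hypothetical model of the global register file interconnect."""
    n: int  # total number of global registers
    m: int  # number of top registers reachable over the full mesh

    def access_path(self, index: int) -> str:
        """Return "mesh" for directly reachable registers, "bus" otherwise."""
        if not 0 <= index < self.n:
            raise IndexError(f"register {index} outside 0..{self.n - 1}")
        # Assumption: index 0 is the top register, index n-1 the bottom one.
        if index < self.m or index == self.n - 1:
            return "mesh"
        return "bus"

    def place_parameters(self, count: int) -> list:
        """Assign parameters to registers 0..count-1, keeping the return slot free."""
        if count > self.n - 1:
            raise ValueError("more parameters than non-return registers")
        return [(i, self.access_path(i)) for i in range(count)]


layout = GlobalRegisterLayout(n=8, m=2)
# Four parameters: the first two use the mesh, the rest fall back to the bus.
print(layout.place_parameters(4))
```

With n=8 and m=2, the third and fourth parameters land in bus-only registers, which is the overflow case the paragraph describes.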

Preferably, the local register file mainly stores variables with a long lifetime and a fixed spatial location; its only input source and output destination is its own private (corresponding) reconfigurable processing unit. A local register can make output data ready as input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register file within one cycle.
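A minimal sketch of the enable-bit write described above, under stated assumptions: the bit position `WRITE_ENABLE_BIT`, the class name `LocalRegisterFile`, the `clock` method, and the size of 4 entries are hypothetical, since the text does not give the configuration word layout.

```python
WRITE_ENABLE_BIT = 0  # hypothetical position of the enable bit in the configuration word


class LocalRegisterFile:
    """Private registers of one reconfigurable processing unit (illustrative model)."""

    def __init__(self, size: int = 4):
        self.regs = [0] * size

    def clock(self, config_word: int, dest: int, pe_result: int) -> None:
        """Model one cycle: latch the unit's result only if the enable bit is set."""
        if (config_word >> WRITE_ENABLE_BIT) & 1:
            self.regs[dest] = pe_result


lrf = LocalRegisterFile()
lrf.clock(config_word=0b1, dest=2, pe_result=99)  # enable set: value is stored
lrf.clock(config_word=0b0, dest=2, pe_result=7)   # enable clear: value is kept

# A long-lived accumulator: each cycle the stored output is reused as an input.
acc = LocalRegisterFile()
for x in [1, 2, 3]:
    acc.clock(config_word=0b1, dest=0, pe_result=acc.regs[0] + x)
```

The accumulator loop illustrates why a long-lived, fixed-location variable suits this register: the value written in one cycle is available as an operand in the next.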

Preferably, the distributed register file consists of distributed registers arranged in an m×n rectangular array. Each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one to those of the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. Each unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple units can write to a cross-row-domain register group at the same time, and multiple units can read the same distributed register at the same time.

Preferably, the cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, all units in row i or column j can obtain the data through that cross-domain register.
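The row/column visibility rule can be illustrated with a short sketch. The function names `readers_of` and `one_hop`, the 0-based indexing, and the string labels "row" and "column" are assumptions made for the example and do not come from the text.

```python
def readers_of(i: int, j: int, m: int, n: int, domain: str) -> list:
    """Positions that can read data written by the unit at row i, column j.

    Assumes an m-row by n-column array with 0-based indices.
    """
    if not (0 <= i < m and 0 <= j < n):
        raise IndexError("unit position outside the array")
    if domain == "row":
        return [(i, c) for c in range(n)]   # every unit in row i
    if domain == "column":
        return [(r, j) for r in range(m)]   # every unit in column j
    raise ValueError("domain must be 'row' or 'column'")


def one_hop(src: tuple, dst: tuple) -> bool:
    """True if dst can receive src's data through one cross-domain register."""
    return src[0] == dst[0] or src[1] == dst[1]


row_readers = readers_of(1, 2, m=4, n=4, domain="row")
col_readers = readers_of(1, 2, m=4, n=4, domain="column")
```

Units that share neither a row nor a column with the writer (where `one_hop` is false) need an intermediate unit to forward the data.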

Preferably, the distributed register file uses multi-input, multi-output data access. To prevent different reconfigurable processing units from accessing the same distributed register at the same time, the following two methods are used:

Method 1: avoid simultaneous access to the same distributed register during mapping;

Method 2: when multiple reconfigurable processing units unpredictably access the same distributed register at the same time, assign priorities by unit number within the reconfigurable array, ordered from largest to smallest, and grant the write to the unit with the higher priority.
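Method 2 amounts to a fixed-priority arbiter, sketched below. Two points are assumptions, not statements from the text: that a larger unit number means a higher priority (the wording only says priorities follow the numbering from largest to smallest), and that units are numbered in row-major order by the hypothetical helper `pe_number`.

```python
def pe_number(i: int, j: int, n: int) -> int:
    """Hypothetical row-major numbering of the unit at row i, column j (0-based)."""
    return i * n + j


def arbitrate(requests: dict):
    """Pick the winning write among simultaneous requests to one register.

    `requests` maps unit number -> value to write. Assumes the largest
    unit number has the highest priority. Returns (winner, value) or None.
    """
    if not requests:
        return None
    winner = max(requests)
    return winner, requests[winner]


# Units (0, 3), (1, 3) and (1, 1) of a 4-column array collide on one register.
pending = {
    pe_number(0, 3, n=4): 10,
    pe_number(1, 3, n=4): 20,
    pe_number(1, 1, n=4): 30,
}
result = arbitrate(pending)
```

Here the unit numbered 7, at position (1, 3), wins, and the other two writes are dropped for that cycle; whether losing writes are retried is not specified in the text.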

Beneficial effects: The coarse-grained reconfigurable hierarchical array register file structure provided by the utility model allows array data to be stored and transferred accurately and efficiently during reconfigurable computing, improving the storage efficiency of data variables in the array and the performance of reconfigurable computing.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the structure of the utility model;

Fig. 2 is a schematic diagram of the global register file;

Fig. 3 is a schematic diagram of the local register file;

Fig. 4 is a schematic diagram of the distributed register file;

Fig. 5 is a schematic diagram of the structure of one example of the utility model;

Fig. 6 is a flow chart of data variable storage and transfer in the utility model.

Detailed Description

The utility model is further described below with reference to the drawings.

A coarse-grained reconfigurable hierarchical array register file structure passes parameters between a reconfigurable array arranged as an m×n rectangular array and the system control core, and stores and transfers data on the reconfigurable array. As shown in Fig. 1, it comprises a global register file, a local register file, and a distributed register file.

The global register file: a shared register file connecting the system control core and the reconfigurable array. It meets the parameter-passing needs of the system control core when it invokes the reconfigurable array and, because every reconfigurable processing unit can connect to it, has the largest fan-out in the reconfigurable array.

The global register file comprises n global registers, and its data width matches that of the reconfigurable processing units. The global register file serves as a data transfer channel for input parameters and return values, and both the system control core and the reconfigurable array can read and write the global registers. The global register file is directly interconnected with the reconfigurable array as follows: the m global registers at the top of the global register file and the single global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop takes more than m input parameters, exceeding the m top global registers, the extra parameters must be accessed over the bus. The bottom global register carries the function return value.

The global register file shown in Fig. 2 contains 16 global registers. The 3 registers at the top and the 1 register at the bottom can be accessed by all reconfigurable processing units. When a loop takes more than 3 input parameters, exceeding the 3 top registers, the extra parameters must be accessed over the bus. The bottom global register is reserved for the function return value. The global registers use the clock domain and reset domain of the reconfigurable array and support both software and hardware reset.
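The parameter-passing role of the 16-register file can be sketched as a call sequence. Everything named here is hypothetical: the class `GlobalRegisters`, the function `call_array`, and the use of a Python callable to stand in for the array's computation. The 16-bit width is an assumption taken from the 16-bit processing units of the example system, since the register width is said to match the unit width.

```python
class GlobalRegisters:
    """Illustrative model of a 16-entry global register file."""

    def __init__(self, count: int = 16, width_bits: int = 16):
        self.mask = (1 << width_bits) - 1  # values wrap to the register width
        self.regs = [0] * count

    def write(self, index: int, value: int) -> None:
        self.regs[index] = value & self.mask

    def read(self, index: int) -> int:
        return self.regs[index]

    def reset(self) -> None:
        """Model a software or hardware reset: clear every register."""
        self.regs = [0] * len(self.regs)


def call_array(regs: GlobalRegisters, kernel, args: list) -> int:
    """Pass args through the top registers and fetch the result from the bottom one."""
    return_slot = len(regs.regs) - 1
    if len(args) > return_slot:
        raise ValueError("too many parameters for the global register file")
    for i, arg in enumerate(args):          # control core writes the parameters
        regs.write(i, arg)
    params = [regs.read(i) for i in range(len(args))]  # array reads them back
    regs.write(return_slot, kernel(*params))           # array writes the result
    return regs.read(return_slot)                      # control core reads it


regs = GlobalRegisters()
total = call_array(regs, lambda a, b, c: a + b + c, [1, 2, 3])
wrapped = call_array(regs, lambda a, b: a * b, [300, 300])
```

The second call shows the effect of the fixed register width: 90000 does not fit in 16 bits, so the stored return value wraps to 24464.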

The local register file: each reconfigurable processing unit is designed with its own local register, which serves as that unit's private register; its data is used only by that unit.

The local register file mainly stores variables with a long lifetime and a fixed spatial location; its only input source and output destination is its own private (corresponding) reconfigurable processing unit. A local register can make output data ready as input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register file within one cycle.

As shown in Fig. 3, the local register is designed with only 4 sub-registers and can make output data ready as input data within 1 cycle.

The distributed register file: serves as the data storage and transfer channel between some of the reconfigurable processing units in the reconfigurable array.

The distributed register file consists of distributed registers arranged in an m×n rectangular array. Each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one to those of the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. Each unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple units can write to a cross-row-domain register group at the same time, and multiple units can read the same distributed register at the same time. The cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, all units in row i or column j can obtain the data through that cross-domain register. The distributed register file uses multi-input, multi-output data access. To prevent different reconfigurable processing units from accessing the same distributed register at the same time, the following two methods are used:

Method 1: avoid simultaneous access to the same distributed register during mapping;

Method 2: when multiple reconfigurable processing units unpredictably access the same distributed register at the same time, assign priorities by unit number within the reconfigurable array, ordered from largest to smallest, and grant the write to the unit with the higher priority.

For example, when data output by the reconfigurable processing unit at array position (i, j) must be passed to the unit at position (1, 1), it is relayed through the unit at position (i, 1) or (1, j). At time Ti, the unit at (i, j) writes the data to position 0 of the cross-row register group. At time Ti+1, the unit at (i, 1) uses a data exchange instruction to write the data from position 0 of the DCR cross-row register group into position 0 of the cross-column register group. At time Ti+2, the unit at (1, 1) can then read from position 0 of the cross-column register group the data written by the unit at (i, j).
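The three-step relay in this example can be traced with a small simulation. It is a sketch under assumptions: the class `DistributedRegisterFile`, its `read` and `write` methods, and the function `relay_to_corner` are hypothetical names, each row and column group is modeled as a dictionary of slots, and coordinates are 1-based to match the (1, 1) notation of the example.

```python
class DistributedRegisterFile:
    """Illustrative model: one register group per row and one per column."""

    def __init__(self, m: int, n: int):
        self.row = {r: {} for r in range(1, m + 1)}  # cross-row groups
        self.col = {c: {} for c in range(1, n + 1)}  # cross-column groups

    def _group(self, pe: tuple, domain: str) -> dict:
        # A unit at (r, c) can only reach row group r and column group c.
        return self.row[pe[0]] if domain == "row" else self.col[pe[1]]

    def write(self, pe: tuple, domain: str, slot: int, value: int) -> None:
        self._group(pe, domain)[slot] = value

    def read(self, pe: tuple, domain: str, slot: int) -> int:
        return self._group(pe, domain)[slot]


def relay_to_corner(drf: DistributedRegisterFile, src: tuple, value: int) -> int:
    """Move a value from the unit at src to the unit at (1, 1) via (i, 1)."""
    i, j = src
    # Ti: the source unit writes slot 0 of its row group.
    drf.write((i, j), "row", 0, value)
    # Ti+1: unit (i, 1) copies it from the row group into its column group.
    drf.write((i, 1), "col", 0, drf.read((i, 1), "row", 0))
    # Ti+2: unit (1, 1) shares column 1 and reads slot 0 of that group.
    return drf.read((1, 1), "col", 0)


drf = DistributedRegisterFile(m=4, n=4)
received = relay_to_corner(drf, src=(3, 2), value=0xBEEF)
```

The relay works because unit (i, 1) sits in the source's row and in the destination's column, so it can reach both register groups.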

The minimal reconfigurable computing system shown in Fig. 5 uses the reconfigurable hierarchical array register file structure proposed here. The system comprises an ARM7TDMI processor serving as the system control core, a reconfigurable array, a global register file, a local register file, an AHB bus for data transfer, and a distributed register file.

The ARM7TDMI processor, which is small, fast, low-power, and well supported by compilers, is chosen as the core and controls the scheduling and configuration of system operation. The global register file connects to the reconfigurable array over a 64-bit AHB bus. The local register file is interconnected with the reconfigurable array through a dedicated access interface with a 128-bit data width, as is the distributed register file. The reconfigurable array contains 4×4 reconfigurable processing units, each supporting single-cycle 16-bit arithmetic and logic operations.

The process of storing and transferring reconfigurable array data is shown in Fig. 6. Transfer request: according to instructions fetched from external memory, the reconfigurable array requests the transfer of parameters or of array data. If the data is exchanged with the system control core, the exchange goes through the global register file. Otherwise, if the data is exchanged within the reconfigurable array, it is stored and exchanged through the distributed register file. Otherwise the data belongs to a single reconfigurable processing unit and is held in that unit's local register. Choosing the most suitable register file for each case makes full use of register file resources and thereby improves data variable storage efficiency and reconfigurable computing performance.
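The decision flow described for Fig. 6 reduces to a two-test selection, sketched here. The function name `select_register_file` and its boolean parameters are hypothetical; the order of the tests follows the text.

```python
def select_register_file(exchanges_with_core: bool, exchanges_within_array: bool) -> str:
    """Choose the register file for a transfer request, in the order given in the text."""
    if exchanges_with_core:
        return "global"        # parameters and return values shared with the core
    if exchanges_within_array:
        return "distributed"   # data passed between processing units
    return "local"             # data private to one processing unit


choices = [
    select_register_file(True, False),
    select_register_file(False, True),
    select_register_file(False, False),
]
```

Because the core test comes first, a request flagged for both the core and the array resolves to the global register file.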

The above is only a preferred embodiment of the utility model. It should be noted that those of ordinary skill in the art can make improvements and refinements without departing from the principle of the utility model, and such improvements and refinements shall also be regarded as falling within its scope of protection.

Claims (6)

1. A coarse-grained reconfigurable hierarchical array register file structure, characterized in that: it is used to pass parameters between a reconfigurable array arranged as an m×n rectangular array and a system control core, and to store and transfer data on the reconfigurable array, and it comprises a global register file, a local register file, and a distributed register file:
The global register file: serves as a shared register file connecting the system control core and the reconfigurable array, holds the parameters passed when the system control core invokes the reconfigurable array, and, as a register file to which every reconfigurable processing unit can connect, has the largest fan-out in the reconfigurable array;
The local register file: each reconfigurable processing unit is connected to a corresponding local register, which serves as the private register of that unit, and its data is used only by that unit;
The distributed register file: is connected to the reconfigurable array and serves as a data storage and transfer channel between reconfigurable processing units in the reconfigurable array.
2. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the global register file comprises n global registers, and the data width of the global register file matches the data width of the reconfigurable processing units; the global register file serves as a data transfer channel for input parameters and return values, and both the system control core and the reconfigurable array can access the global registers.
3. The coarse-grained reconfigurable hierarchical array register file structure according to claim 2, characterized in that: the global register file is directly interconnected with the reconfigurable array as follows: the m global registers at the top of the global register file and the 1 global register at the bottom use full-mesh interconnect and bus interconnect and can be accessed by all reconfigurable processing units, and the remaining global registers use bus interconnect; the 1 global register at the bottom is used to transfer the function return value.
4. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the local register file is mainly used to store variables with a long lifetime and a fixed spatial location, and its only input source and output destination is its private reconfigurable processing unit; writes to the local register are controlled by an enable bit in the configuration word.
5. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the distributed register file consists of distributed registers arranged in an m×n rectangular array, each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one to those of the reconfigurable processing units; each reconfigurable processing unit can operate on two register groups, namely the row register group and the column register group that contain the distributed register at the corresponding position; each reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple reconfigurable processing units are selected by multiplexers; multiple reconfigurable processing units can write to a cross-row-domain register group at the same time; multiple reconfigurable processing units can read the same distributed register at the same time.
6. The coarse-grained reconfigurable hierarchical array register file structure according to claim 5, characterized in that: the cross-domain register groups are interconnected by row or by column, and when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, all reconfigurable processing units in row i or column j can obtain the data through the cross-domain register.
CN201420060189.9U 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure Expired - Lifetime CN203706196U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201420060189.9U CN203706196U (en) 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure

Publications (1)

Publication Number Publication Date
CN203706196U (en) 2014-07-09

Family

ID=51056601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201420060189.9U Expired - Lifetime CN203706196U (en) 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure

Country Status (1)

Country Link
CN (1) CN203706196U (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761072A (en) * 2014-02-10 2014-04-30 Southeast University Coarse granularity reconfigurable hierarchical array register file structure
CN103761072B (en) * 2014-02-10 2016-08-31 Southeast University Coarse-grained reconfigurable hierarchical array register file structure
CN111630487A (en) * 2017-12-22 2020-09-04 Alibaba Group Holding Limited Centralized-distributed hybrid organization of shared memory for neural network processing
CN111630487B (en) * 2017-12-22 2023-06-20 Alibaba Group Holding Limited Centralized-distributed hybrid organization of shared memory for neural network processing

Similar Documents

Publication Publication Date Title
CN103761072B (en) Coarse-grained reconfigurable hierarchical array register file structure
US10705960B2 (en) Processors having virtually clustered cores and cache slices
Rahimi et al. A fully-synthesizable single-cycle interconnection network for shared-L1 processor clusters
KR102655386B1 (en) Method and apparatus for distributed and cooperative computation in artificial neural networks
CN103744644B (en) Quad-core processor system built from a quad-core structure and data exchange method
Zhan et al. OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures
CN103761075B (en) Coarse granularity dynamic reconfigurable data integration and control unit structure
US9575806B2 (en) Monitoring accesses of a thread to multiple memory controllers and selecting a thread processor for the thread based on the monitoring
WO2020103058A1 (en) Programmable operation and control chip, a design method, and device comprising same
CN107562549B (en) Heterogeneous many-core ASIP architecture based on on-chip bus and shared memory
CN110347635A (en) Heterogeneous multi-core microprocessor based on multilayer bus
CN102279818B (en) Vector data access and storage control method supporting limited sharing and vector memory
CN107506329B (en) Coarse-grained reconfigurable array with automatic support for loop-iteration pipelining and its configuration method
US10884672B2 (en) NDP-server: a data-centric computing architecture based on storage server in data center
CN102306139A (en) Heterogeneous multi-core digital signal processor for orthogonal frequency division multiplexing (OFDM) wireless communication system
CN104317770A (en) Data storage structure and data access method for multiple core processing system
Niu et al. FlashGNN: An In-SSD Accelerator for GNN Training
CN203706196U (en) Coarse-granularity reconfigurable and layered array register file structure
CN108874730A (en) Data processor and data processing method
CN103455367B (en) Management unit and method for implementing multi-task scheduling in reconfigurable systems
CN100481060C (en) Method for multi-core expansion in stream processor
Li et al. An efficient multicast router using shared-buffer with packet merging for dataflow architecture
CN203706197U (en) Coarse-granularity dynamic and reconfigurable data regularity control unit structure
CN102289424B (en) Configuration stream working method for dynamic reconfigurable array processor
CN117234720A (en) Dynamically configurable memory computing fusion data caching structure, processor and electronic equipment

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
AV01 Patent right actively abandoned

Granted publication date: 20140709

Effective date of abandoning: 20160831

C25 Abandonment of patent right or utility model to avoid double patenting