CN203706196U - Coarse-granularity reconfigurable and layered array register file structure - Google Patents
- Publication number: CN203706196U
- Application number: CN201420060189.9U
- Authority: CN (China)
- Prior art keywords: register; register file; processing unit; array; reconfigurable
- Legal status: Expired - Lifetime (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Executing Machine-Instructions (AREA)
Abstract
The utility model discloses a coarse-grained reconfigurable hierarchical array register file structure comprising a global register file, a local register file and a distributed register file. The global register file is the shared register set connecting the system control core and the reconfigurable array: it carries parameters when the system calls the reconfigurable architecture and, because every unit on the array can connect to it, it has the largest fan-out in the reconfigurable array. The local register file is the private register set of a reconfigurable processing unit, and its data is used only by that unit. The distributed register file is the data storage and transfer channel between some of the reconfigurable computing units inside the array. Through this hierarchical register file design, the utility model solves the problem of storing and transferring array data during reconfigurable computing, and improves both the storage efficiency of data variables in the array and reconfigurable computing performance.
Description
Technical field
The utility model relates to a coarse-grained reconfigurable hierarchical array register file structure and belongs to the field of embedded reconfigurable design.
Background art
The emergence of field-programmable array reconfigurable technology has greatly changed traditional embedded design methods. Reconfigurable computing, a new computing mode that spans the time and space domains, has broad application prospects in embedded and high-performance computing and has become a trend in embedded system development. Algorithms in media applications such as image processing and modern communications are massively parallel and require a large number of matrix operations. A register file allows flexible reordering and shift operations, with multiplexers steering read and write data to different registers. Compared with delay chains and shift registers, a register file consumes more power on writes, because of the decoding and selection logic involved. That power penalty is offset only when the register file holds data for a long time, so a register file is not the best choice for short-lived data.
Existing reconfigurable register file architectures can be divided into two categories according to their effect on array computing performance: on-chip access registers outside the array, and distributed registers inside the array. Data access for a reconfigurable array can be optimized by using the on-chip access registers outside the array to reduce memory access latency, and also by optimizing the access pattern of those registers. Optimizing the distributed register structure inside the array reduces the scheduling performance loss that architectural constraints impose on data during computation, and a hierarchical register file design combined with a scheduling strategy improves the computing performance of the array.
The register organization, sharing mechanism, replacement policy and partitioning mechanism of reconfigurable on-chip access registers all need to be studied against the specific array structure and memory access characteristics, in order to trade off low access latency against high hit rate. Research on the data access path outside the array covers the on-chip memory design, the prefetch and reuse unit structures, and vector- and scalar-based register file designs; together these make up the data path outside the array.
The way the reconfigurable array accesses the global registers has an increasingly prominent effect on memory access efficiency, and for continuous, dense data that must be accessed quickly its effect on bandwidth is very large. The form of interconnection between the reconfigurable array and the global registers clearly constrains the array's memory access performance. Existing designs use crossbar and ring interconnect structures to achieve efficient access with low latency, high bandwidth and low power consumption, or use a two-dimensional access mode to accelerate access to multimedia data.
How to achieve flexible blocking of row vector registers and column vector registers at low cost, that is, how to design a reconfigurable hierarchical array register file, remains an active research problem in this field.
Summary of the utility model
Purpose: to overcome the deficiencies of the prior art, the utility model provides a coarse-grained reconfigurable hierarchical array register file structure that solves the problem of storing and transferring array data during reconfigurable computing, thereby improving the storage efficiency of data variables in the array and reconfigurable computing performance.
Technical solution: to achieve the above purpose, the utility model adopts the following technical solution.
A coarse-grained reconfigurable hierarchical array register file structure is used to pass parameters between a reconfigurable array arranged as an m×n rectangular array and the system control core, and also to store and transfer data on the reconfigurable array. It is implemented through hardware connections and comprises a global register file, a local register file and a distributed register file:
The global register file: the shared register set connecting the system control core and the reconfigurable array. It holds the parameters passed when the system control core calls the reconfigurable array and, because every reconfigurable processing unit can connect to it, it has the largest fan-out in the reconfigurable array.
The local register file: each reconfigurable processing unit is connected to one local register, which serves as that unit's private register; its data is used only by the corresponding reconfigurable processing unit.
The distributed register file: connected to the reconfigurable array, it serves as the data storage and transfer channel between some of the reconfigurable processing units in the array.
Preferably, the global register file comprises n global registers, and its data width matches the data width of the reconfigurable processing units. The global register file acts as a data transfer channel for input parameters and return values, and both the system control core and the reconfigurable array can read and write the global registers.
Preferably, the global register file is directly interconnected with the reconfigurable array as follows: the m global registers at the top of the file and the 1 global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop passes in more than m parameters, exceeding the top m global registers, the extra parameters must be accessed over the bus. The bottom global register carries the function return value.
Preferably, the local register file is mainly used to store variables that have a long lifetime and a fixed spatial location; its only input source and output destination is its own private (corresponding) reconfigurable processing unit. The local register can make output data ready as input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register file within one cycle.
Preferably, the distributed register file consists of distributed registers arranged in an m×n rectangular array. Each row register group and each column register group share one distributed register, and the distributed registers correspond one-to-one in position with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. A reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple reconfigurable processing units can write to a cross-row-domain register group at the same time, and multiple units can read the same distributed register at the same time.
Preferably, the cross-domain register groups are interconnected by row or by column. When the reconfigurable processing unit in row i, column j writes data into a cross-domain register, every reconfigurable processing unit in row i or column j can obtain that data through the cross-domain register.
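The row and column visibility rule above can be sketched as follows. This is only an illustrative model, assuming 1-based (row, column) coordinates; the function name `readers_of` is hypothetical and does not come from the patent.

```python
def readers_of(writer, m, n):
    """Return the set of (row, col) units that can read data written by `writer`.

    Assumption: an m x n array with 1-based coordinates. Data written by the
    unit at (i, j) into a cross-domain register is visible along row i and
    along column j (the writer itself is included in the result).
    """
    i, j = writer
    if not (1 <= i <= m and 1 <= j <= n):
        raise ValueError("writer lies outside the array")
    row = {(i, c) for c in range(1, n + 1)}   # every unit in row i
    col = {(r, j) for r in range(1, m + 1)}   # every unit in column j
    return row | col


if __name__ == "__main__":
    print(sorted(readers_of((2, 3), 4, 4)))
```

In a 4×4 array this gives 7 reachable units for any writer: 4 in its row plus 4 in its column, with the writer counted once.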
Preferably, the distributed register file uses a multi-input, multi-output form of data access. To prevent different reconfigurable processing units from accessing the same distributed register at the same time, the following two methods are used:
Method 1: avoid simultaneous access to the same distributed register when the computation is mapped onto the array.
Method 2: when several reconfigurable processing units unpredictably access the same distributed register at the same time, priorities are assigned according to each unit's number in the reconfigurable array, in descending order of number, and the unit with the higher priority obtains the right to write.
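Method 2 amounts to a fixed-priority arbiter. The sketch below is a minimal software model, not the hardware described in the patent. It assumes that a larger unit number means a higher priority (one reading of "descending order of number"); the function name `arbitrate_write` is hypothetical.

```python
def arbitrate_write(requests):
    """Pick the winning write among simultaneous requests to one register.

    `requests` maps a unit number to the value that unit wants to write.
    Assumption: the unit with the largest number has the highest priority.
    Returns (winner_number, value), or None when nobody requests a write.
    """
    if not requests:
        return None
    winner = max(requests)            # largest unit number wins
    return winner, requests[winner]


if __name__ == "__main__":
    # Units 3, 7 and 11 collide on the same distributed register.
    print(arbitrate_write({3: "a", 11: "b", 7: "c"}))
```

A fixed priority is cheap in hardware but can starve low-priority units, which is presumably why the source lists conflict-free mapping (Method 1) first.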
Beneficial effects: the coarse-grained reconfigurable hierarchical array register file structure provided by the utility model allows array data to be stored and transferred accurately and efficiently during reconfigurable computing, improving the storage efficiency of data variables in the array and reconfigurable computing performance.
Description of the drawings
Figure 1 is a schematic diagram of the structure of the utility model;
Figure 2 is a schematic diagram of the global register file;
Figure 3 is a schematic diagram of the local register file;
Figure 4 is a schematic diagram of the distributed register file;
Figure 5 is a schematic diagram of the structure of an example of the utility model;
Figure 6 is a flow chart of data variable storage and transfer in the utility model.
Detailed description
The utility model is further described below with reference to the drawings.
A coarse-grained reconfigurable hierarchical array register file structure is used to pass parameters between a reconfigurable array arranged as an m×n rectangular array and the system control core, and also to store and transfer data on the reconfigurable array. As shown in Figure 1, it comprises a global register file, a local register file and a distributed register file.
The global register file: the shared register set connecting the system control core and the reconfigurable array. It meets the parameter-passing needs of the system control core when it calls the reconfigurable array and, because every reconfigurable processing unit can connect to it, it has the largest fan-out in the reconfigurable array.
The global register file comprises n global registers, and its data width matches the data width of the reconfigurable processing units. It acts as a data transfer channel for input parameters and return values, and both the system control core and the reconfigurable array can read and write the global registers. The global register file is directly interconnected with the reconfigurable array as follows: the m global registers at the top of the file and the 1 global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop passes in more than m parameters, exceeding the top m global registers, the extra parameters must be accessed over the bus. The bottom global register carries the function return value.
The global register file shown in Figure 2 contains 16 global registers, of which the top 3 and the bottom 1 can be accessed by all reconfigurable processing units. When a loop passes in more than 3 parameters, exceeding the top 3 registers, the extra parameters must be accessed over the bus. The bottom global register is used for the function return value. The global registers use the clock domain and reset domain of the reconfigurable array and support both software and hardware reset.
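The access rule for the 16-register example can be sketched in a few lines. This is an illustrative model under stated assumptions: registers are indexed from 0 at the "top" to 15 at the "bottom", and parameter k is stored in register k. Neither the indexing nor the names `access_path` and `place_parameters` come from the patent.

```python
def access_path(index, total=16, top=3):
    """Classify how a processing unit reaches global register `index`.

    Assumptions: index 0 is the top of the file, index total-1 the bottom.
    The top `top` registers and the bottom register have mesh and bus
    connections; all other registers are reachable over the bus only.
    """
    if not 0 <= index < total:
        raise ValueError("register index out of range")
    if index < top or index == total - 1:
        return "mesh+bus"
    return "bus"


def place_parameters(num_params, total=16, top=3):
    """Assign parameter k to register k (an assumption) and report each path."""
    if num_params > total - 1:
        raise ValueError("the bottom register is reserved for the return value")
    return [(k, access_path(k, total, top)) for k in range(num_params)]


if __name__ == "__main__":
    for reg, path in place_parameters(5):
        print(f"parameter {reg} -> register {reg} via {path}")
```

With 5 parameters, the first three use the directly connected registers and the last two fall back to the bus, matching the behaviour described for loops with more than 3 parameters.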
The local register file: each reconfigurable processing unit is designed with one local register, which serves as that unit's private register; its data is used only by the corresponding reconfigurable processing unit.
The local register file is mainly used to store variables that have a long lifetime and a fixed spatial location; its only input source and output destination is its own private (corresponding) reconfigurable processing unit. The local register can make output data ready as input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register file within one cycle.
As shown in Figure 3, the local register is designed with only 4 sub-registers, and it can make output data ready as input data within 1 cycle.
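The enable-bit behaviour of a local register file can be modelled as below. This is a minimal behavioural sketch, not the circuit in Figure 3. The class name `LocalRegisterFile`, the `clock` method, and passing `enable` and `index` as separate arguments are all assumptions, since the source does not give the layout of the configuration word.

```python
class LocalRegisterFile:
    """Private register file of one processing unit (4 sub-registers, as in Figure 3)."""

    def __init__(self, size=4):
        self.regs = [0] * size

    def clock(self, result, enable, index):
        """Model one cycle.

        If `enable` is set, the unit's computation `result` is latched into
        regs[index]; otherwise the stored value is kept. The value returned
        is what the unit can use as an input on the next cycle.
        """
        if enable:
            self.regs[index] = result
        return self.regs[index]


if __name__ == "__main__":
    lrf = LocalRegisterFile()
    lrf.clock(42, enable=True, index=1)    # result is stored
    lrf.clock(7, enable=False, index=1)    # write suppressed, 42 is kept
    print(lrf.regs)
```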
The distributed register file: the data storage and transfer channel between some of the reconfigurable processing units in the reconfigurable array.
The distributed register file consists of distributed registers arranged in an m×n rectangular array. Each row register group and each column register group share one distributed register, and the distributed registers correspond one-to-one in position with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. A reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple reconfigurable processing units can write to a cross-row-domain register group at the same time, and multiple units can read the same distributed register at the same time. The cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data into a cross-domain register, every reconfigurable processing unit in row i or column j can obtain that data through the cross-domain register. The distributed register file uses a multi-input, multi-output form of data access. To prevent different reconfigurable processing units from accessing the same distributed register at the same time, the following two methods are used:
Method 1: avoid simultaneous access to the same distributed register when the computation is mapped onto the array.
Method 2: when several reconfigurable processing units unpredictably access the same distributed register at the same time, priorities are assigned according to each unit's number in the reconfigurable array, in descending order of number, and the unit with the higher priority obtains the right to write.
For example, when data output by the reconfigurable processing unit at array position (i, j) must be passed to the unit at position (1, 1), it is relayed through the unit at position (i, 1) or (1, j). At time T_i, the unit at (i, j) writes the data into position 0 of the cross-row register group. At time T_i + 1, the unit at (i, 1) uses a data exchange instruction to write the data in DCR cross-row register group 0 into position 0 of the cross-column register group. At time T_i + 2, the unit at (1, 1) can then read from cross-column register group 0 the data written by the unit at (i, j).
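The three-step relay in this example can be traced with a small simulation. It is a sketch under assumptions: row and column register groups are modelled as dictionaries with numbered slots, time is counted as an offset from T_i, and only the route through (i, 1) is shown. The function name `relay_to_origin` is hypothetical.

```python
def relay_to_origin(i, j, value):
    """Trace the transfer of `value` from unit (i, j) to unit (1, 1) via (i, 1).

    Returns (received_value, trace), where trace lists
    (time_offset, acting_unit, action) for the three steps.
    """
    row_groups = {}   # row number -> {slot: value}
    col_groups = {}   # column number -> {slot: value}
    trace = []

    # Offset 0: the source unit writes slot 0 of its row register group.
    row_groups[i] = {0: value}
    trace.append((0, (i, j), f"write row {i} slot 0"))

    # Offset 1: unit (i, 1) sits in row i and column 1, so it can copy the
    # data from the row group into slot 0 of the column-1 register group.
    col_groups[1] = {0: row_groups[i][0]}
    trace.append((1, (i, 1), f"exchange row {i} slot 0 -> column 1 slot 0"))

    # Offset 2: unit (1, 1) reads slot 0 of the column-1 register group.
    received = col_groups[1][0]
    trace.append((2, (1, 1), "read column 1 slot 0"))
    return received, trace


if __name__ == "__main__":
    data, steps = relay_to_origin(3, 4, 99)
    for step in steps:
        print(step)
    print("received:", data)
```

The trace makes the latency explicit: a transfer between units that share neither a row nor a column takes two hops, so the data is available two time steps after the initial write.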
The minimal reconfigurable computing system shown in Figure 5 uses the reconfigurable hierarchical array register file structure proposed here. The system comprises an ARM7TDMI processor serving as the system control core, a reconfigurable array, a global register file, a local register file, an AHB bus for data transfer, and a distributed register file.
The ARM7TDMI processor is chosen as the core because it is small, fast, low-power and well supported by compilers; it controls the scheduling and configuration of system operation. The global register file is connected to the reconfigurable array through a 64-bit AHB bus. The local register file is interconnected with the reconfigurable array through a dedicated access interface with a data width of 128 bits. The distributed register file is likewise interconnected with the reconfigurable array through a dedicated access interface with a data width of 128 bits. The reconfigurable array contains 4×4 reconfigurable processing units, each of which supports single-cycle 16-bit arithmetic and logic operations.
Figure 6 shows how reconfigurable array data is stored and transferred. A transfer request is raised: following the instructions fetched from external memory, the reconfigurable array requests a transfer of parameters or of array data. If the data is to be exchanged with the system control core, the exchange goes through the global register file. Otherwise the system checks whether the exchange is internal to the reconfigurable array; if so, the data is stored and exchanged through the distributed register file. Otherwise the data belongs to a single reconfigurable processing unit and is stored in its local register. Because each case uses the most suitable register file, the register file resources are fully used, which improves the storage efficiency of data variables and reconfigurable computing performance.
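The decision flow of Figure 6 reduces to a three-way selection. The sketch below mirrors the order of the checks in the text; the enum `Transfer` and the function `select_register_file` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Transfer(Enum):
    """Kinds of transfer request distinguished in the flow (names are assumptions)."""
    WITH_CONTROL_CORE = 1   # data exchanged with the system control core
    WITHIN_ARRAY = 2        # data exchanged between processing units
    PRIVATE = 3             # data kept by a single processing unit


def select_register_file(kind):
    """Return which register file handles a request, in the order the flow checks."""
    if kind is Transfer.WITH_CONTROL_CORE:
        return "global"
    if kind is Transfer.WITHIN_ARRAY:
        return "distributed"
    return "local"


if __name__ == "__main__":
    for kind in Transfer:
        print(kind.name, "->", select_register_file(kind))
```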
The above is only a preferred embodiment of the utility model. It should be noted that those of ordinary skill in the art can make further improvements and refinements without departing from the principle of the utility model, and such improvements and refinements shall also fall within its scope of protection.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201420060189.9U CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201420060189.9U CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN203706196U true CN203706196U (en) | 2014-07-09 |
Family
ID=51056601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201420060189.9U Expired - Lifetime CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN203706196U (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761072A (en) * | 2014-02-10 | 2014-04-30 | 东南大学 | Coarse granularity reconfigurable hierarchical array register file structure |
CN103761072B (en) * | 2014-02-10 | 2016-08-31 | 东南大学 | A kind of array register file structure of coarseness reconfigurable hierarchical |
CN111630487A (en) * | 2017-12-22 | 2020-09-04 | 阿里巴巴集团控股有限公司 | Centralized-distributed hybrid organization of shared memory for neural network processing |
CN111630487B (en) * | 2017-12-22 | 2023-06-20 | 阿里巴巴集团控股有限公司 | Centralized-distributed hybrid organization of shared memory for neural network processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C14 | Grant of patent or utility model | |
 | GR01 | Patent grant | |
 | AV01 | Patent right actively abandoned | Granted publication date: 20140709; effective date of abandoning: 20160831 |
 | C25 | Abandonment of patent right or utility model to avoid double patenting | |