CN203706196U - Coarse-granularity reconfigurable and layered array register file structure - Google Patents
- Publication number
- CN203706196U (application number CN201420060189.9U)
- Authority
- CN
- China
- Prior art keywords
- register
- register file
- processing unit
- reconstruction processing
- array
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn - After Issue
Landscapes
- Executing Machine-Instructions (AREA)
Abstract
The utility model discloses a coarse-grained reconfigurable hierarchical array register file structure comprising a global register file, a local register file and a distributed register file. Registers in the global register file serve as shared registers connecting the system control kernel with the reconfigurable array; they meet the parameter-passing requirement when the system invokes the reconfigurable architecture, can be connected to every unit on the reconfigurable array, and have the largest fan-out in the reconfigurable array. Registers in the local register file serve as private registers of the reconfigurable processing units, and their data are used only by the corresponding unit. Registers in the distributed register file serve as storage and transfer channels for the data of some of the reconfigurable processing units in the array. Through this hierarchical register file design, the structure enables array data to be stored and transferred during reconfigurable computation, improving data variable storage efficiency and reconfigurable computing performance within the array.
Description
Technical field
The utility model relates to a coarse-grained reconfigurable hierarchical array register file structure and belongs to the field of embedded reconfigurable design technology.
Background
With the emergence of field-programmable gate array (FPGA) reconfiguration technology, traditional embedded design methods have changed considerably. Reconfigurable computing, as a novel spatio-temporal computing paradigm, has broad application prospects in embedded and high-performance computing and has become a trend in current embedded system development. Algorithms in media applications such as image processing and modern communications exhibit large-scale parallelism and require large numbers of matrix operations. A register file design allows flexible reordering and shift operations; read/write data are selected from different registers through multiplexers. Compared with delay chains and shift registers, a register file consumes more power on write operations, including the power of the associated decoding and selection logic. Therefore, this power overhead is offset only when the register file is used to store long-lived data; a register file is not the optimal choice for data with a short lifetime.
Existing reconfigurable register file architectures can be divided into two classes according to their impact on array computing performance: on-chip access registers outside the array, and distributed registers inside the array. Data access for reconfigurable arrays can be optimized, on the one hand, by using on-chip registers outside the array to reduce memory access latency and, on the other, by optimizing how those on-chip registers are accessed. Optimizing the distributed register structure inside the array reduces the scheduling performance loss that architectural constraints impose on data during computation, and a hierarchical register file design combined with a suitable scheduling strategy improves the computing performance of the array.
For on-chip registers in reconfigurable systems, the register organization, sharing mechanism, replacement policy and partitioning mechanism must all be studied against the specific array structure and memory access characteristics in order to trade off low access latency against high hit rate. Research on the data access path outside the array, including on-chip memory design, prefetch and reuse unit structures, and vector/scalar register file design, has formed the data flow circuitry outside the array.
The access mode between the reconfigurable array and the global registers also has a pronounced effect on memory access efficiency; for continuous, dense, high-rate data it strongly affects bandwidth. The connection topology between the reconfigurable array and the global registers clearly constrains the array's memory access performance. Existing designs use crossbar interconnect and ring structures to achieve efficient access and meet low-latency, high-bandwidth, low-power requirements, or use two-dimensional access modes to accelerate multimedia data access.
How to implement flexible partitioning of row vector registers and column vector registers at low cost, and how to design a reconfigurable hierarchical array register file, therefore remain active research topics in this field.
Summary of the utility model
Objective: to overcome the deficiencies of the prior art, the utility model provides a coarse-grained reconfigurable hierarchical array register file structure that solves the problem of storing and transferring array data during reconfigurable computation, thereby improving data variable storage efficiency and reconfigurable computing performance within the array.
Technical solution: to achieve the above objective, the utility model adopts the following technical solution:
A coarse-grained reconfigurable hierarchical array register file structure, used to transfer parameters between the system control kernel and a reconfigurable array arranged as an m × n rectangular array, while also storing and transferring data on the reconfigurable array. It is implemented through hardware connections and specifically comprises a global register file, a local register file and a distributed register file:
The global register file: serves as the shared registers connecting the system control kernel and the reconfigurable array, used for passing parameters when the system control kernel invokes the reconfigurable array; its registers can be connected to every reconfigurable processing unit and have the largest fan-out in the reconfigurable array;
The local register file: each reconfigurable processing unit is connected to its own local register, which serves as the private register of that unit; its data are used only by that unit;
The distributed register file: connected to the reconfigurable array, it serves as the storage and transfer channel for data between some of the reconfigurable processing units in the array.
Preferably, the global register file comprises n global registers, whose data bit width matches that of the reconfigurable processing units; the global register file acts as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers.
Preferably, the global register file is directly interconnected with the reconfigurable array, implemented as follows: the top m global registers and the single bottom global register use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop passes in more than m parameters, exceeding the top m global registers, the extra parameters must be accessed over the bus. The bottom global register is used to pass the function return value.
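The wiring rule above can be illustrated with a short Python sketch that reports which interconnect each loop parameter travels over. It assumes that global registers are numbered from 0 at the top to n - 1 at the bottom, that parameters fill the registers from the top, and that the bottom register is reserved for the return value; `GlobalAccess`, `register_access` and `param_access_paths` are hypothetical names, not part of the described hardware.

```python
from enum import Enum


class GlobalAccess(Enum):
    """Interconnect used to reach a global register (hypothetical labels)."""
    MESH_AND_BUS = "full-mesh + bus"  # top m registers and the bottom register
    BUS_ONLY = "bus"                  # all remaining global registers


def register_access(index: int, n: int, m: int) -> GlobalAccess:
    """Return how global register `index` (0 = top, n - 1 = bottom) is wired.

    Assumption: registers are numbered from the top; the top m registers and
    the single bottom register are fully meshed to every processing unit.
    """
    if not 0 <= index < n:
        raise ValueError("register index out of range")
    if index < m or index == n - 1:
        return GlobalAccess.MESH_AND_BUS
    return GlobalAccess.BUS_ONLY


def param_access_paths(num_params: int, n: int, m: int) -> list[GlobalAccess]:
    """Place loop input parameters in registers 0..n-2 (the bottom register
    holds the return value) and report the path each parameter uses."""
    if num_params > n - 1:
        raise ValueError("more parameters than non-return global registers")
    return [register_access(k, n, m) for k in range(num_params)]


# Fig. 2 configuration: 16 global registers, top 3 fully meshed
paths = param_access_paths(5, n=16, m=3)
print([p.value for p in paths])
# ['full-mesh + bus', 'full-mesh + bus', 'full-mesh + bus', 'bus', 'bus']
```

With the Fig. 2 configuration (16 global registers, top 3 fully meshed), a loop with 5 input parameters places the first 3 on mesh-connected registers and the remaining 2 on bus-only registers, matching the rule that parameters beyond the top registers must go over the bus.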
Preferably, the local register file is mainly used to store variables with long lifetimes and fixed spatial locations; both its input and its output belong to its private (corresponding) reconfigurable processing unit. The local register can prepare output data from input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word; when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register within one cycle.
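The write-enable behaviour of a local register can be sketched as a one-cycle update. This Python model is a minimal sketch under assumptions: the position of the enable flag in the configuration word (`enable_bit`) is not specified in the text and is a hypothetical parameter, and the 16-bit width is borrowed from the processing units of the Fig. 5 example.

```python
from dataclasses import dataclass


@dataclass
class LocalRegister:
    """Private register of one reconfigurable processing unit (sketch).

    Assumption: the write-enable flag is bit `enable_bit` of the
    configuration word; the real bit position is not given in the source.
    """
    enable_bit: int = 0
    value: int = 0
    width: int = 16  # 16-bit units, as in the Fig. 5 example system

    def clock(self, config_word: int, pu_result: int) -> int:
        """One clock cycle: latch the owning unit's result if the enable bit
        is set, then return the (possibly updated) value to that unit."""
        if (config_word >> self.enable_bit) & 1:
            self.value = pu_result & ((1 << self.width) - 1)
        return self.value


reg = LocalRegister(enable_bit=3)
print(reg.clock(0b0000, 42))  # write disabled: keeps 0
print(reg.clock(0b1000, 42))  # write enabled: latches 42
print(reg.clock(0b0000, 7))   # disabled again: still 42
```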
Preferably, the distributed register file consists of distributed registers arranged in an m × n rectangular array. Each row register group and each column register group share one distributed register at their intersection, and the distributed registers correspond one-to-one in position with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group containing the distributed register at its position. Each unit can operate on only one group at a time, and read/write operations among multiple units are selected through multiplexers. Multiple units can write to the cross-row register groups simultaneously, and multiple units can read the same distributed register simultaneously.
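The row/column sharing and per-cycle access rules can be modelled in a few lines of Python. In this sketch, coordinates are 1-based `(row, column)` pairs matching the (i, j) notation of the text, each unit issues at most one operation per cycle through either its row group or its column group, reads observe the contents from before the cycle's writes, and two writes to the same register in one cycle are simply rejected (the situation Method 1 avoids at mapping time). The class and method names are hypothetical.

```python
Coord = tuple[int, int]


class DistributedRegisterFile:
    """m x n distributed registers; register (i, j) sits at the position of
    processing unit (i, j) and belongs to row group i and column group j."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        self.regs = {(r, c): 0 for r in range(1, m + 1) for c in range(1, n + 1)}

    def reachable(self, pu: Coord, kind: str) -> set[Coord]:
        """Registers unit `pu` can reach through its row or column group."""
        i, j = pu
        if kind == "row":
            return {(i, c) for c in range(1, self.n + 1)}
        if kind == "col":
            return {(r, j) for r in range(1, self.m + 1)}
        raise ValueError("kind must be 'row' or 'col'")

    def cycle(self, ops):
        """Run one cycle of ops (pu, kind, target, value); value None = read.

        Returns {pu: value} for reads. Reads see pre-cycle contents; writes
        are committed at the end of the cycle.
        """
        used, writes, reads = set(), {}, {}
        for pu, kind, target, value in ops:
            if pu in used:  # one operation (hence one group) per unit per cycle
                raise RuntimeError(f"unit {pu} issued two operations in one cycle")
            used.add(pu)
            if target not in self.reachable(pu, kind):
                raise PermissionError(f"unit {pu} cannot reach {target} via its {kind} group")
            if value is None:
                reads[pu] = self.regs[target]  # concurrent reads are allowed
            elif target in writes:
                raise RuntimeError(f"write conflict on register {target}")
            else:
                writes[target] = value
        self.regs.update(writes)
        return reads


drf = DistributedRegisterFile(4, 4)
drf.cycle([((2, 3), "row", (2, 1), 99)])           # unit (2,3) writes via row 2
print(drf.cycle([((2, 4), "row", (2, 1), None),    # row-2 unit reads it
                 ((4, 1), "col", (2, 1), None)]))  # column-1 unit reads it
# {(2, 4): 99, (4, 1): 99}
```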
Preferably, the cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data into a cross-domain register, all reconfigurable processing units in row i or column j can obtain the data through that cross-domain register.
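The visibility rule can be expressed as a set computation. This Python sketch assumes 1-based (row, column) coordinates as in the text; `receivers` and `direct_reach` are hypothetical helper names. A destination outside both row i and column j cannot read the data directly and needs a relay unit.

```python
Coord = tuple[int, int]


def receivers(src: Coord, m: int, n: int) -> set[Coord]:
    """Units that can read a value unit `src` writes into its cross-domain
    register: every unit in row i or column j (including `src` itself)."""
    i, j = src
    row = {(i, c) for c in range(1, n + 1)}
    col = {(r, j) for r in range(1, m + 1)}
    return row | col


def direct_reach(src: Coord, dst: Coord, m: int, n: int) -> bool:
    """True if `dst` can pick up `src`'s data without a relay unit."""
    return dst in receivers(src, m, n)


print(len(receivers((2, 3), 4, 4)))        # 4 + 4 - 1 = 7 units
print(direct_reach((2, 3), (4, 3), 4, 4))  # same column -> True
print(direct_reach((2, 3), (1, 1), 4, 4))  # needs a relay -> False
```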
Preferably, the distributed register file uses a multi-input, multi-output data access scheme; to avoid different reconfigurable processing units accessing the same distributed register at the same time, the following two methods are used:
Method 1: avoid simultaneous access to the same distributed register at mapping time;
Method 2: where simultaneous access to the same distributed register by multiple reconfigurable processing units cannot be ruled out in advance, priority levels are assigned to the units from highest to lowest following their numbering order in the reconfigurable array, and the unit with the higher priority obtains the write right.
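Method 2 can be sketched as a per-register write arbiter. The text does not fix the numbering scheme, nor does it state unambiguously whether the lowest or the highest number ranks first, so this Python sketch assumes row-major, 1-based numbering and exposes the direction as a `prefer_lower_number` flag; `unit_number` and `arbitrate_writes` are hypothetical names.

```python
from collections import defaultdict

Coord = tuple[int, int]


def unit_number(pu: Coord, n: int) -> int:
    """Assumed row-major, 1-based number of unit (i, j) in an array with n columns."""
    i, j = pu
    return (i - 1) * n + j


def arbitrate_writes(requests, n, prefer_lower_number=True):
    """Resolve same-cycle writes; requests are (pu, target, value) triples.

    Returns {target: (winning_pu, value)}: for each register only the
    highest-priority unit's write is granted.
    """
    by_target = defaultdict(list)
    for pu, target, value in requests:
        by_target[target].append((unit_number(pu, n), pu, value))
    pick = min if prefer_lower_number else max  # priority direction is assumed
    return {t: pick(cands)[1:] for t, cands in by_target.items()}


reqs = [((2, 3), (2, 1), 10), ((1, 1), (2, 1), 20), ((4, 4), (3, 3), 30)]
print(arbitrate_writes(reqs, n=4))
# {(2, 1): ((1, 1), 20), (3, 3): ((4, 4), 30)}
```

Because unit numbers are unique, the comparison never falls through to the coordinate or value fields of the tuples, so the arbitration is deterministic.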
Beneficial effects: the coarse-grained reconfigurable hierarchical array register file structure provided by the utility model allows array data to be stored and transferred accurately and efficiently during reconfigurable computation, improving data variable storage efficiency and reconfigurable computing performance within the array.
Brief description of the drawings
Fig. 1 is a structural diagram of the utility model;
Fig. 2 is a schematic diagram of the global register file;
Fig. 3 is a schematic diagram of the local register file;
Fig. 4 is a schematic diagram of the distributed register file;
Fig. 5 is a structural diagram of an example of the utility model;
Fig. 6 is a flow chart of data variable storage and transfer in the utility model.
Embodiment
The utility model is further described below with reference to the accompanying drawings.
A coarse-grained reconfigurable hierarchical array register file structure, used to transfer parameters between the system control kernel and a reconfigurable array arranged as an m × n rectangular array, while also storing and transferring data on the reconfigurable array; as shown in Fig. 1, it comprises a global register file, a local register file and a distributed register file.
The global register file: serves as the shared registers connecting the system control kernel and the reconfigurable array. It not only meets the parameter-passing requirement when the system control kernel invokes the reconfigurable array, but its registers can also be connected to every reconfigurable processing unit, giving them the largest fan-out in the reconfigurable array.
The global register file comprises n global registers, whose data bit width matches that of the reconfigurable processing units. It acts as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers. The global register file is directly interconnected with the reconfigurable array, implemented as follows: the top m global registers and the single bottom global register use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only. When a loop passes in more than m parameters, exceeding the top m global registers, the extra parameters must be accessed over the bus. The bottom global register is used to pass the function return value.
As shown in Fig. 2, the global register file contains 16 global registers, of which the top 3 and the bottom 1 can be accessed by all reconfigurable processing units. When a loop passes in more than 3 parameters, exceeding the top 3 registers, the extra parameters must be accessed over the bus. In particular, the bottom global register holds the function return value. The global registers use the reconfigurable array's clock domain and reset domain and support both software and hardware reset.
The local register file: each reconfigurable processing unit is provided with its own local register, which serves as the private register of that unit; its data are used only by that unit.
The local register file is mainly used to store variables with long lifetimes and fixed spatial locations; both its input and its output belong to its private (corresponding) reconfigurable processing unit. The local register can prepare output data from input data within one cycle. Writes to the local register are controlled by an enable bit in the configuration word; when the enable bit is set, the computation result of the reconfigurable processing unit is written into the local register within one cycle.
As shown in Fig. 3, the local register provides only 4 local sub-registers, and by design it can prepare output data from input data within 1 cycle.
The distributed register file: serves as the storage and transfer channel for data between some of the reconfigurable processing units in the reconfigurable array.
The distributed register file consists of distributed registers arranged in an m × n rectangular array. Each row register group and each column register group share one distributed register at their intersection, and the distributed registers correspond one-to-one in position with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group containing the distributed register at its position. Each unit can operate on only one group at a time, and read/write operations among multiple units are selected through multiplexers. Multiple units can write to the cross-row register groups simultaneously, and multiple units can read the same distributed register simultaneously. The cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data into a cross-domain register, all reconfigurable processing units in row i or column j can obtain the data through that cross-domain register. The distributed register file uses a multi-input, multi-output data access scheme; to avoid different reconfigurable processing units accessing the same distributed register at the same time, the following two methods are used:
Method 1: avoid simultaneous access to the same distributed register at mapping time;
Method 2: where simultaneous access to the same distributed register by multiple reconfigurable processing units cannot be ruled out in advance, priority levels are assigned to the units from highest to lowest following their numbering order in the reconfigurable array, and the unit with the higher priority obtains the write right.
For example, when the data output by the reconfigurable processing unit at array position (i, j) must be delivered to the unit at position (1, 1), the transfer goes through the unit at (i, 1) or (1, j). At time $T_i$, the unit at (i, j) writes the data into cross-row register group 0; at time $T_i+1$, the unit at (i, 1) uses a data exchange instruction to move the data from DCR cross-row register group 0 into cross-column register group 0; thus at time $T_i+2$, the unit at (1, 1) can read from cross-column register group 0 the data written by the unit at (i, j).
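The three-step transfer above generalizes to any source and destination. This Python sketch assumes, as in the example, that each register hop takes one cycle; it extends the example to the same-row or same-column case (one hop, an inference rather than a stated rule) and to the column-first relay through (1, j). The function name `plan_transfer` is hypothetical, and the DCR data exchange instruction is represented only by a text label.

```python
Coord = tuple[int, int]


def plan_transfer(src: Coord, dst: Coord, t0: int = 0, row_first: bool = True):
    """Return (time, unit, action) steps moving a value from src to dst
    through the cross-row / cross-column register groups."""
    (si, sj), (di, dj) = src, dst
    if src == dst:
        return []
    if si == di:  # same row: a single cross-row group hop
        return [(t0, src, f"write row group {si}"), (t0 + 1, dst, f"read row group {si}")]
    if sj == dj:  # same column: a single cross-column group hop
        return [(t0, src, f"write column group {sj}"), (t0 + 1, dst, f"read column group {sj}")]
    if row_first:  # relay at (si, dj) lies in row si and column dj
        relay = (si, dj)
        return [
            (t0, src, f"write row group {si}"),
            (t0 + 1, relay, f"move row group {si} -> column group {dj}"),
            (t0 + 2, dst, f"read column group {dj}"),
        ]
    relay = (di, sj)  # relay at (di, sj) lies in column sj and row di
    return [
        (t0, src, f"write column group {sj}"),
        (t0 + 1, relay, f"move column group {sj} -> row group {di}"),
        (t0 + 2, dst, f"read row group {di}"),
    ]


for step in plan_transfer((3, 4), (1, 1), t0=10):
    print(step)
# (10, (3, 4), 'write row group 3')
# (11, (3, 1), 'move row group 3 -> column group 1')
# (12, (1, 1), 'read column group 1')
```

With `row_first=True` and destination (1, 1), the relay is (i, 1), exactly the path of the example; `row_first=False` gives the alternative relay (1, j).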
Fig. 5 shows a minimal reconfigurable computing system that adopts the reconfigurable hierarchical array register file structure proposed here. The system comprises an ARM7TDMI processor as the system control kernel, the reconfigurable array, the global register file, the local register file, an AHB bus for data transfer, and the distributed register file.
The ARM7TDMI processor, chosen for its small size, speed, low power consumption and compiler support, serves as the kernel that schedules and configures system operation. The global register file is connected to the reconfigurable array through a 64-bit AHB bus; the local register file is interconnected with the reconfigurable array through a dedicated access interface with a 128-bit data width; the distributed register file is likewise interconnected with the reconfigurable array through a dedicated access interface with a 128-bit data width. The reconfigurable array contains 4 × 4 reconfigurable processing units, each supporting single-cycle 16-bit arithmetic and logical operations.
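The figures quoted for this example system can be cross-checked with simple arithmetic. Assuming each transfer carries the full link width and operands are 16 bits wide, the 64-bit AHB link to the global register file moves 4 operands per transfer and each 128-bit dedicated interface moves 8. The dataclass below only restates the stated parameters; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleSystem:
    """Parameters of the Fig. 5 minimal system, as stated in the text."""
    rows: int = 4
    cols: int = 4
    pu_width_bits: int = 16         # single-cycle 16-bit ALU ops per unit
    global_bus_bits: int = 64       # AHB link to the global register file
    local_if_bits: int = 128        # dedicated local register file interface
    distributed_if_bits: int = 128  # dedicated distributed register file interface

    @property
    def num_units(self) -> int:
        return self.rows * self.cols

    def operands_per_transfer(self, link_bits: int) -> int:
        """16-bit operands carried by one full-width transfer (assumption)."""
        return link_bits // self.pu_width_bits


s = ExampleSystem()
print(s.num_units)                                 # 16
print(s.operands_per_transfer(s.global_bus_bits))  # 4
print(s.operands_per_transfer(s.local_if_bits))    # 8
```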
As shown in Fig. 6, the data storage and transfer process of the reconfigurable array is as follows. Transfer request: according to an instruction fetched from external memory, the reconfigurable array requests the transfer of parameters or array data. If the data are to be exchanged with the system control kernel, the exchange is performed through the global register file. Otherwise, it is determined whether the data are exchanged within the reconfigurable array; if so, they are stored and exchanged through the distributed register file. Otherwise, the data belong to a single reconfigurable processing unit and are stored in its local register. By selecting the most suitable register file for each case, the resources of the register files are fully utilized, which improves data variable storage efficiency and reconfigurable computing performance.
The above is only a preferred embodiment of the utility model. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principle of the utility model, and such improvements and modifications shall also be regarded as falling within the protection scope of the utility model.
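The Fig. 6 decision flow reduces to two tests. In this Python sketch a transfer is described by hypothetical source and destination endpoints, either the marker `KERNEL` for the system control kernel or a (row, column) processing-unit coordinate; the names `TransferRequest`, `RegisterFile` and `choose_register_file` are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

KERNEL = "kernel"  # hypothetical marker for the system control kernel
Endpoint = Union[str, tuple[int, int]]


class RegisterFile(Enum):
    GLOBAL = "global register file"
    DISTRIBUTED = "distributed register file"
    LOCAL = "local register file"


@dataclass(frozen=True)
class TransferRequest:
    src: Endpoint
    dst: Endpoint


def choose_register_file(req: TransferRequest) -> RegisterFile:
    """Fig. 6 flow: kernel exchange -> global; exchange between processing
    units -> distributed; otherwise the data stay with one unit -> local."""
    if KERNEL in (req.src, req.dst):
        return RegisterFile.GLOBAL
    if req.src != req.dst:
        return RegisterFile.DISTRIBUTED
    return RegisterFile.LOCAL


for r in [TransferRequest(KERNEL, (1, 1)),
          TransferRequest((2, 3), (1, 1)),
          TransferRequest((2, 2), (2, 2))]:
    print(choose_register_file(r).value)
# global register file
# distributed register file
# local register file
```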
Claims (6)
1. A coarse-grained reconfigurable hierarchical array register file structure, characterized in that it is used to transfer parameters between a system control kernel and a reconfigurable array arranged as an m × n rectangular array while also storing and transferring data on the reconfigurable array, and comprises a global register file, a local register file and a distributed register file:
The global register file: serves as shared registers connecting the system control kernel and the reconfigurable array, used for passing parameters when the system control kernel invokes the reconfigurable array; its registers can be connected to every reconfigurable processing unit and have the largest fan-out in the reconfigurable array;
The local register file: each reconfigurable processing unit is connected to its own local register, which serves as the private register of that unit, and its data are used only by that unit;
The distributed register file: connected to the reconfigurable array, it serves as the storage and transfer channel for data between the reconfigurable processing units in the reconfigurable array.
2. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that the global register file comprises n global registers whose data bit width matches that of the reconfigurable processing units; the global register file acts as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers.
3. The coarse-grained reconfigurable hierarchical array register file structure according to claim 2, characterized in that the global register file is directly interconnected with the reconfigurable array, implemented as follows: the top m global registers and the single bottom global register use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect only; the bottom global register is used to pass the function return value.
4. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that the local register file is mainly used to store variables with long lifetimes and fixed spatial locations, both its input and its output belonging to its private reconfigurable processing unit; writes to the local register are controlled by an enable bit in the configuration word.
5. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that the distributed register file consists of distributed registers arranged in an m × n rectangular array; each row register group and each column register group share one distributed register at their intersection, and the distributed registers correspond one-to-one in position with the reconfigurable processing units; each reconfigurable processing unit can operate on two register groups, namely the row register group and the column register group containing the distributed register at its position; each unit can operate on only one group at a time, and read/write operations among multiple units are selected through multiplexers; multiple units can write to the cross-row register groups simultaneously; multiple units can read the same distributed register simultaneously.
6. The coarse-grained reconfigurable hierarchical array register file structure according to claim 5, characterized in that the cross-domain register groups are interconnected by row or by column: when the reconfigurable processing unit in row i, column j writes data into a cross-domain register, all reconfigurable processing units in row i or column j can obtain the data through that cross-domain register.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201420060189.9U CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201420060189.9U CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN203706196U true CN203706196U (en) | 2014-07-09 |
Family
ID=51056601
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201420060189.9U Withdrawn - After Issue CN203706196U (en) | 2014-02-10 | 2014-02-10 | Coarse-granularity reconfigurable and layered array register file structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN203706196U (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761072A (en) * | 2014-02-10 | 2014-04-30 | 东南大学 | Coarse granularity reconfigurable hierarchical array register file structure |
CN111630487A (en) * | 2017-12-22 | 2020-09-04 | 阿里巴巴集团控股有限公司 | Centralized-distributed hybrid organization of shared memory for neural network processing |
- 2014-02-10: application CN201420060189.9U filed in CN (published as CN203706196U); status: not active, withdrawn after issue
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761072A (en) * | 2014-02-10 | 2014-04-30 | 东南大学 | Coarse granularity reconfigurable hierarchical array register file structure |
CN103761072B (en) * | 2014-02-10 | 2016-08-31 | 东南大学 | A kind of array register file structure of coarseness reconfigurable hierarchical |
CN111630487A (en) * | 2017-12-22 | 2020-09-04 | 阿里巴巴集团控股有限公司 | Centralized-distributed hybrid organization of shared memory for neural network processing |
CN111630487B (en) * | 2017-12-22 | 2023-06-20 | 阿里巴巴集团控股有限公司 | Centralized-distributed hybrid organization of shared memory for neural network processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107590085B (en) | A kind of dynamic reconfigurable array data path and its control method with multi-level buffer | |
CN103761072A (en) | Coarse granularity reconfigurable hierarchical array register file structure | |
CN105912501B (en) | A kind of SM4-128 Encryption Algorithm realization method and systems based on extensive coarseness reconfigurable processor | |
CN104699631A (en) | Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor) | |
CN101833441A (en) | Parallel vector processing engine structure | |
CN104571949A (en) | Processor for realizing computing and memory integration based on memristor and operation method thereof | |
CN103744644A (en) | Quad-core processor system built in quad-core structure and data switching method thereof | |
CN105335331A (en) | SHA256 realizing method and system based on large-scale coarse-grain reconfigurable processor | |
CN107506329B (en) | A kind of coarse-grained reconfigurable array and its configuration method of automatic support loop iteration assembly line | |
CN107562549B (en) | Isomery many-core ASIP framework based on on-chip bus and shared drive | |
CN104933008A (en) | Reconfigurable system and reconfigurable array structure and application of reconfigurable array structure | |
CN103970720A (en) | Embedded reconfigurable system based on large-scale coarse granularity and processing method of system | |
CN102306141B (en) | Method for describing configuration information of dynamic reconfigurable array | |
CN104317770A (en) | Data storage structure and data access method for multiple core processing system | |
CN102279818A (en) | Vector data access and storage control method supporting limited sharing and vector memory | |
CN108647777A (en) | A kind of data mapped system and method for realizing that parallel-convolution calculates | |
US20180212894A1 (en) | Fork transfer of data between multiple agents within a reconfigurable fabric | |
US20180324112A1 (en) | Joining data within a reconfigurable fabric | |
CN203706196U (en) | Coarse-granularity reconfigurable and layered array register file structure | |
CN103902505A (en) | Configurable FFT processor circuit structure based on switching network | |
CN108874730A (en) | A kind of data processor and data processing method | |
CN104035896B (en) | Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system | |
CN103761213A (en) | On-chip array system based on circulating pipeline computation | |
CN106021171A (en) | An SM4-128 secret key extension realization method and system based on a large-scale coarseness reconfigurable processor | |
CN106155979B (en) | A kind of DES algorithm secret key expansion system and extended method based on coarseness reconstruction structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| AV01 | Patent right actively abandoned | Granted publication date: 20140709; effective date of abandoning: 20160831 |
| C25 | Abandonment of patent right or utility model to avoid double patenting | |