CN203706196U - Coarse-granularity reconfigurable and layered array register file structure - Google Patents

Coarse-granularity reconfigurable and layered array register file structure

Info

Publication number
CN203706196U
Authority
CN
China
Prior art keywords
register
register file
processing unit
reconstruction processing
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn - After Issue
Application number
CN201420060189.9U
Other languages
Chinese (zh)
Inventor
曹鹏
葛伟
徐凯
刘波
杨锦江
马俊
杨军
王超
卜爱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The utility model discloses a coarse-granularity reconfigurable and layered array register file structure comprising a global register file, a local register file, and a distributed register file. Registers in the global register file serve as shared registers connecting a system control kernel with a reconfigurable array. They meet the parameter-passing requirement when the system calls the reconfigurable architecture, can be connected to every unit on the reconfigurable array, and have the largest fan-out in the array. Registers in the local register file serve as private registers of the reconfigurable processing units, and their data are used only by the corresponding unit. Registers in the distributed register file serve as data storage and transfer channels between some of the reconfigurable processing units in the array. Through this layered register file design, array data can be stored and transferred during reconfigurable computing, improving the storage efficiency of data variables in the array and reconfigurable computing performance.

Description

A coarse-grained reconfigurable hierarchical array register file structure
Technical field
The utility model relates to a coarse-grained reconfigurable hierarchical array register file structure and belongs to the field of embedded reconfigurable design technology.
Background
The emergence of reconfigurable technology based on field-programmable gate arrays has greatly changed traditional embedded design methods. Reconfigurable computing, as a new computing paradigm spanning the time and space domains, has broad application prospects in embedded and high-performance computing and has become a trend in current embedded system development. Algorithms in media applications such as image processing and modern communications are massively parallel and require large numbers of matrix operations. A register file design allows flexible reordering and shift operations, with multiplexers selecting which register data is read from or written to. Compared with delay chains and shift registers, a register file consumes more power on write operations because of the associated decoding and selection logic. This power overhead is therefore offset only when the register file is used for long-lived data; for short-lived data, registers are not an optimal choice.
Existing reconfigurable register file architectures fall into two classes according to how they affect array computing performance: on-chip registers outside the array, and distributed registers inside the array. Data access in a reconfigurable array can be optimized by using on-chip registers outside the array to reduce memory access latency, and also by optimizing the access pattern of those on-chip registers. Optimizing the distributed register structure inside the array reduces the loss of scheduling performance that architectural constraints impose on data during computation, and a hierarchical register file design with a matching scheduling strategy improves the computing performance of the array.
For reconfigurable on-chip registers, the organization, sharing mechanism, replacement policy, and partitioning mechanism all need to be studied against the specific array structure and memory access characteristics, so as to trade off low access latency against high hit rate. Research on the data access path outside the array, including the design of on-chip memory, prefetch and reuse unit structures, and vector-scalar register file designs, has established the data flow path outside the array.
The access pattern between the reconfigurable array and the global registers also has a pronounced effect on access efficiency, and for continuous, fast, dense data it has a large effect on bandwidth. The interconnect topology between the reconfigurable array and the global registers clearly constrains the array's memory access performance. Existing designs use crossbar and ring interconnect structures to achieve efficient access and to meet the demands of low latency, high bandwidth, and low power, or use two-dimensional access modes to accelerate access to multimedia data.
How to partition row vector registers and column vector registers flexibly at low cost, and so design a reconfigurable hierarchical array register file, remains an active research problem in this field.
Summary of the utility model
Purpose: to overcome the deficiencies of the prior art, the utility model provides a coarse-grained reconfigurable hierarchical array register file structure that solves the problem of storing and transferring array data during reconfigurable computing, thereby improving the storage efficiency of data variables in the array and reconfigurable computing performance.
Technical solution: to achieve the above purpose, the utility model adopts the following technical solution:
A coarse-grained reconfigurable hierarchical array register file structure, used to pass parameters between a system control kernel and a reconfigurable array arranged as an m × n rectangular array, and also to store and transfer data on the reconfigurable array. It is implemented by hardware connections and comprises a global register file, a local register file, and a distributed register file:
The global register file: serves as the shared registers connecting the system control kernel and the reconfigurable array. Its registers are used to pass parameters when the system control kernel calls the reconfigurable array, and they can be connected to every reconfigurable processing unit, giving them the largest fan-out in the reconfigurable array;
The local register file: each reconfigurable processing unit is connected to one corresponding local register, which serves as the private register of that unit; its data are used only by that unit;
The distributed register file: is connected to the reconfigurable array and serves as the data storage and transfer channel between some of the reconfigurable processing units in the array.
Preferably, the global register file comprises n global registers, and its data width matches that of the reconfigurable processing units. The global register file serves as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers.
Preferably, the global register file is directly interconnected with the reconfigurable array, as follows: the m global registers at the top of the global register file and the 1 global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect. When the number of parameters passed into a loop exceeds m, and therefore exceeds the m top global registers, the extra parameters must be accessed over the bus. The bottom global register is used to pass the function return value.
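The interconnect rule for the global register file can be illustrated with a minimal Python sketch. It is only a behavioral model under stated assumptions, not the hardware described by the utility model: the class name `GlobalRegisterFile`, the zero-based indexing with index 0 as the "top" register, the path labels, and the placement of parameters in index order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalRegisterFile:
    """Hypothetical model of the global register file interconnect."""
    n: int  # total number of global registers
    m: int  # number of top registers with full-mesh plus bus interconnect

    def access_path(self, index: int) -> str:
        """Return how a processing unit reaches global register `index`."""
        if not 0 <= index < self.n:
            raise IndexError(f"global register {index} is outside 0..{self.n - 1}")
        # The m top registers and the single bottom register are reachable
        # directly by every processing unit (full mesh); the rest need the bus.
        if index < self.m or index == self.n - 1:
            return "full-mesh"
        return "bus"

    def parameter_paths(self, param_count: int) -> list:
        """Access path of each parameter, assuming placement in index order."""
        # Assumption: the bottom register is reserved for the return value.
        if param_count > self.n - 1:
            raise ValueError("more parameters than available global registers")
        return [self.access_path(i) for i in range(param_count)]


grf = GlobalRegisterFile(n=16, m=3)
print(grf.parameter_paths(5))
```

With n = 16 and m = 3, the first three parameters land in directly connected registers and the remaining two fall back to the bus, which mirrors the "exceeds m" case in the text.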
Preferably, the local register file is mainly used to store variables that have a long lifetime and a fixed spatial location; both its input source and its output destination are its own private (corresponding) reconfigurable processing unit. A local register can take input data and have it ready for output within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the result computed by the reconfigurable processing unit is written into the local register file within one cycle.
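The enable-bit write behavior can be sketched as a small cycle-level model in Python. The names `LocalRegisterFile`, `clock`, `write_enable`, and `dest` are hypothetical, as is the default size of 4 entries; the sketch only shows that a result is latched when the enable bit is set and is otherwise discarded.

```python
class LocalRegisterFile:
    """Hypothetical private register file of one processing unit."""

    def __init__(self, size: int = 4):
        # Assumption: registers reset to zero.
        self.regs = [0] * size

    def clock(self, result: int, write_enable: bool, dest: int) -> None:
        """Model one cycle: latch the unit's result only if the enable bit is set."""
        if write_enable:
            self.regs[dest] = result

    def read(self, src: int) -> int:
        """Return the stored value for use by the owning processing unit."""
        return self.regs[src]


lrf = LocalRegisterFile()
lrf.clock(result=7, write_enable=True, dest=2)   # enable bit set: value stored
lrf.clock(result=9, write_enable=False, dest=2)  # enable bit clear: value kept
print(lrf.read(2))
```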
Preferably, the distributed register file consists of distributed registers arranged as an m × n rectangular array. Each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. A reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple reconfigurable processing units can write to the cross-row-domain register groups simultaneously, and multiple units can read the same distributed register simultaneously.
Preferably, the cross-domain register groups are interconnected by row interconnect or column interconnect: when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, every reconfigurable processing unit in row i or column j can obtain the data through that cross-domain register.
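The row-or-column visibility rule can be written down as a short Python function. This is an illustrative sketch only: the function name `cross_domain_readers` and the 1-based (row, column) coordinates with m rows and n columns are assumptions, chosen to match the (1, 1) notation used elsewhere in the text.

```python
def cross_domain_readers(i: int, j: int, m: int, n: int) -> set:
    """Positions that can obtain data written by the unit at row i, column j.

    Assumes an m-row by n-column array with 1-based (row, column) positions.
    """
    if not (1 <= i <= m and 1 <= j <= n):
        raise ValueError("writer position is outside the array")
    same_row = {(i, c) for c in range(1, n + 1)}
    same_column = {(r, j) for r in range(1, m + 1)}
    # The writer itself appears in both sets; the union counts it once.
    return same_row | same_column


readers = cross_domain_readers(2, 3, m=4, n=4)
print(sorted(readers))
```

In a 4 × 4 array the writer at (2, 3) is visible to 7 positions: the 4 in its row plus the 4 in its column, with the writer counted once.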
Preferably, the distributed register file uses a multiple-input, multiple-output data access scheme. To keep different reconfigurable processing units from accessing the same distributed register at the same time, one of the following two methods is used:
Method one: avoid simultaneous access to the same distributed register at mapping time;
Method two: where it cannot be predicted whether multiple reconfigurable processing units will access the same distributed register simultaneously, priority levels are assigned from high to low according to the numbering of the reconfigurable processing units in the array, and the unit with the higher priority obtains the right to write.
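Method two amounts to a fixed-priority arbiter, which can be sketched in Python as follows. The text does not make clear which end of the numbering receives the highest priority, so the sketch takes the ranking as a parameter; the default (lower unit number wins) is purely an assumption, and the names `arbitrate_writes` and `rank` are hypothetical.

```python
def arbitrate_writes(requests, rank=lambda unit: -unit):
    """Pick one winning write per distributed register.

    `requests` is an iterable of (unit_number, register_index, value).
    `rank` maps a unit number to its priority; a larger rank wins.
    Assumption: by default the lower-numbered unit has the higher priority.
    """
    winners = {}
    for unit, register, value in requests:
        current = winners.get(register)
        if current is None or rank(unit) > rank(current[0]):
            winners[register] = (unit, value)
    return winners


requests = [(5, 0, 50), (2, 0, 20), (7, 1, 70)]
print(arbitrate_writes(requests))                         # lower number wins
print(arbitrate_writes(requests, rank=lambda unit: unit)) # higher number wins
```

Passing a different `rank` function reverses the ordering without changing the arbitration logic, which keeps the sketch neutral on the ambiguous point.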
Beneficial effects: the coarse-grained reconfigurable hierarchical array register file structure provided by the utility model allows array data to be stored and transferred accurately and efficiently during reconfigurable computing, improving the storage efficiency of data variables in the array and reconfigurable computing performance.
Brief description of the drawings
Fig. 1 is a structural schematic of the utility model;
Fig. 2 is a schematic of the global register file;
Fig. 3 is a schematic of the local register file;
Fig. 4 is a schematic of the distributed register file;
Fig. 5 is a structural schematic of an example of the utility model;
Fig. 6 is a flow chart of data variable storage and transfer in the utility model.
Embodiment
The utility model is further described below with reference to the accompanying drawings.
A coarse-grained reconfigurable hierarchical array register file structure is used to pass parameters between a system control kernel and a reconfigurable array arranged as an m × n rectangular array, and also to store and transfer data on the reconfigurable array. As shown in Fig. 1, it comprises a global register file, a local register file, and a distributed register file.
The global register file: serves as the shared registers connecting the system control kernel and the reconfigurable array. It not only meets the parameter-passing needs when the system control kernel calls the reconfigurable array, but can also be connected to every reconfigurable processing unit, and it has the largest fan-out in the reconfigurable array.
The global register file comprises n global registers, and its data width matches that of the reconfigurable processing units. It serves as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers. The global register file is directly interconnected with the reconfigurable array, as follows: the m global registers at the top of the global register file and the 1 global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect. When the number of parameters passed into a loop exceeds m, and therefore exceeds the m top global registers, the extra parameters must be accessed over the bus. The bottom global register is used to pass the function return value.
The global register file shown in Fig. 2 contains 16 global registers, of which the 3 at the top and the 1 at the bottom can be accessed by all reconfigurable processing units. When the number of parameters passed into a loop exceeds 3, and therefore exceeds the 3 top registers, the extra parameters must be accessed over the bus. The bottom global register is used specifically for the function return value. The global registers use the clock domain and reset domain of the reconfigurable array and support both software and hardware reset.
The local register file: each reconfigurable processing unit is provided with one corresponding local register, which serves as the private register of that unit; its data are used only by that unit.
The local register file is mainly used to store variables that have a long lifetime and a fixed spatial location; both its input source and its output destination are its own private (corresponding) reconfigurable processing unit. A local register can take input data and have it ready for output within one cycle. Writes to the local register are controlled by an enable bit in the configuration word: when the enable bit is set, the result computed by the reconfigurable processing unit is written into the local register file within one cycle.
As shown in Fig. 3, each local register provides only 4 local sub-registers, and it is designed so that input data can be made ready for output within 1 cycle.
The distributed register file: serves as the data storage and transfer channel between some of the reconfigurable processing units in the array.
The distributed register file consists of distributed registers arranged as an m × n rectangular array. Each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one with the reconfigurable processing units. Each reconfigurable processing unit can operate on two register groups: the row register group and the column register group that contain the distributed register at its position. A reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers. Multiple reconfigurable processing units can write to the cross-row-domain register groups simultaneously, and multiple units can read the same distributed register simultaneously. The cross-domain register groups are interconnected by row interconnect or column interconnect: when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, every reconfigurable processing unit in row i or column j can obtain the data through that cross-domain register. The distributed register file uses a multiple-input, multiple-output data access scheme. To keep different reconfigurable processing units from accessing the same distributed register at the same time, one of the following two methods is used:
Method one: avoid simultaneous access to the same distributed register at mapping time;
Method two: where it cannot be predicted whether multiple reconfigurable processing units will access the same distributed register simultaneously, priority levels are assigned from high to low according to the numbering of the reconfigurable processing units in the array, and the unit with the higher priority obtains the right to write.
For example, when data output by the reconfigurable processing unit at array position (i, j) must be delivered to the unit at position (1, 1), the transfer is relayed through the unit at position (i, 1) or (1, j). At run time T_i, the unit at (i, j) writes the data to position 0 of the cross-row register group. At time T_i + 1, the unit at (i, 1) uses a data exchange instruction to write the data held in cross-row register group 0 of the DCR to position 0 of the cross-column register group. At time T_i + 2, the unit at (1, 1) can then obtain, from cross-column register group 0, the data written by the unit at (i, j).
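The three-step relay in this example can be traced with a toy Python model. Everything here is an illustrative assumption rather than the patented hardware: the class `DistributedRegisterFile`, the representation of one register group per row and per column, the single slot per group, and the helper `relay_to_origin` are hypothetical names, and the relay is fixed to the (i, 1) route.

```python
class DistributedRegisterFile:
    """Toy model: one register group per row and one per column (1-based)."""

    def __init__(self, m: int, n: int, slots: int = 1):
        self.row_groups = {r: [None] * slots for r in range(1, m + 1)}
        self.col_groups = {c: [None] * slots for c in range(1, n + 1)}

    def write_row(self, unit, slot, value):
        self.row_groups[unit[0]][slot] = value   # unit = (row, column)

    def read_row(self, unit, slot):
        return self.row_groups[unit[0]][slot]

    def write_col(self, unit, slot, value):
        self.col_groups[unit[1]][slot] = value

    def read_col(self, unit, slot):
        return self.col_groups[unit[1]][slot]


def relay_to_origin(drf, source, value):
    """Move `value` from `source` = (i, j) to unit (1, 1) via unit (i, 1)."""
    i, j = source
    steps = []
    # T_i: the source unit writes slot 0 of its row group.
    drf.write_row((i, j), 0, value)
    steps.append(f"T_i: unit ({i},{j}) writes row group {i}, slot 0")
    # T_i + 1: the relay unit copies the row-group data into its column group.
    relay = (i, 1)
    drf.write_col(relay, 0, drf.read_row(relay, 0))
    steps.append(f"T_i + 1: unit ({i},1) copies it to column group 1, slot 0")
    # T_i + 2: the destination unit reads the column group it shares with the relay.
    received = drf.read_col((1, 1), 0)
    steps.append("T_i + 2: unit (1,1) reads column group 1, slot 0")
    return received, steps


drf = DistributedRegisterFile(m=4, n=4)
received, steps = relay_to_origin(drf, source=(3, 2), value=42)
print(received)
for line in steps:
    print(line)
```

The relay works because unit (i, 1) belongs both to row group i, which it shares with the source, and to column group 1, which it shares with the destination.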
The minimal reconfigurable computing system shown in Fig. 5 adopts the reconfigurable hierarchical array register file structure proposed here. The system comprises: an ARM7TDMI processor serving as the system control kernel, a reconfigurable array, a global register file, a local register file, an AHB bus for data transfer, and a distributed register file.
The ARM7TDMI processor, chosen for advantages such as small size, speed, low power consumption, and compiler support, serves as the kernel that controls the scheduling and configuration of system operation. The global register file is connected to the reconfigurable array through a 64-bit AHB bus. The local register file is interconnected with the reconfigurable array through a dedicated access interface with a data width of 128 bits. The distributed register file is likewise interconnected with the reconfigurable array through a dedicated access interface with a data width of 128 bits. The reconfigurable array contains 4 × 4 reconfigurable processing units, each of which supports single-cycle 16-bit arithmetic and logic operations.
The process of storing and transferring reconfigurable array data is shown in Fig. 6. It begins with a transfer request: based on the instruction obtained from external memory, the reconfigurable array requests the transfer of parameters or of array data. If the data to be transferred are exchanged with the system control kernel, the exchange is carried out through the global register file. Otherwise, it is determined whether the data are exchanged within the reconfigurable array; if so, the data are stored and exchanged through the distributed register file. Otherwise, the data belong to a single reconfigurable processing unit and are stored in its local register. By selecting the most suitable register file for each case, the structure makes full use of register file resources and thereby improves data variable storage efficiency and reconfigurable computing performance.
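The decision flow of Fig. 6 can be summarized as a small Python selector. This is a sketch of the decision order only; the enum `RegisterFile`, the function `select_register_file`, and its two boolean parameters are hypothetical names introduced for illustration.

```python
from enum import Enum


class RegisterFile(Enum):
    GLOBAL = "global"
    DISTRIBUTED = "distributed"
    LOCAL = "local"


def select_register_file(with_control_kernel: bool, between_units: bool) -> RegisterFile:
    """Choose the register file for a transfer, checking cases in flow order."""
    # First check: data exchanged with the system control kernel.
    if with_control_kernel:
        return RegisterFile.GLOBAL
    # Second check: data exchanged between processing units inside the array.
    if between_units:
        return RegisterFile.DISTRIBUTED
    # Remaining case: data kept by a single processing unit.
    return RegisterFile.LOCAL


print(select_register_file(with_control_kernel=False, between_units=True))
```

Because the kernel check comes first, a request flagged as both kernel-related and array-internal resolves to the global register file, matching the order of the tests in the flow.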
The above is only a preferred embodiment of the utility model. It should be noted that those of ordinary skill in the art can make further improvements and refinements without departing from the principle of the utility model, and such improvements and refinements shall also fall within the scope of protection of the utility model.

Claims (6)

1. A coarse-grained reconfigurable hierarchical array register file structure, characterized in that it is used to pass parameters between a system control kernel and a reconfigurable array arranged as an m × n rectangular array, and also to store and transfer data on the reconfigurable array, and comprises a global register file, a local register file, and a distributed register file:
The global register file: serves as the shared registers connecting the system control kernel and the reconfigurable array; its registers are used to pass parameters when the system control kernel calls the reconfigurable array, and they can be connected to every reconfigurable processing unit, giving them the largest fan-out in the reconfigurable array;
The local register file: each reconfigurable processing unit is connected to one corresponding local register, which serves as the private register of that unit; its data are used only by that unit;
The distributed register file: is connected to the reconfigurable array and serves as the data storage and transfer channel between reconfigurable processing units in the array.
2. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the global register file comprises n global registers, and its data width matches that of the reconfigurable processing units; the global register file serves as a data transfer channel for passing input parameters and return values, and both the system control kernel and the reconfigurable array can access the global registers.
3. The coarse-grained reconfigurable hierarchical array register file structure according to claim 2, characterized in that: the global register file is directly interconnected with the reconfigurable array, as follows: the m global registers at the top of the global register file and the 1 global register at the bottom use both full-mesh and bus interconnect and can be accessed by all reconfigurable processing units, while the remaining global registers use bus interconnect; the bottom global register is used to pass the function return value.
4. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the local register file is mainly used to store variables that have a long lifetime and a fixed spatial location; both its input source and its output destination are its own private reconfigurable processing unit; writes to the local register are controlled by an enable bit in the configuration word.
5. The coarse-grained reconfigurable hierarchical array register file structure according to claim 1, characterized in that: the distributed register file consists of distributed registers arranged as an m × n rectangular array; each row register group and each column register group share one distributed register, and the positions of the distributed registers correspond one-to-one with the reconfigurable processing units; each reconfigurable processing unit can operate on two register groups, namely the row register group and the column register group that contain the distributed register at its position; a reconfigurable processing unit can operate on only one register group at a time, and read and write operations among multiple units are selected by multiplexers; multiple reconfigurable processing units can write to the cross-row-domain register groups simultaneously; multiple reconfigurable processing units can read the same distributed register simultaneously.
6. The coarse-grained reconfigurable hierarchical array register file structure according to claim 5, characterized in that: the cross-domain register groups are interconnected by row interconnect or column interconnect; when the reconfigurable processing unit in row i, column j writes data to a cross-domain register, every reconfigurable processing unit in row i or column j can obtain the data through that cross-domain register.
CN201420060189.9U 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure Withdrawn - After Issue CN203706196U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201420060189.9U CN203706196U (en) 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201420060189.9U CN203706196U (en) 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure

Publications (1)

Publication Number Publication Date
CN203706196U true CN203706196U (en) 2014-07-09

Family

ID=51056601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201420060189.9U Withdrawn - After Issue CN203706196U (en) 2014-02-10 2014-02-10 Coarse-granularity reconfigurable and layered array register file structure

Country Status (1)

Country Link
CN (1) CN203706196U (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761072A (en) * 2014-02-10 2014-04-30 东南大学 Coarse granularity reconfigurable hierarchical array register file structure
CN111630487A (en) * 2017-12-22 2020-09-04 阿里巴巴集团控股有限公司 Centralized-distributed hybrid organization of shared memory for neural network processing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761072A (en) * 2014-02-10 2014-04-30 东南大学 Coarse granularity reconfigurable hierarchical array register file structure
CN103761072B (en) * 2014-02-10 2016-08-31 东南大学 A kind of array register file structure of coarseness reconfigurable hierarchical
CN111630487A (en) * 2017-12-22 2020-09-04 阿里巴巴集团控股有限公司 Centralized-distributed hybrid organization of shared memory for neural network processing
CN111630487B (en) * 2017-12-22 2023-06-20 阿里巴巴集团控股有限公司 Centralized-distributed hybrid organization of shared memory for neural network processing

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
AV01 Patent right actively abandoned

Granted publication date: 20140709

Effective date of abandoning: 20160831

C25 Abandonment of patent right or utility model to avoid double patenting