CN103984672B - Processor cluster structure based on shared register file and global synchronization module - Google Patents
- Publication number
- CN103984672B (application number CN201410197295.6A)
- Authority
- CN
- China
- Prior art keywords
- register file
- signal
- synchronization module
- shared
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention belongs to the technical field of high-performance application-specific processor architecture design, and in particular relates to a processor cluster structure based on a shared register file and a global synchronization module. The invention proposes the concept of a shared register file and a method of implementing it, which effectively solves the problem of efficient data sharing and fast communication between processors in application fields such as public key cryptography, where operand bit widths are large. For application algorithms whose parameters are shared, the data to be shared can be stored in the shared register file, which lowers register overhead. The invention also provides a global synchronization module circuit structure, which implements efficient synchronization between the processors and, in cooperation with the shared register file, supports ordered communication. The processor cluster structure provided by the invention achieves fast communication between clusters, efficient sharing between clusters, and accurate synchronization between clusters.
Description
Technical field
The invention belongs to the field of high-performance application-specific processor architecture design, and specifically relates to a processor cluster structure based on a shared register file and a global synchronization module.
Background
In recent years, developments in certain fields have created a demand for high-efficiency computing targeted at those fields. Traditional general-purpose processor architectures are not optimized for a specific domain, so high-efficiency computing is difficult to achieve on them. In public key cryptography, for example, modular operations on data of long bit width are frequently required, and traditional processors are very inefficient at this, which makes it hard to meet the demands of some applications. Designing processor architectures optimized for a particular domain has therefore become a popular approach to these problems.
For algorithms and applications in some fields, achieving high-speed communication among multiple processors is a key technology for high performance. Traditional communication based on message passing or shared memory has relatively large latency and cannot provide high-speed, low-latency, real-time inter-core communication. Solving this problem is a challenge.
Algorithms and applications in some fields also share certain parameters, such as the parameters used for modular multiplication in public key cryptography. If these long-bit-width parameters are copied into every processor unit in order to share them, the cost is several copies of a large amount of storage. Storing shared parameters with low overhead is therefore key to an efficient system design.
In traditional multi-core architectures based on shared memory, synchronization of shared accesses is generally implemented in software. A software implementation needs many clock cycles to synchronize, and cycle-accurate inter-core synchronization is difficult to achieve. Efficient synchronization is therefore also a key technology for high-performance computing.
Summary of the invention
To overcome the above challenges, the present invention proposes a processor cluster structure based on a shared register file and a global synchronization module, which achieves fast communication between clusters, efficient sharing between clusters, and accurate synchronization between clusters.
The processor cluster structure provided by the present invention consists of several processor units, one shared register file, and one global synchronization module. A private register file is provided in each processor unit. The shared register file and the private register files each have several read ports and write ports, and the processor units are connected to the shared register file and to their private register files through these read and write ports. The global synchronization module consists mainly of its internal sync bit registers and a combinational logic circuit that judges whether synchronization has succeeded; the processor units are connected to the global synchronization module through synchronization input ports provided on the module. Wherein:
In the decode stage, a processor unit issues the register file read signal and the synchronization signal that accesses the global synchronization module. The low-order bits of the register file read signal are duplicated and sent simultaneously to the read ports of the shared register file and the private register file, so that both are read in parallel. The high-order bits of the register file read signal drive a selector that chooses between the read output of the shared register file and the read output of the private register file. The global synchronization module judges whether synchronization has succeeded from the synchronization signals on its synchronization input ports and from its synchronization status registers, and sends the synchronization result to each processor unit.
In the write-back stage, the processor unit issues the register file write data signal, the register file write address signal, and the register file write enable signal. The write data signal is duplicated directly and sent to the write data ports of both the shared register file and the private register file. The low-order bits of the write address signal are duplicated and sent simultaneously to the write ports of the shared register file and the private register file. The write enable signal is steered by the high-order bits of the write address signal, which select whether the write is performed on the shared register file or on the private register file.
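The address split described above can be illustrated with a minimal Python sketch. The 5-bit address width, the use of a single top bit as the selector, and the function names are assumptions made for illustration only; the patent text does not fix the number of high-order or low-order bits.

```python
# Assumed encoding (not from the patent): 5-bit register address,
# top bit = 1 selects the shared file, low 4 bits index either file.
ADDR_BITS = 5
INDEX_BITS = ADDR_BITS - 1
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_address(addr):
    """Return (select_shared, index) for a register address."""
    select_shared = (addr >> INDEX_BITS) & 1
    index = addr & INDEX_MASK
    return select_shared, index


def read_register(addr, private_rf, shared_rf):
    """Decode-stage read: both files are read, a selector picks one output."""
    select_shared, index = split_address(addr)
    private_out = private_rf[index]  # low bits go to the private file
    shared_out = shared_rf[index]    # the same low bits go to the shared file
    return shared_out if select_shared else private_out


def write_register(addr, data, write_enable, private_rf, shared_rf):
    """Write-back: data and index reach both files, only one enable is set."""
    select_shared, index = split_address(addr)
    private_we = write_enable and not select_shared
    shared_we = write_enable and bool(select_shared)
    if private_we:
        private_rf[index] = data
    if shared_we:
        shared_rf[index] = data
    return private_we, shared_we
```

In this sketch the data and index are always presented to both files, and only the gated write enable decides which file changes, which mirrors the duplication-plus-selection scheme in the text.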
In the present invention, the sync bit registers consist of several groups of registers, and each group consists of several one-bit status registers. The input of each one-bit status register is the AND of a group's synchronization enable signal and one bit of that group's synchronization bits. The combinational logic circuit that judges synchronization computes, from a group's synchronization bits and the values of all status registers, a result signal indicating whether synchronization has succeeded. The synchronization result signal is output to the corresponding processor unit.
The beneficial effect of the present invention is that the processor cluster structure it provides achieves fast communication between clusters, efficient sharing between clusters, and accurate synchronization between clusters.
Brief description of the drawings
Fig. 1 shows a processor cluster structure with four processors.
Fig. 2 shows the connection of the register file in the decode stage of a traditional RISC processor.
Fig. 3 shows the connection, in the decode stage, of a processor with a shared register file and a private register file.
Fig. 4 shows the connection between the processors and the synchronization module.
Fig. 5 shows the circuit structure of the synchronization module.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings and an embodiment.
The present invention proposes a processor cluster structure based on a shared register file and a global synchronization module. The cluster consists of several processor units, one shared register file, and one global synchronization module. Fig. 1 shows a cluster unit with 4 processors. The connection between a traditional RISC processor and its register file is shown in Fig. 2, whereas in the present invention the connection between a processor and the shared and private register files is shown in Fig. 3. Wherein: each processor unit issues, in the decode stage, the read signal that accesses the shared register file, and this signal is connected to a read port of the shared register file; each processor unit issues, in the write-back stage, the write signal that accesses the shared register file, and this signal is connected to a write port of the shared register file; the shared register file has several read ports for the read signals of the several processor units; the shared register file has several write ports for the write signals of the several processor units. The connection between the processors and the global synchronization module is shown in Fig. 4. Wherein: each processor unit issues, in the decode stage, the synchronization signal that accesses the global synchronization module, and this signal is connected to a synchronization input port of the module; the global synchronization module judges whether synchronization has succeeded from the signals on its synchronization input ports and from its internal synchronization status registers, and sends the synchronization result to each processor unit.
The shared register file is a register file that several processor units can access simultaneously. In addition, each processor unit contains a private register file that only that processor unit can access. The shared register file is structurally similar to a private register file, except that it has more read and write ports, and the number of ports must match the number of processors. In terms of connection, the shared register file and the private register file each act as a sub-unit of a conventional register file. The connection between a processor and its private register file and the shared register file is as shown in the figure. Wherein: the low-order bits of the register file read signal issued by the processor unit in the decode stage are duplicated and sent simultaneously to the read ports of the shared register file and the private register file, so that both are read in parallel; the high-order bits of that read signal drive a selector that chooses between the read output of the shared register file and the read output of the private register file; the register file write data signal issued by the processor unit in the write-back stage is duplicated directly and sent to the write data ports of both the shared register file and the private register file; the low-order bits of the register file write address signal issued in the write-back stage are duplicated and sent simultaneously to the write ports of the shared register file and the private register file; the register file write enable signal issued in the write-back stage is steered by the high-order bits of the write address signal, which select whether the write is performed on the shared register file or on the private register file.
The global synchronization module consists of its internal sync bit registers and a combinational logic circuit that judges whether synchronization has succeeded. The sync bit registers consist of several groups of registers, and each group consists of several one-bit status registers. Fig. 5 shows the circuit structure of a synchronization module shared by 4 processor units. Wherein: the processor units produce, in the decode stage, several groups of synchronization enable signals and synchronization bit signals; the input of each one-bit status register is the AND of a group's synchronization enable signal and one bit of that group's synchronization bits; the combinational logic circuit that judges synchronization computes, from a group's synchronization bits and the values of all status registers, a result signal indicating whether synchronization has succeeded. This combinational logic consists mainly of XOR gates, NAND gates, and AND gates: the outputs of all status registers are XORed, in order, with the synchronization bit signals; the results are then NANDed in a defined order; finally, all outputs are AND-reduced to a single synchronization result bit. The synchronization result signal is output to the decode stage of the corresponding processor unit.
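One possible reading of this synchronization logic is sketched below in Python. This is an interpretation of the translated text, not the circuit of Fig. 5: it assumes one register group per processor, one status bit per potential partner, and that bit j of processor i's synchronization bits means "i waits for j". Register timing (when status bits are latched) is not modeled, and the function name `sync_results` is hypothetical.

```python
from typing import List, Tuple


def sync_results(requests: List[Tuple[int, List[int]]]) -> List[int]:
    """Evaluate the sync result bit for every processor.

    requests[i] = (enable_i, bits_i), where bits_i[j] == 1 is assumed to
    mean that processor i wants to synchronize with processor j.
    """
    n = len(requests)
    # Status bit of group i, position j: enable_i AND bits_i[j].
    status = [[en & bits[j] for j in range(n)] for en, bits in requests]

    results = []
    for i in range(n):
        ok = 1
        for j in range(n):
            want = status[i][j]            # does i (with enable set) wait for j?
            mismatch = status[j][i] ^ want  # XOR with the partner's status bit
            term = 1 - (want & mismatch)    # NAND: 0 only if i waits and j does not
            ok &= term                      # AND-reduce to a single result bit
        results.append(ok)
    return results


# Processors 0 and 1 name each other; 2 and 3 make no request.
both_arrived = sync_results([
    (1, [0, 1, 0, 0]),
    (1, [1, 0, 0, 0]),
    (0, [0, 0, 0, 0]),
    (0, [0, 0, 0, 0]),
])
# Processor 0 waits for processor 1, which has not raised its enable yet.
one_missing = sync_results([
    (1, [0, 1, 0, 0]),
    (0, [1, 0, 0, 0]),
    (0, [0, 0, 0, 0]),
    (0, [0, 0, 0, 0]),
])
```

Under these assumptions a processor's result bit is 1 only when every partner it names has also raised its enable and named it back, so a waiting processor sees a 0 result until its partner arrives.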
This processor structure has been verified in the design of a public key cryptography processor platform. Communication based on the shared register file and the global synchronization module achieves a latency comparable to the number of processor pipeline stages, which is the lower limit of communication latency. Because the shared register file exists, identical parameters need not be duplicated on multiple cores (the parameters used in cryptography have long bit widths, so their storage overhead is large), which results in a smaller cluster area. Some operations also require cycle-accurate synchronization among multiple cores, and the global synchronization module provides exactly this cycle-accurate hardware synchronization. The processor cluster structure proposed by the present invention achieves fast communication between clusters, efficient sharing between clusters, and accurate synchronization between clusters, and can therefore address the design challenges of certain application fields.
Claims (1)
1. A processor cluster structure based on a shared register file and a global synchronization module, characterized in that: the processor cluster structure consists of several processor units, one shared register file, and one global synchronization module; wherein: a private register file is provided in each processor unit; the shared register file and the private register files each have several read ports and write ports, and the processor units are connected to the shared register file and the private register files through the read ports and write ports; the global synchronization module consists mainly of its internal sync bit registers and a combinational logic circuit for judging whether synchronization has succeeded, and the processor units are connected to the global synchronization module through synchronization input ports provided on the global synchronization module; wherein:
the processor unit issues, in the decode stage, a register file read signal and a synchronization signal that accesses the global synchronization module; the low-order bits of the register file read signal are duplicated and sent simultaneously to the read ports of the shared register file and the private register file for parallel read operations; the high-order bits of the register file read signal drive a selector that chooses between the read output of the shared register file and the read output of the private register file; the global synchronization module judges whether synchronization has succeeded from the synchronization signals on the synchronization input ports and from its synchronization status registers, and sends the synchronization result to each processor unit;
the processor unit issues, in the write-back stage, a register file write data signal, a register file write address signal, and a register file write enable signal; the register file write data signal is duplicated directly and sent to the write data ports of the shared register file and the private register file; the low-order bits of the register file write address signal are duplicated and sent simultaneously to the write ports of the shared register file and the private register file; the register file write enable signal is steered by the high-order bits of the write address signal, which select whether the write operation is performed on the shared register file or on the private register file;
the sync bit registers consist of several groups of registers, each group consisting of several one-bit status registers; the input of each one-bit status register is the AND of a group's synchronization enable signal and one bit of that group's synchronization bits; the combinational logic circuit for judging synchronization computes, from a group's synchronization bits and the values of all status registers, a result signal indicating whether synchronization has succeeded; the synchronization result signal is output to the corresponding processor unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410197295.6A CN103984672B (en) | 2014-05-12 | 2014-05-12 | Processor cluster structure based on shared register file and global synchronization module |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410197295.6A CN103984672B (en) | 2014-05-12 | 2014-05-12 | Processor cluster structure based on shared register file and global synchronization module |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103984672A CN103984672A (en) | 2014-08-13 |
CN103984672B (en) | 2017-01-11 |
Family
ID=51276650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410197295.6A (Active, CN103984672B) | Processor cluster structure based on shared register file and global synchronization module | 2014-05-12 | 2014-05-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103984672B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833439A (en) * | 2010-04-20 | 2010-09-15 | 清华大学 | Parallel computing hardware structure based on separation and combination thought |
CN102141974A (en) * | 2011-04-11 | 2011-08-03 | 复旦大学 | Internuclear communication method of multinuclear processor and circuit structure thereof |
Non-Patent Citations (1)
Title |
---|
A 65nm 39GOPS/W 24-Core Processor with 11Tb/s/W Packet-Controlled Circuit-Switched Double-Layer Network-on-Chip and Heterogeneous Execution Array; Peng Ou et al.; 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers; 2013-02-21; p. 56, col. 1, para. 1, lines 4-12, para. 2, para. 3, lines 18-21; col. 2, para. 2, lines 4-10 and 14-21; Figs. 3.6.1-3.6.6 * |
Also Published As
Publication number | Publication date |
---|---|
CN103984672A (en) | 2014-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103744644B (en) | The four core processor systems built using four nuclear structures and method for interchanging data | |
CN101329589B (en) | Control system and method of low power consumption read-write register | |
CN108536642A (en) | Big data operation acceleration system and chip | |
TW200949591A (en) | Synchronous to asynchronous logic conversion | |
CN101236774B (en) | Device and method for single-port memory to realize the multi-port storage function | |
Fu et al. | A study on the optimization of blockchain hashing algorithm based on PRCA | |
CN101441616B (en) | Rapid data exchange structure based on register document and management method thereof | |
CN104699641A (en) | EDMA (enhanced direct memory access) controller concurrent control method in multinuclear DSP (digital signal processor) system | |
US8443315B2 (en) | Reset mechanism conversion | |
CN103761072A (en) | Coarse granularity reconfigurable hierarchical array register file structure | |
Nguyen et al. | An efficient FPGA-based database processor for fast database analytics | |
CN103577161A (en) | Big data frequency parallel-processing method | |
WO2020087276A1 (en) | Big data operation acceleration system and chip | |
CN104820659A (en) | Multi-mode dynamic configurable high-speed memory access interface for coarse grain reconfigurable system | |
CN103984672B (en) | Processor cluster structure based on shared register file and global synchronization module | |
CN105825880B (en) | Access control method, device and circuit for DDR controller | |
CN104035896A (en) | Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system | |
Contini et al. | Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication | |
CN101980140B (en) | SSRAM access control system | |
CN208298179U (en) | Big data operation acceleration system and chip | |
CN203706196U (en) | Coarse-granularity reconfigurable and layered array register file structure | |
US20170212861A1 (en) | Clock tree implementation method, system-on-chip and computer storage medium | |
WO2020087275A1 (en) | Method for big data operation acceleration system carrying out operations | |
CN106709187B (en) | Method and device for establishing CPU based on model | |
CN204009891U (en) | The soft core of a kind of sixteen bit embedded chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |