CN103440210A - Register file reading and isolating method controlled by asynchronous clock - Google Patents

Register file reading and isolating method controlled by asynchronous clock

Info

Publication number
CN103440210A
Authority
CN
China
Prior art keywords
register file
asynchronous clock
read
level
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013103658314A
Other languages
Chinese (zh)
Inventor
虞志益
俞政
于学球
张家杰
曾晓洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN2013103658314A priority Critical patent/CN103440210A/en
Publication of CN103440210A publication Critical patent/CN103440210A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of microprocessors and relates in particular to an asynchronous-clock-controlled register file read isolation method. Based on a microprocessor architecture with a basic pipeline structure, the method detects, in the pipeline stage that accesses the register file, whether the register file actually needs to be read. If the read is found to be useless, that is, the instruction does not need to access the register file or the required operand can be obtained through the feedback network, a local asynchronous clock network holds the read address of the register file unchanged. Because the register file is read asynchronously, holding the address unchanged prevents the corresponding logic from toggling, which reduces the power consumed by useless register file reads.

Description

Asynchronous-clock-controlled register file read isolation method
Technical field
The invention belongs to the field of microprocessor technology and relates in particular to an asynchronous-clock-controlled register file read isolation method.
Background art
The register file is the first level of storage in a processor and a core component of modern microprocessors. Because it is accessed at high speed and high frequency, both its power consumption and its power density are considerable, making it a major energy consumer and a power hotspot in the microprocessor. High energy consumption is a challenge for microprocessors, particularly in embedded applications, and power hotspots additionally degrade circuit stability and lifetime. Research on reducing register file power consumption is therefore of real practical significance.
Fig. 1 shows the architecture of a traditional 6-stage pipelined microprocessor. It comprises an instruction-fetch stage, a decode stage, an execute stage, a memory-access stage, an align stage and a write-back stage.
In a traditional microprocessor architecture, no dedicated circuit isolates register file reads. Even when a read turns out to be useless during instruction execution, the read operation on the register file cannot be masked, which wastes energy. To address this shortcoming, register file reads need to be checked, and any read found to be useless should be masked to reduce power consumption.
Summary of the invention
The object of the present invention is to provide an asynchronous-clock-controlled register file read isolation method that reduces register file read power consumption.
The invention detects useless register file read operations and then uses a local asynchronous clock network to hold the register file read address, which reduces logic toggling. This lowers the read power consumption and the power density of the register file, and at the same time improves the stability and lifetime of the circuit.
The asynchronous-clock-controlled register file read isolation method provided by the invention is based on a microprocessor architecture with a basic pipeline structure comprising an instruction-fetch stage, a decode stage, an execute stage, a memory-access stage, an align stage and a write-back stage (as shown in Fig. 1). The basic idea is as follows: in the pipeline stage that accesses the register file, detect whether the register file needs to be read. A read is useless in two cases: the instruction itself does not need to access the register file, or the required operand can be obtained through the feedback network. If the read is found to be useless, a local asynchronous clock network holds the read address presented to the register file so that it does not change. Because the register file is read asynchronously (that is, data is read out as soon as an address is applied), the corresponding logic does not toggle while the address is unchanged, and the power consumed by useless register file reads is reduced.
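As a rough illustration of why holding the address saves switching activity, the following Python sketch counts bit flips on one read-address port with and without isolation. The address trace, the `useless` flags and both function names are hypothetical, and bit flips on the address bus are only a crude proxy for the power of the read logic behind it.

```python
def count_address_toggles(addresses):
    """Count bit flips between consecutive values on an address port."""
    toggles = 0
    for prev, cur in zip(addresses, addresses[1:]):
        toggles += bin(prev ^ cur).count("1")
    return toggles


def apply_isolation(addresses, useless):
    """Hold the previous address whenever the read is flagged useless."""
    held = []
    last = addresses[0]
    for addr, skip in zip(addresses, useless):
        if not skip:
            last = addr  # a real read: let the new address through
        held.append(last)  # a useless read: the port keeps its old value
    return held


# Hypothetical trace: 5-bit read addresses presented to one port, per cycle.
trace = [0b00001, 0b11110, 0b00011, 0b11100, 0b00011]
useless = [False, True, False, True, False]

isolated = apply_isolation(trace, useless)
print(count_address_toggles(trace), count_address_toggles(isolated))
```

In this made-up trace the useless reads happen to flip almost every address bit, so holding the address removes most of the toggling; real savings depend on the instruction mix.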
To this end, a decision logic is added to the decode stage of the above microprocessor architecture. As shown in Fig. 2, the two source operands of an instruction are Rs and Rt. Taking Rs as an example: the "need to read Rs" logic outputs 0 when Rs needs to be read and 1 when it does not; the "Rs obtained by feedback" logic outputs 1 when Rs is obtained through feedback and 0 when it is not. The outputs of these two logics are then ORed together. Rt is handled in the same way as Rs. This forms the decision logic, added to the decode stage, that determines whether the register file needs to be accessed. A control logic for the register file address input port and a local asynchronous clock network also need to be added to the decode stage. As shown in Fig. 2, the control logic consists of a 2-to-1 data selector and a D flip-flop (DFF). Input 1 of the selector takes the value of the DFF output Q, and input 0 takes the "Rs/Rt address value". The control signal of the selector comes from the output of the OR gate described above, and the output of the selector drives the data input D of the DFF. The local clock is applied to the clock input of the DFF (the input marked by the triangle in the lower-left corner of the DFF box). As the waveforms at the bottom of Fig. 2 show, the local clock is delayed relative to the system clock by a fixed phase difference matched to the design timing. The specific steps of the asynchronous-clock-controlled register file read isolation method are therefore:
Using the decision logic in the decode stage, determine whether the two source operands of the instruction (Rs and Rt) need to be obtained from the register file. There are two grounds for deciding that an operand does not: either the instruction itself does not use the source operand (Rs or Rt), or the required operand can be fed back from the execute, memory-access or align stage. As soon as either case is confirmed, the register file read operation for the source operand (Rs or Rt) is masked;
The masking is carried out as follows: the local asynchronous clock network splits the decode stage into two segments, and the local asynchronous clock controls the address input signal to the register file. When a register file read needs to be masked, the corresponding address signal is isolated.
In the present invention, the isolation is implemented by a selector-flip-flop hold circuit. When isolation is required, the register file address is given the value it had in the previous clock cycle; otherwise the address value provided by the decode circuit is selected.
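The hold circuit can be sketched behaviourally in Python as follows. This is a cycle-level model under simplifying assumptions, not a description of the actual circuit: the class and function names are hypothetical, each call stands for one active edge of the local clock, and analog timing is ignored.

```python
class AddressHoldCircuit:
    """2-to-1 selector feeding a D flip-flop clocked by the local clock."""

    def __init__(self, reset_value=0):
        # DFF output Q, which drives the register file read address port.
        self.q = reset_value

    def local_clock_edge(self, decoded_address, isolate):
        # Selector: input 1 takes Q (hold), input 0 takes the decoded address.
        d = self.q if isolate else decoded_address
        self.q = d  # the DFF captures D on the local clock edge
        return self.q


def isolate_signal(read_not_needed, obtained_by_feedback):
    """OR of the two decision-logic outputs; True selects the hold path."""
    return read_not_needed or obtained_by_feedback


rs_port = AddressHoldCircuit()
outputs = [
    rs_port.local_clock_edge(5, isolate_signal(False, False)),  # real read
    rs_port.local_clock_edge(9, isolate_signal(True, False)),   # operand unused
    rs_port.local_clock_edge(12, isolate_signal(False, True)),  # operand fed back
    rs_port.local_clock_edge(7, isolate_signal(False, False)),  # real read
]
print(outputs)
```

One instance of this circuit would be needed per read-address port, so a design reading both Rs and Rt would use two.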
In the present invention, the local asynchronous clock network is generated by a full-custom method: the local clock network is built by hand from an inverter chain, and the resulting local clock has a fixed phase offset from the global clock that matches the design timing requirements.
The method of the invention can be further summarized as:
(1) The decision logic in the decode stage determines whether the two source operands need to be read; if a read is not needed, the register file read is masked. The masking conditions are: (i) the instruction itself does not need the operand; (ii) the required operand can be fed back from a later stage, namely the execute, memory-access or align stage;
(2) If one of the above isolation conditions is confirmed, a selector-flip-flop hold circuit is used: when isolation is required, the hold circuit keeps the address value presented to the register file unchanged; otherwise it selects the address value provided by the decode circuit;
(3) The clock that triggers the hold circuit in (2) is produced by the local asynchronous clock network, which is custom-built by hand as an inverter chain. The resulting clock network has a fixed phase offset from the global clock network that matches the design requirements.
The method of the invention can detect unnecessary register file read operations in the decode stage.
Compared with existing architectures, the asynchronous-clock-controlled register file read isolation method provided by the invention effectively detects and isolates useless register file read operations, thereby reducing the power consumption and power density of the register file. The hardware overhead is a small number of registers (used to build the selector-flip-flop isolation circuit) and some simple logic, and experiments show this overhead to be negligible.
Brief description of the drawings
Fig. 1 shows a traditional 6-stage pipelined microprocessor architecture.
Fig. 2 shows the overall framework of the asynchronous-clock-controlled register file read isolation method.
Fig. 3 shows the decision logic for useless register file read operations.
Fig. 4 shows how the parameter t_delay, the offset between the local clock network and the system clock, is determined.
Fig. 5 shows the circuit diagram and simulation results of the local asynchronous clock network.
Detailed description of embodiments
The asynchronous-clock-controlled register file read isolation method provided by the invention is further described below by way of example.
Fig. 2 shows the overall framework of the asynchronous-clock-controlled register file read isolation method. Compared with the traditional structure of Fig. 1, this structure adds a useless-read decision logic, which determines whether the register file is to be read, and a local asynchronous clock network, which controls the address input signal to the register file. If a read operation is judged to be useless, the selector-flip-flop circuit holds the address at the corresponding register file address input port, so the useless logic toggling is masked and the unnecessary power overhead is avoided.
Fig. 3 shows the logic for detecting useless register file reads. In the actual logic design, two cases must be distinguished. In the first case, the instruction itself does not need to read the register file. Here, R-type and I-type instructions are distinguished: for an R-type instruction, both operands are obtained from the register file, whereas for an I-type instruction one operand comes from the immediate, so only the operand Rs needs to access the register file. In the second case, the required operand can be obtained from the feedback network of the later stages. Fig. 3 gives an example of obtaining an operand through the feedback network. In brief, if the current instruction is in the decode stage while a preceding instruction is still in the execute, memory-access or align stage, and the destination register of that preceding instruction is a source register of the instruction in the decode stage, then that source register is considered obtainable from the feedback network (in Fig. 3, $1 is fed back from the execute, memory-access and align stages, respectively, to the decode stage of the subsequent instructions). If either case occurs, the read operation on the register file can be masked.
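The two cases can be expressed as a small Python sketch. The `Instr` record, its field names and the function names are hypothetical; `in_flight` stands for the instructions currently in the execute, memory-access and align stages. Special cases that a real instruction set would need, such as a hard-wired zero register or instructions that write no register at all, are handled only by the optional `dest` field.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Instr:
    kind: str            # "R" (two register sources) or "I" (Rs plus an immediate)
    rs: int
    rt: Optional[int]    # second source for R-type; not a source for I-type
    dest: Optional[int]  # destination register, None if nothing is written


def read_is_useless(src: Optional[int], needed: bool,
                    in_flight: Sequence[Instr]) -> bool:
    """True when the register file read for one source operand can be masked."""
    if not needed or src is None:
        return True  # case 1: the instruction does not use this operand
    # case 2: an older instruction still in a later stage produces the operand
    return any(older.dest == src for older in in_flight)


def isolate_flags(current: Instr, in_flight: Sequence[Instr]) -> Tuple[bool, bool]:
    """Return (isolate_rs, isolate_rt) for the instruction in the decode stage."""
    rs_needed = True
    rt_needed = current.kind == "R"
    return (
        read_is_useless(current.rs, rs_needed, in_flight),
        read_is_useless(current.rt, rt_needed, in_flight),
    )


# An older instruction writing $1 is still in the execute stage.
in_flight = [Instr(kind="R", rs=2, rt=3, dest=1)]
r_type = Instr(kind="R", rs=1, rt=4, dest=5)     # Rs = $1 comes from feedback
i_type = Instr(kind="I", rs=6, rt=None, dest=7)  # only Rs is a register source
print(isolate_flags(r_type, in_flight), isolate_flags(i_type, in_flight))
```

For the R-type example only the Rs read is masked, because $1 is available from feedback; for the I-type example only the Rt read is masked, because that operand is an immediate.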
Fig. 4 shows how the parameter t_delay, the time offset between the local asynchronous clock network and the global system clock network, is calculated. As shown in Fig. 4, t_delay comprises the clock-to-data delay clk_Q of the pipeline inter-stage register, the delay (logic) of the logic that decides whether the register file needs to be read, and the setup time (setup) of the register file address control register; a timing margin (margin) is normally also reserved to guarantee correct timing. Therefore t_delay = clk_Q + logic + setup + margin. Each parameter is determined from the characteristics of the actual process technology together with the system performance targets to be met.
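A worked instance of this budget, as a minimal Python sketch. The four component values are hypothetical; they are chosen only so that the sum equals the 300 ps value the description gives for t_delay, and the patent does not state the individual terms.

```python
def local_clock_offset_ps(clk_q, logic, setup, margin):
    """t_delay = clk_Q + logic + setup + margin, all in picoseconds."""
    return clk_q + logic + setup + margin


# Hypothetical breakdown, not taken from the patent.
t_delay = local_clock_offset_ps(clk_q=80, logic=120, setup=50, margin=50)
print(t_delay)
```

In practice each term would come from the cell library and static timing analysis for the chosen process, as the description notes.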
Fig. 5 shows the local clock network designed on the basis of Fig. 4. A small-scale full-custom circuit design strategy can be used, or buffers (built from inverters) can be inserted by hand directly into the post-place-and-route netlist, so that the offset from the system clock equals t_delay. The specific design must be determined by simulation with the actual parameters. In the present invention t_delay is set to 300 ps, and Fig. 5 also shows the corresponding simulation results.
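A first-order estimate of the inverter chain length can be sketched in Python as below. It assumes every inverter contributes the same delay and that the local clock should keep the polarity of the system clock, hence an even number of stages; both are simplifying assumptions, and the per-inverter delays in the example are hypothetical. As the description says, the real chain has to be tuned by simulation.

```python
import math


def inverter_chain_length(t_delay_ps, inverter_delay_ps):
    """Smallest even number of inverters whose total delay reaches t_delay_ps."""
    if t_delay_ps <= 0 or inverter_delay_ps <= 0:
        raise ValueError("delays must be positive")
    stages = math.ceil(t_delay_ps / inverter_delay_ps)
    if stages % 2:  # an odd count would invert the clock, so add one stage
        stages += 1
    return stages


# Hypothetical per-inverter delays of 20 ps and 25 ps for the 300 ps target.
print(inverter_chain_length(300, 20), inverter_chain_length(300, 25))
```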

Claims (2)

1. An asynchronous-clock-controlled register file read isolation method, based on a microprocessor architecture with a basic pipeline structure, the microprocessor comprising an instruction-fetch stage, a decode stage, an execute stage, a memory-access stage, an align stage and a write-back stage, characterized in that: on the basis of the above microprocessor architecture, a decision logic is added to the decode stage to determine whether the register file needs to be accessed; a control logic for the register file address input port and a local asynchronous clock network are also provided in the decode stage; and the specific steps of the method are:
Using the decision logic in the decode stage, determine whether the two source operands of the instruction need to be obtained from the register file. There are two grounds for deciding that an operand does not: either the instruction itself does not use the source operand, or the required source operand can be fed back from the execute, memory-access or align stage. As soon as either case is confirmed, the register file read operation for the source operand is masked;
The masking is carried out as follows: the local asynchronous clock network splits the decode stage into two segments, and the local asynchronous clock controls the address input signal to the register file. When a register file read operation needs to be masked, the corresponding address signal is isolated.
2. The asynchronous-clock-controlled register file read isolation method according to claim 1, characterized in that the specific measure for isolating the corresponding address signal is: a selector-flip-flop hold circuit is used; if the above isolation condition holds, the hold circuit keeps the address presented to the register file unchanged, so that the register file is not read-accessed.
CN2013103658314A 2013-08-21 2013-08-21 Register file reading and isolating method controlled by asynchronous clock Pending CN103440210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013103658314A CN103440210A (en) 2013-08-21 2013-08-21 Register file reading and isolating method controlled by asynchronous clock

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013103658314A CN103440210A (en) 2013-08-21 2013-08-21 Register file reading and isolating method controlled by asynchronous clock

Publications (1)

Publication Number Publication Date
CN103440210A true CN103440210A (en) 2013-12-11

Family

ID=49693901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013103658314A Pending CN103440210A (en) 2013-08-21 2013-08-21 Register file reading and isolating method controlled by asynchronous clock

Country Status (1)

Country Link
CN (1) CN103440210A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112379928A (en) * 2020-11-11 2021-02-19 海光信息技术股份有限公司 Instruction scheduling method and processor comprising instruction scheduling unit
CN113360191A (en) * 2020-03-03 2021-09-07 杭州海康威视数字技术股份有限公司 Driving device of network switching chip
WO2023283886A1 (en) * 2021-07-15 2023-01-19 华为技术有限公司 Register array circuit and method for accessing register array
US11862289B2 (en) 2021-06-11 2024-01-02 International Business Machines Corporation Sum address memory decoded dual-read select register file

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1664775A (en) * 2004-03-03 2005-09-07 浙江大学 Data by-passage technology in digital signal processor
CN101539849A (en) * 2009-04-21 2009-09-23 北京红旗胜利科技发展有限责任公司 Processor and gating method of register

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG YU: "A Low Power Register File with Asynchronously Controlled Read-Isolation and Software-Directed Write-Discarding", 《CIRCUITS AND SYSTEMS (ISCAS), 2013 IEEE INTERNATIONAL SYMPOSIUM ON》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360191A (en) * 2020-03-03 2021-09-07 杭州海康威视数字技术股份有限公司 Driving device of network switching chip
CN112379928A (en) * 2020-11-11 2021-02-19 海光信息技术股份有限公司 Instruction scheduling method and processor comprising instruction scheduling unit
CN112379928B (en) * 2020-11-11 2023-04-07 海光信息技术股份有限公司 Instruction scheduling method and processor comprising instruction scheduling unit
US11862289B2 (en) 2021-06-11 2024-01-02 International Business Machines Corporation Sum address memory decoded dual-read select register file
WO2023283886A1 (en) * 2021-07-15 2023-01-19 华为技术有限公司 Register array circuit and method for accessing register array

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131211