CN1469236A - Register stack capable of being reconfigured and its design method - Google Patents


Info

Publication number
CN1469236A
Authority
CN
China
Prior art keywords
register
window
address
operator
instruction
Legal status
Granted
Application number
CNA021262225A
Other languages
Chinese (zh)
Other versions
CN1228711C (en)
Inventor
Wang Junyu (王俊宇)
Liu Dali (刘大力)
Current Assignee
Beijing Duosi Technology Development Co ltd
Original Assignee
NANSI SCIENCE AND TECHNOLOGY DEVELOPMENT Co Ltd BEIJING
Application filed by NANSI SCIENCE AND TECHNOLOGY DEVELOPMENT Co Ltd BEIJING
Priority to CN 02126222 (patent CN1228711C)
Publication of CN1469236A
Application granted
Publication of CN1228711C
Anticipated expiration
Expired - Lifetime (current legal status)

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The reconfigurable register file has at least two operating modes. It comprises a register-file address conversion and selection unit, a register file, and a data input selection unit. The address conversion and selection unit converts the register read/write addresses in an instruction into physical register-file addresses for the current mode, and these addresses control reading and writing of the register file. Input data for the current mode are written into the register file through the data input selection unit. The invention also provides an operator extraction method for compatibility purposes and a design method for compatible reconfigurable components. With little additional hardware, the invention makes it possible to achieve compatibility with several kinds of microcomputer systems.

Description

Reconfigurable register file and design method therefor
Technical field
The present invention relates to computer microprocessor architecture. More specifically, it relates to a microprocessor architecture with good compatibility and to a reconfigurable register file for such a microprocessor.
Background art
Microprocessors have developed rapidly over more than 50 years, and microprocessors of many different architectures are now on the market. Their applications span daily life, office automation, finance and accounting, national defense, aerospace, and other fields. The state of microprocessor development therefore affects not only science and technology but also the national economy.

For historical reasons, and because technology has developed at different rates, chip products from companies such as Intel, IBM, HP, and SUN currently hold a monopoly position in the general-purpose microprocessor market. The overwhelming majority of software for general-purpose computers is developed for these architectures, which puts great pressure on later architecture designers. In the long run, compatibility is not strictly necessary. However, a new architecture without it cannot inherit the existing body of software, in particular the widely used products people are accustomed to. All software would have to be redesigned, which inevitably weakens the architecture's competitiveness. Solving the compatibility problem is therefore one of the foremost issues in designing a new architecture.
The microprocessor is the most complex of all integrated circuits and the most expensive to design, often requiring years of research and development. Most manufacturers therefore refuse to disclose any internal workings of their products, to prevent other companies from copying their chip designs. This makes the compatibility problem even harder for later chip designers.
Current chip designs address compatibility mainly in the following ways:
General-purpose processors represented by Itanium: Itanium is Intel's own new product. In its early stage, Intel executed IA-32 instructions (the architecture of the x86 processor family) and IA-64 instructions (the architecture of the Itanium processor) on separate hardware, achieving instruction-level compatibility.
General-purpose microprocessors represented by AMD are purely compatible. They do not design a new architecture. Instead, they are designed from the ground up directly from the x86 specifications and instruction set, so that the final chip imitates the functions of the original and guarantees instruction-level compatibility. Much of the design and development effort can thus be avoided through reverse engineering, that is, by deriving the design from the product.
General-purpose microprocessors represented by Transmeta, E2K, and others solve the compatibility problem in software. The target instruction stream (e.g., x86) is analyzed and decoded, and VLIW molecule streams are generated. Target instructions are thus dynamically translated into the machine's own instruction set, and the machine's hardware parallelism is used to execute programs efficiently. This approach preserves the machine's own characteristics while guaranteeing instruction-level compatibility.
For designers of a new architecture, Itanium's approach is clearly not advisable because of intellectual-property issues and the confidentiality of product designs. AMD's approach is direct imitation and does not involve an architecture of one's own.

Moreover, in the architectures above, every instruction added to the system interface enlarges the chip. Once the architecture is fixed, the implemented instruction set is fixed with it, and no instructions can be added after the chip design is finished. Such chip products have a short lifetime. This arrangement is also unfavorable to compatibility design, because making a processor compatible with several other architectures would cost too much.
Transmeta's approach preserves the features of the new architecture, but performing instruction translation entirely in software causes a large loss of efficiency.
There is therefore a need in the art for a new compatible architecture with two properties. It should achieve instruction-level compatibility with mainstream processors on the market with high efficiency, and it should keep its own characteristics, so that software designed for the new architecture executes with the highest efficiency.
The MISC (macro set computer) architecture, characterized by "explicit hardware parallelism", offers one possible approach. Its key feature is that the operator is the smallest unit of an instruction, and hardware units are exposed directly to software through the operator interface. An operator usually controls a finer granularity than a typical RISC instruction, so instruction design in MISC is far more flexible than in other architectures. Operators can also be formed into variable-length macro-instructions by macro-processing methods such as packing, delay, ordering, and replacement, which greatly improves the degree of parallelism and the efficiency of instruction execution. Compared with existing microprocessor architectures, MISC better supports optimizing compilers and user-defined instructions, so that instructions can meet a wide range of application requirements while executing efficiently.

Past research on MISC, however, did not include compatibility design, which has limited its range of application. There are two ways to address compatibility in MISC: direct instruction translation, and instruction translation supported by reconfigurable hardware. The latter maps the compatibility target's instructions onto reconfigurable hardware designed according to that instruction set. It thereby uses the efficiency of hardware execution to reduce the efficiency loss of code translation. At the same time, the macro-processing features of MISC variable-length macro-instructions further improve execution efficiency, which guarantees efficient instruction-level compatibility. Instruction translation supported by reconfigurable hardware therefore has clear advantages for solving instruction-level compatibility.

At the instruction level, reconfigurable hardware appears as different operators for different compatibility targets; that is, the same hardware unit can be controlled by operators with different functions. The core of this method is the design of the configurable components.
The reconfigurable register file is one of the most critical parts of this reconfigurable hardware. The register file sits at the top of the computer memory hierarchy; it is a vital component inside the processor and the main place where data are exchanged. Modern high-performance computers mostly use register-register instruction architectures, so the structure and performance of the register file are key factors in overall performance. Uneven progress in microelectronics has also steadily widened the performance gap between processors and memory. This is another reason register files receive increasing attention: a well-designed register file can effectively reduce the number of memory accesses. The design of the MISC reconfigurable register file is therefore an important part of MISC microprocessor design and is essential to fully implementing a compatible MISC system.
Summary of the invention
The object of the invention is therefore to solve the compatibility problems described above.
To achieve this object, the invention provides an operator extraction method for compatibility purposes, comprising the following steps:
(1) analyze the functions of the compatibility target's instruction set;
(2) based on the results of that functional analysis:
    - group together the functions performed by the same basic component and re-encode them as opcodes of a function operator;
    - separate source operands out as route operators, corresponding to the read ports of the register file;
    - separate destination operands out as the destination-register fields of data operators;
    - design operations that must control several components simultaneously as composite operators;
(3) determine the internal paths based at least in part on the results of the functional analysis of the target instruction set;
(4) determine the number of route operators and data operators from the internal paths;
(5) determine the data-source fields of the function operators and of the data operators.
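Steps (2) and (4) can be illustrated with a toy target instruction set. The sketch below is a simplified illustration, not the patent's procedure: the opcode-to-unit table `UNITS` and the function `extract_operators` are hypothetical. It also assumes one target instruction per cycle, so the number of route and data operators is simply the largest number of source and destination operands in any instruction.

```python
from collections import defaultdict

# Hypothetical table: target opcode -> functional units that must act together.
UNITS = {
    "ADD": ("alu",), "SUB": ("alu",), "AND": ("alu",),
    "MUL": ("mul",),
    "PUSH": ("agu", "lsu"),   # needs two units at once -> composite operator
}


def extract_operators(instrs):
    """instrs: iterable of (opcode, n_src, n_dst) describing the target ISA."""
    function_ops = defaultdict(dict)   # unit -> {target opcode: new opcode number}
    composite_ops = []
    max_src = max_dst = 0
    for opcode, n_src, n_dst in instrs:
        units = UNITS[opcode]
        if len(units) == 1:
            table = function_ops[units[0]]
            # Step (2): functions of the same unit are grouped and re-encoded.
            table.setdefault(opcode, len(table))
        elif opcode not in composite_ops:
            composite_ops.append(opcode)
        max_src, max_dst = max(max_src, n_src), max(max_dst, n_dst)
    return {
        "function_ops": dict(function_ops),
        "composite_ops": composite_ops,
        # Step (4), simplified: one route operator per register-file read port and
        # one data operator per write, assuming one target instruction per cycle.
        "route_ops": max_src,
        "data_ops": max_dst,
    }
```

For example, feeding in ADD, SUB, MUL, PUSH, and AND yields one function operator for the ALU with three opcodes, one for the multiplier, PUSH as a composite operator, two route operators, and one data operator.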
To achieve this object, the invention also provides a design method for compatible configurable components, comprising the following steps:
(1) carry out a separate hardware design for the operator set of each compatibility target, determining for each the hardware resources, connections, control relations, and timing relations needed to implement the combined function of its operators;
(2) formally describe the component designs obtained for the different compatibility targets;
(3) superpose the formal descriptions of the components in an optimized way. Operations (OPs) of the same basic component with identical function sets are superposed directly. Different function sets that are never active at the same time (serial superposition) are superposed according to the following rules:
<resource rule> the resources required by a serial superposition of an OP set are the union of the resources needed to complete every OP at its own time;
<connection rule> in serial superposition, circuits implementing several functions are superposed, but only one function is active at any moment. Identical data sources can therefore be merged, while different data sources are placed in parallel and the corresponding gating conditions are changed accordingly;
<control rule> the control description after serial superposition is the union of the control descriptions of the individual OPs; the new condition for a given operation is the OR of its old conditions;
<timing rule> the critical path after serial superposition is the maximum, over all OPs, of each OP's critical path before superposition plus the delay of the switching circuitry added for that OP.
(4) convert the formal description above into a circuit design.
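The four superposition rules of step (3) can be sketched as a merge of simple component descriptions. This is an illustration under stated assumptions, not the patent's formalism:

- `PartDesc` and `superpose` are hypothetical names.
- Delays are integer picoseconds.
- The "switching circuitry" of the timing rule is modelled as a fixed multiplexer delay, added to any configuration whose input port ends up with more than one data source.

```python
from dataclasses import dataclass


@dataclass
class PartDesc:
    """Hypothetical formal description of one configuration of a component."""
    resources: set            # hardware resources, e.g. {"adder32"}
    sources: dict             # input port -> set of data sources
    controls: dict            # operation -> enabling condition (string)
    critical_path_ps: int     # critical path before superposition


def superpose(descs, mux_delay_ps=200):
    """Serially superpose configurations that are never active at the same time."""
    # Resource rule: union of the resources of every configuration.
    resources = set().union(*(d.resources for d in descs))
    # Connection rule: identical sources merge; different sources sit in parallel.
    sources = {}
    for d in descs:
        for port, srcs in d.sources.items():
            sources.setdefault(port, set()).update(srcs)
    # Control rule: a shared operation is enabled by the OR of its old conditions.
    conds = {}
    for d in descs:
        for op, cond in d.controls.items():
            conds.setdefault(op, []).append(cond)
    controls = {op: " | ".join(f"({c})" for c in dict.fromkeys(cs)) for op, cs in conds.items()}

    # Timing rule: max over configurations of (old path + switch delay added for it).
    # Assumption: a multiplexer delay is added wherever a port now has several sources.
    def added_delay(d):
        return mux_delay_ps if any(len(sources[p]) > 1 for p in d.sources) else 0

    critical = max(d.critical_path_ps + added_delay(d) for d in descs)
    return PartDesc(resources, sources, controls, critical)
```

Superposing an adder fed from a register-file port with one fed from an immediate yields a shared adder, a two-input port, an OR-ed enable condition, and a critical path equal to the slower configuration plus one multiplexer delay.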
To achieve this object, the invention also provides a reconfigurable register file with at least two operating modes. It comprises a register-file address conversion and selection unit, a register file, and a data input selection unit. The address conversion and selection unit converts the register read/write addresses in an instruction into physical register-file addresses for the corresponding mode, and these addresses control reading and writing of the register file. Input data for the corresponding mode are written into the register file through the data input selection unit.
To achieve this object, the invention also provides a method of operating a reconfigurable register file in the microprocessor of a compatible system, comprising the steps of:
feeding some low-order bits of the register address in an instruction's operator into the global register file, according to the operating mode of the target register file being emulated;
feeding the register address in the instruction's operator into the address conversion unit, so that the address-translation subunit for the target operating mode generates the address for accessing the window register file;
when a register is to be written, selecting the data input appropriate to the target operating mode and writing the data into the global register file or the window register file according to the corresponding write-enable signals;
when a register is to be read, outputting the data read from the global register file if the high-order address bits are all zero, and otherwise outputting the data read from the window register file.
The overall design philosophy of the compatible microprocessor architecture of the invention is to combine optimizing-compiler techniques with reconfigurable processor design. Reconfigurable hardware supports reconfigurable instructions, so that the target instruction set is executed compatibly and efficiently.
To this end, the invention provides a design method for a reconfigurable computing system. Reconfigurable system design means that the system can be reorganized according to the architecture of each compatibility-target processor and thereby implement that processor's functions. The method covers:
1) a reconfigurable instruction set;
2) a reconfigurable memory model;
3) a reconfigurable interrupt and flag model;
4) a reconfigurable register file;
5) a reconfigurable pipeline.
Instruction-set reconfigurability is the basis for instruction-level compatibility. Reconfigurability of the memory model, the interrupt and flag model, the register file, and the pipeline provides hardware support for it. Instruction-set reconfiguration is a necessary step for meeting different application requirements, while hardware reconfiguration is implemented by superposing and optimizing hardware on the basis of a theoretical model. At the system interface, instruction-set reconfiguration appears as different combinations of operators. At the architecture interface, hardware reconfiguration appears as different types of operators. When a particular architecture is emulated, the optimizing compiler ensures that only operators of the matching type take part in instruction translation.
When the architecture of the invention is used in microprocessor system design, instructions of other architectures undergo binary translation and code compaction in the optimizing compiler, after which they can be executed efficiently on the reconfigurable hardware. The architecture also makes it possible to design application-oriented instructions.
In particular, the invention concerns an instruction architecture with compatibility features based on the idea of explicit hardware cell control (EHCC), characterized in that:
1. The instruction set consists of instruction formats and operator sets.
2. An instruction format contains at least three parts: a format control field CBFF, an operator-slot control field CONTROL, and operator fields. The format control field identifies the instruction format. The CONTROL field corresponds one-to-one with the operator slots; for each slot, it determines that the operator code in that slot is the number of a specific operator within the operator set associated with that slot.
3. The operator is the smallest unit of instruction control. In an implementation, it is the control encoding of a controllable hardware node, that is, the image on the architecture interface of a hardware unit that can perform some function. Executing an operator completes one operation with a definite function. By mode of action, operators fall into four classes:
    - A function operator contains a function control field, a source operand control field, and an operand width control field. It controls a functional unit, a hardware unit that performs one functional operation and is built from datapath controllable nodes and execution-unit controllable nodes.
    - A data operator contains a source operand control field, a destination operand control field, and a lever-piece control field. It controls a data unit, a hardware unit that performs one data-storage operation and is built from datapath controllable nodes and storage controllable nodes.
    - A route operator contains a source operand control field. It controls a routing unit, a hardware unit that performs only switch control and is built from input datapaths and switching circuits.
    - A composite operator contains a function control field. It controls a composite unit, a hardware unit that performs a specific function and is built from several indivisible functional units or data units. "Indivisible" means that controlling only some of the controllable nodes in the unit produces no meaningful action; the unit produces an action with a definite meaning only when all of its controllable nodes are controlled.
4. Operator sets fall into two classes according to their design purpose:
    - Application-oriented operator sets are designed independently to solve application problems efficiently.
    - Compatibility operator sets are designed according to the functions of the compatibility target's instruction set. Their encodings have a definite correspondence with the target instruction set, so any target instruction can be expressed as one operator of the operator set or as a packed combination of several operators.
5. An operator may include delay, ordering, and replacement fields:
    - the delay field indicates how many cycles the operator's execution can be delayed;
    - the ordering field indicates the order in which operators execute;
    - the replacement field indicates how many times the operator must be repeated.
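The instruction format of item 2 can be illustrated with a packing and unpacking routine. The field widths, the number of slots, and the reading that each CONTROL entry selects which operator set a slot's code indexes are all assumptions made for this sketch. The patent does not give bit widths in this excerpt, and `encode`/`decode` are hypothetical names.

```python
# Hypothetical layout, for illustration only: a 4-bit CBFF, then one 2-bit CONTROL
# entry per operator slot, then the operator slots (12 bits each), packed MSB first.
CBFF_BITS, CTRL_BITS, SLOT_BITS, NUM_SLOTS = 4, 2, 12, 4
OPERATOR_SETS = ("function", "data", "route", "composite")  # chosen by a CONTROL entry
TOTAL_BITS = CBFF_BITS + NUM_SLOTS * (CTRL_BITS + SLOT_BITS)


def encode(cbff, slots):
    """slots: list of (operator-set name, operator number within that set)."""
    assert len(slots) == NUM_SLOTS and 0 <= cbff < 1 << CBFF_BITS
    word = cbff
    for set_name, _ in slots:              # CONTROL field, one entry per slot
        word = (word << CTRL_BITS) | OPERATOR_SETS.index(set_name)
    for _, number in slots:                # operator slots
        assert 0 <= number < 1 << SLOT_BITS
        word = (word << SLOT_BITS) | number
    return word


def decode(word):
    def field(offset, width):              # `width` bits starting `offset` bits below the MSB
        return (word >> (TOTAL_BITS - offset - width)) & ((1 << width) - 1)

    cbff = field(0, CBFF_BITS)
    slots = []
    for i in range(NUM_SLOTS):
        sel = field(CBFF_BITS + i * CTRL_BITS, CTRL_BITS)
        number = field(CBFF_BITS + NUM_SLOTS * CTRL_BITS + i * SLOT_BITS, SLOT_BITS)
        slots.append((OPERATOR_SETS[sel], number))
    return cbff, slots
```

A real format would let CBFF choose among several layouts; here it is only carried through, to keep the round trip simple.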
The compatible instruction architecture with explicit hardware-unit control described above provides a new way to achieve instruction-level compatibility. The optimizing compiler converts the compatibility target's instruction set directly into combinations of operators corresponding to hardware units. It then performs machine-dependent optimization and code compaction (packing, delay, ordering, replacement), forming an operator stream of variable-length macro-instruction words (VLMIW) that executes efficiently on the compatible configurable components. Operator-based instruction design also makes it easy for users to design new instructions for their applications.
A significant advantage of the invention is that the system interface reflects not only the set of functions the hardware can currently perform but also the full hardware resources, including the number, application characteristics, and interconnections of the various hardware units. It also gives users a means of designing, from the hardware characteristics, the flow that implements an instruction's function. In this structure, the hardware fundamentals (functional units, registers, datapaths, and control units) are exposed directly to the user, and the instruction interface gives the user direct control over them. The user can therefore achieve the desired function most efficiently by controlling hardware units directly. As a result, the elementary instructions of the instruction set and their semantic, pragmatic, and syntactic rules are fixed once the hardware design is fixed. Every element of the instruction set can be implemented directly in hardware, so execution efficiency is high.
Another significant advantage is that compatibility with multiple target instruction sets is easy to achieve. Operators extracted for different compatibility targets can be placed in different operator slots and combined, by the macro-processing rules (delay, packing, ordering, replacement), into instructions for the different target architectures.
Another significant advantage is that the instruction format can be extended horizontally, which favors parallel execution across many parallel processing components.
Another significant advantage is that operators are finer-grained than ordinary instructions. This makes it easy to construct new instructions for application needs, and the structure is flexible and highly adaptable.
Another significant advantage is that the way operators are combined can vary. Delay, packing, ordering, replacement, and similar means can change when operators execute, which can significantly reduce the space occupied by instruction code.
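One way to see how the delay, ordering, and replacement fields save code space is to expand a macro-instruction into a cycle-by-cycle issue list. The sketch below uses one possible reading of those fields, which the text does not define at cycle level: delay is the first issue cycle, order breaks ties within a cycle, and replacement is a repeat count. `Operator` and `expand` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    delay: int = 0    # cycles to wait before the first issue
    order: int = 0    # position among operators issued in the same cycle
    repeat: int = 1   # number of times the operator is issued


def expand(ops):
    """Expand a macro-instruction into {cycle: [operator names in issue order]}."""
    timeline = {}
    for op in sorted(ops, key=lambda o: o.order):   # sorted() is stable
        for i in range(op.repeat):
            timeline.setdefault(op.delay + i, []).append(op.name)
    return dict(sorted(timeline.items()))
```

Under this reading, three operator entries (an `add` ordered after a `load`, and a `store` delayed by two cycles and repeated twice) describe four issue slots. A larger repeat count would add issues without adding code.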
Another main object of the invention is to disclose an operator extraction method for compatibility purposes, comprising the following steps:
(1) analyze the functions of the target instruction set;
(2) perform the following:
    - group together the functions performed by the same basic component and re-encode them as opcodes of a function operator;
    - separate source operands out as route operators, corresponding to the read ports of the register file;
    - separate destination operands out as the destination-register fields of data operators;
    - design operations that must control several components simultaneously as composite operators;
(3) design the internal paths;
(4) determine the number of route operators and data operators;
(5) determine the data-source fields of the function operators and of the data operators.
The operator design method of the invention is a forward design: operators are derived from the application requirements (the compatibility target's instructions), which provides a basis for designing the reconfigurable hardware.
A significant advantage is that operators are designed from the compatibility target's instructions. This ensures that executing an operator or a combination of operators has the same effect as the corresponding target instruction.
Another significant advantage is that operators designed from target instructions can be recombined to construct new instructions.
Another significant advantage is that the number of operators, and hence the number of hardware units, is variable. Hardware units can therefore be replicated to increase the parallelism and efficiency of instruction execution, which gives the architecture better scalability.
Another main object of the invention is to provide a register window structure in which global registers and window registers are separated, and a method for controlling such a register file. Let the number of registers visible to the system in a register window be $2^a$. The register window structure comprises:
a global register file containing $2^b$ registers, where b is a natural number and $b < a$; its read/write address width is b bits;
a window register file containing $2^m$ physical registers, where m is a natural number. For fixed windows, $m = b + 1 + k$ must hold, where k is a natural number and $2^k$ is the number of fixed windows. For moving windows, $2^m \ge 2^a - 2^b$ must hold. The read/write address width of the window register file is m bits;
a window register file address conversion unit, which converts the a-bit register address in the instruction into an m-bit physical address for accessing the window register file, according to the instruction's function;
an output data selection unit, which selects the correct output between the output data of the global register file and that of the window register file.
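As a quick check of the sizing constraints above, the sketch below computes the register counts for given a, b, and k, and the smallest m that satisfies the moving-window condition. The function names are hypothetical. The SPARC-like values a = 5 and b = 3 (8 global registers and 24 windowed registers visible at a time) are used only for illustration.

```python
def fixed_window_sizes(a: int, b: int, k: int) -> dict:
    """Register counts for the fixed-window organization (m = b + 1 + k)."""
    if not 0 <= b < a:
        raise ValueError("need 0 <= b < a")
    m = b + 1 + k
    return {
        "m": m,
        "global_regs": 2 ** b,                   # global register file
        "visible_window_regs": 2 ** a - 2 ** b,  # window registers seen by one instruction
        "window_phys_regs": 2 ** m,              # physical window register file
        "windows": 2 ** k,                       # number of fixed windows
    }


def moving_window_min_m(a: int, b: int) -> int:
    """Smallest m with 2**m >= 2**a - 2**b (the moving-window condition)."""
    need = 2 ** a - 2 ** b
    return (need - 1).bit_length()
```

With a = 5, b = 3, k = 3, the fixed-window file has $m = 7$, that is, 128 physical window registers shared by 8 windows. A moving-window file for the same a and b needs at least $m = 5$.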
The control method for this register file comprises the following steps:
(1) In the register file, the reset values of all registers, of the current window pointer CWP that controls fixed windows, and of the bottom-of-stack pointer BOF that controls moving windows are 0.
(2) The data's operating mode is determined from the instruction decoding result, and the address is split and converted. Register windows are of two kinds: fixed windows, in which the numbers of input, output, and local registers in a window are fixed, and moving windows, in which those numbers can be set by software. Addresses are computed as follows:
(21) Fixed-window address computation
(211) Global register file read/write address
global register file physical address = bits [b-1:0] of the register address in the instruction
(212) Window register file physical address
(2121) When CWP is modified in mode one of step (41):
$\text{physical address} = (R - 2^b) + \{CWP, 0\}_m$
where R is the register address in the instruction. $\{CWP, 0\}_m$ is the m-bit address obtained by arithmetically shifting the effective value of CWP left, i.e., CWP in the high-order k bits followed by $m - k = b + 1$ zero bits.
(2122) When CWP is modified in mode two of step (41):
$\text{physical address} = (R - 2^b) + \{CWP_{\text{comp}}, 0\}_m$
where R is the register address in the instruction. $\{CWP_{\text{comp}}, 0\}_m$ is the m-bit address obtained by taking the complement of the effective value of CWP and then shifting it arithmetically left.
(22) Moving windows
global register file physical address = bits [b-1:0] of the register address in the instruction
window register file physical address = $(R - 2^b) + BOF$
where R is the register address in the instruction and BOF is the physical address of the first register of the current moving window.
(3) Perform the register-file read/write operation.
(31) When a register-file write is enabled, the write-enable signals ensure that exactly one of the two writes takes effect: the write to the global register file or the write to the window register file. The conditions are:
(311) if the register write is enabled and bits [a-1:b] of the register address in the instruction are all 0, the write-enable signal for the global register file is asserted and that for the window register file is deasserted;
(312) if the register write is enabled and bits [a-1:b] of the register address in the instruction are not all 0, the write-enable signal for the window register file is asserted and that for the global register file is deasserted.
(32) When a register-file read is enabled, the global and window register files are read simultaneously, and the valid data are selected for output according to whether the high-order read-address bits are 0:
(321) if bits [a-1:b] of the register address in the instruction are all 0, the data read from the global register file are output;
(322) if bits [a-1:b] of the register address in the instruction are not all 0, the data read from the window register file are output.
(4) According to the instruction and the counter's initial value, modify the current window pointer CWP or the current bottom-of-stack pointer BOF:
(41) CWP modification modes for fixed windows
(411) Mode one
(4111) when a SAVE operation is performed: $CWP_{n+1} = CWP_n - 1$;
(4112) when a RESTORE operation is performed, the CWP from before the most recent SAVE is restored: $CWP_{n+1} = CWP_n + 1$.
(412) Mode two
(4121) When a SAVE operation is valid, CWP_{n+1} = CWP_n + 1;
(4122) When a RESTORE operation is valid, the CWP in effect before the last SAVE operation is restored: CWP_{n+1} = CWP_n - 1.
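The two CWP modes differ only in the sign of the step. The sketch below assumes the number of windows is a power of two, for example the 8 windows of the SPARC V9 case, which gives a 3-bit CWP. Under that assumption the wrap-around is plain truncation. Mode one follows the SPARC convention in which SAVE decrements CWP. The function name `next_cwp` is hypothetical.

```python
def next_cwp(cwp: int, op: str, cwp_bits: int, mode: int = 1) -> int:
    """Return CWP_{n+1} after a SAVE or RESTORE.

    Mode one: SAVE decrements CWP, RESTORE increments it.
    Mode two: SAVE increments CWP, RESTORE decrements it.
    The result is truncated to cwp_bits, i.e. taken modulo 2**cwp_bits windows.
    """
    if op not in ("SAVE", "RESTORE"):
        raise ValueError(f"unsupported operation: {op}")
    step = -1 if op == "SAVE" else 1
    if mode == 2:
        step = -step
    elif mode != 1:
        raise ValueError(f"unsupported mode: {mode}")
    return (cwp + step) & ((1 << cwp_bits) - 1)
```

A RESTORE always undoes the preceding SAVE in either mode. With 8 windows, a SAVE in mode one from window 0 moves CWP to window 7.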
(42) BOF modification mode for moving windows
(421) When a CALL operation is valid, BOF_{n+1} = BOF_n + SOL_n;
(422) When a RETURN operation is valid, the BOF in effect before the last CALL operation is restored: BOF_{n+1} = BOF_n - SOL_{n-1}.
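A minimal model of the BOF updates follows. It assumes that the caller's SOL is saved at each CALL so that the matching RETURN can subtract it; here the saved values are kept in a Python list. It also assumes the window register file has 2^m registers, so the additions wrap without a modulo operation. The class and method names are hypothetical.

```python
from typing import List


class MovingWindowFrame:
    """Tracks BOF for a moving register window of 2**m physical registers."""

    def __init__(self, m: int) -> None:
        self.mask = (1 << m) - 1
        self.bof = 0
        self._saved_sol: List[int] = []  # SOL saved at each CALL (assumed mechanism)

    def call(self, sol: int) -> None:
        """(421) BOF_{n+1} = BOF_n + SOL_n."""
        self._saved_sol.append(sol)
        self.bof = (self.bof + sol) & self.mask

    def ret(self) -> None:
        """(422) BOF_{n+1} = BOF_n - SOL_{n-1}, the SOL saved at the matching CALL."""
        sol = self._saved_sol.pop()
        self.bof = (self.bof - sol) & self.mask
```

With m = 7, a CALL with SOL = 10 followed by a CALL with SOL = 120 leaves BOF at (10 + 120) mod 128 = 2. The two RETURNs bring BOF back to 10 and then to 0.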
A principal advantage of the present invention is that global registers and window registers are addressed independently. When the window register file contains 2^m registers, this simplifies the register window design: the window register address pointer can be computed directly by unsigned addition, with no modulo operation, which simplifies the circuit.
Another advantage of the present invention is that the method applies to both fixed-window and moving-window designs, so its range of application is wide.
Another main object of the present invention is to provide a lookup-table-based rotating register address generation unit and its control method. Suppose the register file contains 2^n physical registers (n a natural number). The rotation region size is SOR times 2^m, where m is a nonnegative integer with m ≤ n, SOR = 1 to s, and s is a natural number with s ≤ 2^(n-m). The address generation unit is characterized by comprising:
A rotating register base (RRB) control unit. The RRB register is n bits wide with a reset value of 0. Its value is decremented by one after each completed iteration, and it can also be cleared under instruction control;
A one-stage adder, which adds the register address in the instruction (n bits wide) to the RRB register;
A hardware lookup table circuit built from registers or ROM. Its row index takes every value of an (n-m)-bit field, its column index is the SOR field, and each entry is the row index modulo the column index.
The control method is characterized by comprising the following steps:
(1) Reset
The reset value of the RRB register is 0. When the lookup table circuit is built from registers, its entries are reset to their modulo values;
(2) The rotation region multiple SOR in the instruction (value 1 to s; shifting it left by m bits gives the rotation region size) selects the corresponding column of table entries;
(3) The instruction address is added to the RRB output. The low m bits [m-1:0] of the sum are used directly as the low m bits [m-1:0] of the rotated address, and the high n-m bits [n-1:m] are fed into the lookup table circuit;
(4) A table lookup is performed with the high n-m bits [n-1:m] of the sum from step (3) and SOR as indices. The entry obtained becomes the high n-m bits [n-1:m] of the rotated address and is concatenated with the low m bits [m-1:0] of the sum from step (3) to form the final rotating register physical address;
(5) RRB is decremented or cleared under instruction control:
(51) When the software-pipelined loop branch flag is valid, RRB is automatically decremented by 1;
(52) When a clear-RRB instruction is valid, RRB is cleared to 0.
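The lookup avoids a divider for a simple reason. Write the n-bit sum as x = h·2^m + l with l < 2^m. Then x mod (SOR·2^m) = (h mod SOR)·2^m + l. Only the high n-m bits need a table indexed by (h, SOR); the low m bits pass straight through. The sketch below models steps (1) to (5) with the parameters of the 128-register example (n = 7, m = 3, SOR = 1 to 12). The class name is hypothetical. The equivalence it checks holds for the n-bit truncated sum that the adder produces.

```python
class LutRotatingAddressUnit:
    """Lookup-table rotating register address generation for 2**n physical registers.

    Rotation region size = SOR * 2**m, with 1 <= SOR <= s and s <= 2**(n - m).
    """

    def __init__(self, n: int, m: int, s: int) -> None:
        if not (0 <= m <= n and 1 <= s <= (1 << (n - m))):
            raise ValueError("parameters violate m <= n or s <= 2**(n-m)")
        self.n, self.m = n, m
        # Entry [row][sor] = row mod sor; column 0 is unused because SOR starts at 1.
        self.table = [[0] + [row % sor for sor in range(1, s + 1)]
                      for row in range(1 << (n - m))]
        self.rrb = 0  # step (1): RRB resets to 0

    def translate(self, reg_addr: int, sor: int) -> int:
        total = (reg_addr + self.rrb) & ((1 << self.n) - 1)  # step (3): n-bit adder
        low = total & ((1 << self.m) - 1)                    # bits [m-1:0] pass through
        high = total >> self.m                               # bits [n-1:m] index the table
        return (self.table[high][sor] << self.m) | low       # step (4): concatenate

    def loop_branch(self) -> None:
        """(51) Software-pipelined loop branch taken: RRB -= 1 with n-bit wrap-around."""
        self.rrb = (self.rrb - 1) & ((1 << self.n) - 1)

    def clear_rrb(self) -> None:
        """(52) Clear-RRB instruction."""
        self.rrb = 0
```

For every RRB value, every SOR and every address, `translate` returns ((address + RRB) mod 2^n) mod (SOR·2^m). The only arithmetic it uses is one addition, bit slicing and a table read.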
Register rotation was first proposed by B. R. Rau et al. in 1989 in their research on the Cydra 5 supercomputer. Its purpose is to support modulo scheduling for software pipelining. The general formula for register rotation is: (register address + RRB) mod rotation region size
Compared with the common approach of computing the rotating register physical address with modulo arithmetic, the circuit structure of the present invention is simple and efficient. With a slight change, the formula above becomes:
((register address + RRB - 2^n) mod rotation region size) + 2^n
This ensures that 2^n registers in the same register file are static and do not rotate. The address generation unit is similar to the one above and adds only one subtraction and one addition of the constant 2^n; the control method is exactly the same.
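The offset form of the formula can be checked with a small function. The document writes the constant as 2^n; the sketch calls its exponent k to keep it apart from the register file width. It assumes that addresses below 2^k are the static registers and bypass rotation. It also treats RRB as an ordinary signed integer rather than an n-bit register. The example values, 32 static registers and a 96-register rotation region, are illustrative only.

```python
def rotate_with_static_base(reg_addr: int, rrb: int, region: int, static_bits: int) -> int:
    """((reg_addr + RRB - 2**k) mod region) + 2**k, where k = static_bits.

    Assumption for this sketch: addresses below 2**k are static and are returned unchanged.
    """
    base = 1 << static_bits
    if reg_addr < base:
        return reg_addr
    return (reg_addr + rrb - base) % region + base
```

Subtracting 2^k before the modulo and adding it back afterwards makes the rotation wrap inside [2^k, 2^k + region) instead of [0, region). For example, with RRB = -1, address 32 wraps to 127.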
A principal advantage of the present invention is that lookup-table-based register rotation avoids the modulo operation when computing the rotated register address. This greatly simplifies the circuit implementation and yields the rotated address quickly.
Another advantage of the present invention is that the rotation region can be any multiple of 2^m, which makes this a general method.
Another object of the present invention is to disclose a design method for configurable components with compatibility, characterized by comprising the following steps:
(1) Carry out the hardware design separately for the operator set of each compatibility target, and determine the hardware resources, connection relations, control relations and timing relations that implement the combined functions of each operator set;
(2) Formally describe the component design obtained for each compatibility target architecture;
(3) Superimpose the formal descriptions of the component;
(4) Convert the resulting formal description into a circuit design.
The configurable component structure obtained with the superposition design rules and cluster analysis can be reconfigured according to different types of operator streams. The operator combinations supported by this reconfigurable hardware can implement the functions of the instruction sets of different architectures. Hardware resources are saved while instruction-level compatibility is achieved.
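Step (3) can be illustrated with a deliberately simplified merge. The sketch assumes each target's formal description reduces to a map from resource name to required count. Under that assumption, superposition keeps the largest requirement per resource rather than the sum, which matches the "union plus selection parts" view of the reconfigurable register file. It also reports which resources several targets share, since those would need a selection part in front of them. The resource names and the `superpose` function are hypothetical; timing and control relations are ignored here.

```python
from typing import Dict, Set, Tuple


def superpose(*descriptions: Dict[str, int]) -> Tuple[Dict[str, int], Set[str]]:
    """Superimpose the resource descriptions of several compatibility targets.

    Returns the merged resource counts (max per resource, not the sum) and the
    resources used by more than one target, which would need a selection part.
    """
    merged: Dict[str, int] = {}
    users: Dict[str, int] = {}
    for desc in descriptions:
        for resource, count in desc.items():
            merged[resource] = max(merged.get(resource, 0), count)
            users[resource] = users.get(resource, 0) + 1
    shared = {r for r, n in users.items() if n > 1}
    return merged, shared
```

Two targets that each need two read ports share those ports in the merged design; only the mode-specific logic is added once per target.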
A significant advantage of the present invention is that the design of compatible configurable components can be given a consistent specification;
Another significant advantage of the present invention is that unified design rules can be established and applied widely to the design of such components;
Another significant advantage of the present invention is that it provides the multiplexed resources that configurable component design requires, and allocates, organizes and controls these resources;
Another main object of the present invention is to provide a configurable component with compatibility, in particular a reconfigurable register file, characterized by comprising:
A register file with at least two read ports and one write port;
An address selection part. Its inputs are the address pointers that implement register file operations for the different functions, and its outputs are the address pointers valid for the current operation. The number of outputs equals the maximum number of addresses that any one function needs; and
An input data selection part. Its inputs are the write data that implement register file operations for the different functions, and its outputs are the write data valid for the current operation. The number of outputs equals the maximum number of data items that any one function needs.
The reconfigurable register file of the present invention adds an address selection part and an input data selection part to a general-purpose register file. These parts control the register file addresses and input data required by the different functions, so the same register file can meet the functional requirements of different operators.
A principal advantage of the present invention is that reconfiguring the register file's function requires only the union of the hardware needed for each function plus the two selection parts, so the hardware overhead is small;
Another advantage of the present invention is that the inputs of the address selection and data selection parts can be increased or decreased according to the compatibility targets, which gives good scalability.
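A behavioural sketch of such a register file follows. One physical array is shared by several working modes. The active mode's address translation stands in for the address selection part, and the active mode's entry in the write-data map stands in for the input data selection part. The mode names and the two translations in the usage example are hypothetical; the real translations would be the window and rotation logic described earlier.

```python
from typing import Callable, Dict


class ReconfigurableRegisterFile:
    """One physical register array shared by several working modes."""

    def __init__(self, size: int, translators: Dict[str, Callable[[int], int]]) -> None:
        self.regs = [0] * size
        self.translators = translators
        self.mode = next(iter(translators))  # the first listed mode is the default

    def set_mode(self, mode: str) -> None:
        if mode not in self.translators:
            raise KeyError(mode)
        self.mode = mode

    def _physical(self, addr: int) -> int:
        # Address selection: only the active mode's translated address reaches the array,
        # wrapped to the array size.
        return self.translators[self.mode](addr) % len(self.regs)

    def read(self, addr: int) -> int:
        return self.regs[self._physical(addr)]

    def write(self, addr: int, data_by_mode: Dict[str, int]) -> None:
        # Input data selection: only the active mode's write data is stored.
        self.regs[self._physical(addr)] = data_by_mode[self.mode]
```

For example, with a "flat" mode using the identity mapping and a "window" mode adding 16, a value written at address 20 in flat mode is read back at address 4 in window mode. Supporting another compatibility target only means registering one more translation and one more data input.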
Description of the drawings
Fig. 1a is a schematic of the compatibility approach of AMD-type microprocessors;
Fig. 1b is a schematic of the compatibility approach of Itanium-type microprocessors;
Fig. 1c is a schematic of the compatibility approach of Transmeta-type microprocessors;
Fig. 1d is a schematic of the compatibility approach of the architecture of the present invention;
Fig. 2 shows the instruction execution mode of a compatible architecture designed with the present invention;
Fig. 3 is a schematic of the MISC instruction set structure;
Fig. 4a is a schematic of the functional unit corresponding to a function operator;
Fig. 4b is a schematic of the data unit corresponding to a data operator;
Fig. 4c is a schematic of the routing unit corresponding to a route operator;
Fig. 4d is a schematic of the composite unit corresponding to a composite operator;
Fig. 5 is a schematic of the conversion between SPARC instructions and MISC instructions;
Fig. 6 is the design flow for configurable components with compatibility;
Fig. 7 is the flow of compatibility target instruction analysis and operator extraction;
Fig. 8a shows the resource rules for reconfigurable compatible component design;
Fig. 8b shows the connection rules for reconfigurable compatible component design;
Fig. 8c shows the control rules for reconfigurable compatible component design;
Fig. 8d shows the timing rules for reconfigurable compatible component design;
Fig. 9 is an external block diagram of a reconfigurable register component with compatibility designed according to the present invention;
Fig. 9a shows the structure of this register file when used as a random-access register file;
Fig. 9b shows the structure of this register file when used as a register window;
Fig. 9c shows the structure of this register file when used as a moving register window and rotating registers;
Fig. 9d shows the execution of the CTOP and CEXIT operations, which control counted loops in branch instructions;
Fig. 9e shows the execution of the WTOP and WEXIT operations, which control while loops (loops with an indefinite trip count) in branch instructions;
Fig. 10 is a schematic of a register window structure in which global registers and window registers are separate;
Fig. 10a is a block diagram of window register file address generation when the structure of Fig. 10 is used in a fixed-window design;
Fig. 10b is an address translation example for a SPARC V9 general-purpose register window of 8 windows, implemented with the structure of Fig. 10a;
Fig. 10c is a schematic of window register file address generation when the structure of Fig. 10 is used in a moving-window design;
Fig. 10d is an address translation example for the stacked area of the Itanium general-purpose registers, consisting of 128 physical registers, implemented with the structure of Fig. 10c;
Fig. 11 is a block diagram of lookup-table-based rotating register address generation;
Fig. 11a is a block diagram of lookup-table-based rotating register address generation for a register file of 128 64-bit registers with 3 ports (two read, one write);
Fig. 11b shows the header and entries of the lookup table circuit when the rotation region is 1 to 12 times 8;
Fig. 11c is a block diagram of rotating register address translation designed on the basis of the moving register window;
Fig. 12 is a structural schematic of the reconfigurable register file;
Fig. 12a is a module block diagram of a reconfigurable register file with two working modes: register window, and moving register window with rotating registers;
Fig. 12b is a structural diagram of the global register file in Fig. 12a;
Fig. 12c is a structural diagram of the address selection part and the data output selection part in Fig. 12a;
Fig. 12d is a module block diagram of a reconfigurable register file with three working modes: random-access register file, register window, and moving register window with rotating registers;
Fig. 12e is a structural diagram of the address selection part in Fig. 12d;
Fig. 12f is a structural diagram of the input data selection part and the output data selection part in Fig. 12d.
Specific embodiments of the present invention are described in detail below with reference to the drawings.
Figs. 1a to 1d compare four approaches to achieving compatibility, as follows:
The same problem, processed on processors of different architectures, should give the same result. The basic method is to describe the problem in a high-level language. Operating-system scheduling and an optimizing compiler then convert it into instructions of a particular architecture (hereinafter "system"). These instructions run on hardware based on that instruction set, i.e., a processor chip (hereinafter "hardware").
Without considering compatibility, there are N paths from the same problem to the same result:
Problem → system A → hardware A → result;
Problem → system B → hardware B → result;
······
Problem → system N → hardware N → result.
In each of the following figures, hardware B is a processor designed according to a different compatibility route.
The compatible design route represented by AMD is shown in Fig. 1a. Its instruction execution path is:
Problem → system A → hardware B → result.
This is pure compatibility: no new architecture is designed. Instead, new hardware is designed directly to another processor's architecture so as to meet that architecture's requirements. Most of the design and development relies on reverse engineering, and this follow-the-leader model lacks innovation.
The instruction execution of microprocessors represented by Itanium follows two paths, shown in Fig. 1b:
Problem → system B → hardware B (including hardware A) → result;
Problem → system A → hardware B (including hardware A) → result;
That is, different hardware integrated in the same chip executes the instructions of system A and of system B, and jump instructions switch between the two. The first path is based on the new architecture B. The second path serves compatibility: because hardware B itself integrates hardware A, instructions of system A can be executed directly on hardware B. This method is only applicable to product designs within the same company.
The instruction execution of microprocessors represented by Transmeta follows two paths, shown in Fig. 1c:
Problem → system B → hardware B → result;
Problem → system A → system B → hardware B → result;
The first path is based on the new architecture B. The second path serves compatibility: the problem is first converted into instructions of system A, software then translates system A instructions into system B instructions, and hardware B finally decodes and executes them. This design keeps its own architectural characteristics while achieving compatibility with system A; that is, system B can be designed independently.
The drawback of this method is the high cost of converting all code in software.
The instruction execution of the MISC compatible architecture can follow N paths, shown in Fig. 1d:
Problem → system B → reconfigurable hardware B → result;
Problem → system A → system B → reconfigurable hardware B → result;
Problem → system C → system B → reconfigurable hardware B → result;
······
Problem → system N → system B → reconfigurable hardware B → result;
The first path is based on the new architecture B, and the remaining N-1 paths serve compatibility. The characteristics are:
1. Hardware B is designed with reference to the instruction sets and specifications of system A, system B, ..., system N, and the hardware itself is reconfigurable. Compatibility is achieved with the support of reconfigurable compatible components.
2. Compatibility with multiple different architectures is easy to achieve.
Fig. 2 shows the instruction execution mode of the MISC compatible architecture designed in the manner of Fig. 1d. An instruction translation program converts instructions of the compatibility target (source instructions) into operator assembly sequences that a MISC processor can recognize under given instruction format constraints. After code compaction, hardware with reconfigurable characteristics decodes and executes these instructions.
The instruction interface of the MISC compatible architecture is introduced below with reference to Fig. 3a and Fig. 3b.
Unlike the instruction sets of common processors, the smallest executable element of the MISC architecture is the operator (function operators, data operators, route operators and composite operators), and each operator corresponds to one definite operation. An instruction is defined as the set of operations executed at a particular moment. The operator set and the operator arrangement rules (the instruction format) are the two key elements of this instruction set.
As shown in Fig. 3a, the general form of the instruction format has four parts: SYS, CBFF, CONTROL and OPERATOR. SYS is reserved bits, CBFF is the instruction format control field, CONTROL is the operator control field, and OPERATOR is the operator encoding field. OPERATOR is divided into several operator segments, denoted Opi, and each segment can hold one operator from a definite operator set. For each operator segment, CONTROL contains a subformat control field, denoted CBCFi, whose encoding uniquely determines which operator is placed in segment Opi. The decoder accepts one instruction word at a time and decodes and executes it according to the rules defined by the encoding.
Operators are assembled into instruction words according to fixed rules. During execution, however, instruction composition strategies such as delay and replacement make the length of the instruction actually executed in each cycle variable.
The concept and structure of operators are described below.
Controllable nodes that are functionally related are grouped into a unit, which becomes the smallest unit directly controlled by an instruction; the coded representation of that control is called an operator.
A controllable node is a device in the hardware that instructions can control directly. In a register file, a controllable node can be a single read/write register or a register read/write port. An instruction controls the sets of controllable nodes that make up these units in order to realize a given semantic function.
According to the hardware units they control, operators are divided into function operators, data operators, route operators and composite operators. Instructions control each hardware unit through these four classes of operators.
The operator set is defined as follows: OP = FOP + DOP + ROP + COP
The function operator set FOP is defined as:
FOP = {(fopi, tfopi)}, where fopi is a function operator and tfopi is its execution period.
<Property>: if (fopi, tfopi) ∈ FOP, then:
accri ∈ ACCR: if the value of accri at time t is VALUE0, it becomes VALUE1 at time t + tfopi, where VALUE1 is the result of applying the functional unit identified by fopi to the contents of the data-source registers identified by fopi. fopi is then said to be related to accri.
A function operator is the coded representation of the control of a functional unit (shown in Fig. 4a). A functional unit (FUNCTIONAL CELL) is a hardware unit, made up of data-path controllable nodes and execution-unit controllable nodes, that can complete one functional operation.
Take as an example the DAU operator, which performs 32/64-bit signed integer arithmetic. Its controllable nodes include two data-source gates, the data width and the operation type. The operator fields are as follows:
OPDAU<3:0> DAUW<0> RSDAUAx<1:0> RSDAUAy<1:0>
The fields of the operator are, in order: the operation code (add, subtract, absolute value, complement, etc.), the operand width of the operation (32/64), and the control codes for the two operand sources.
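The field layout can be made concrete with a small packing sketch. It assumes the fields are laid out most significant first in the order listed (OPDAU, DAUW, RSDAUAx, RSDAUAy), 9 bits in total. The example values are arbitrary because the actual opcode assignments for add, subtract and so on are not given here. `pack_operator` and `unpack_operator` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# DAU operator fields, most significant first (assumed order), with widths in bits.
DAU_FIELDS: List[Tuple[str, int]] = [("OPDAU", 4), ("DAUW", 1), ("RSDAUAx", 2), ("RSDAUAy", 2)]


def pack_operator(fields: List[Tuple[str, int]], values: Dict[str, int]) -> int:
    """Concatenate field values into one operator word, checking each width."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_operator(fields: List[Tuple[str, int]], word: int) -> Dict[str, int]:
    """Split an operator word back into named fields (inverse of pack_operator)."""
    values: Dict[str, int] = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

The same two helpers cover any operator described as a list of fixed-width fields, such as the 8-bit MTN1RA data operator below.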
The data operator set is defined as: DOP = {(dopi, tdopi)}, where dopi is a data operator and tdopi is its execution period.
<Property>: if (dopi, tdopi) ∈ DOP, then: comri ∈ COMR, and if the value of comri at time t is VALUE0, it becomes VALUE1 at time t + tdopi, where VALUE1 is the value in the data-source register identified by dopi; dopi is said to be related to comri. Or:
accri ∈ ACCR, and if the value of accri at time t is VALUE0, it becomes VALUE1 at time t + tdopi, where VALUE1 is the value of the functional unit identified by dopi; dopi is said to be related to accri.
A data operator is the coded representation of the control of a data unit (shown in Fig. 4b). A data unit (DATA CELL) is a hardware unit, made up of data-path controllable nodes and storage controllable nodes, that can complete one data storage operation. A data unit can be a fixed-width or a variable-width register, and can be a single register or a register file. Physically, a data operator corresponds to the register storage components in the hardware model (a single register or the write port of a register file) and to their data-source control.
For example, the data operator MTN1RA, which controls a write port of a register group, has 3 fields in the following format:
MTN1RA(8)
MTN1NO<2:0> RSMTN1RA<2:0> MFLD<1:0>
where MTN1NO is the register code (MR0-MR7), RSMTN1RA is the data-source code of the first input port, and MFLD is the register field code (high half, low half, full word, or invalid).
The route operator set is defined as: ROP = {(ropi, tropi)}, where ropi is a route operator and tropi is its execution time (the execution time of a route operator is generally less than one cycle).
<Property>: if (ropi, tropi) ∈ ROP, then:
comri ∈ COMR, and if the value of comri at time t is VALUE0, it becomes VALUE1 at time t + tropi, where VALUE1 is the value in the data-source register identified by ropi; ropi is said to be related to comri. Or:
accri ∈ ACCR, and if the value of accri at time t is VALUE0, it becomes VALUE1 at time t + tropi, where VALUE1 is the value of the functional unit identified by ropi; ropi is said to be related to accri. Route operators correspond to the routing control in the hardware model.
A route operator is the coded representation of the control of a path unit (shown in Fig. 4c). A path unit (ROUTE CELL) is a routing unit made up solely of input data paths.
Take as an example the route operator PATH1, which performs data-source selection for the four buses BUS0, BUS1, BUS2 and BUS3. Its instruction fields are as follows:
PATH(12)
RSBUS0<2:0> RSBUS1<2:0> RSBUS2<2:0> RSBUS3<2:0>
where each RS field is the control code of the data-source gate of bus BUS0, BUS1, BUS2 or BUS3, respectively.
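How a 12-bit PATH word could drive the four buses is shown below. The sketch assumes RSBUS0 occupies the top 3 bits and that each 3-bit code indexes one of eight candidate data sources. The source values in the usage are purely illustrative, and `decode_path` and `route` are hypothetical names.

```python
from typing import List


def decode_path(word: int) -> List[int]:
    """Split a 12-bit PATH word into RSBUS0..RSBUS3 (3 bits each, RSBUS0 in the top bits)."""
    if not 0 <= word < (1 << 12):
        raise ValueError("PATH operator is 12 bits wide")
    return [(word >> shift) & 0b111 for shift in (9, 6, 3, 0)]


def route(word: int, sources: List[int]) -> List[int]:
    """Drive BUS0..BUS3 with the data source that each RS field selects (8 sources assumed)."""
    if len(sources) != 8:
        raise ValueError("a 3-bit selector addresses exactly 8 sources")
    return [sources[code] for code in decode_path(word)]
```

All four selections happen in the same step, which is why a route operator usually needs less than one cycle: it only sets multiplexer gates and performs no computation.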
The composite operator set is defined as: COP = {(copi, tcopi)}, where copi is a composite operator and tcopi is its execution period.
<Property>: if (copi, tcopi) ∈ COP, then:
..., comri, comrj, ... ∈ COMR, and if the values of comri, comrj, ... at time t are VALUE0i, VALUE0j, ..., they become VALUE1i, VALUE1j, ... at time t + tcopi, where VALUE1i, VALUE1j, ... are the results of the operation identified by copi. copi is said to be related to comri, comrj, ...
A composite operator is the coded representation of the control of a composite unit (shown in Fig. 4d). A composite unit (COMBINED CELL) is a hardware unit, made up of several inseparable functional units or data units, that can complete a specific function. "Inseparable" means that controlling only some of the controllable nodes in the unit produces no meaningful action; an action with a definite meaning occurs only when all controllable nodes in the unit are controlled. For example, the stack operation operator MSTACK controls a composite unit whose controllable nodes include all registers on the stack, the stack pointer source gate and the stack pointer register.
Taking SPARC as an example, Fig. 5 compares the instruction forms before and after instruction conversion. The SPARC-V9 instruction format in the figure is 32 bits wide and comprises source operands, a destination operand, an instruction function control field and an instruction format control field. After conversion to the MISC compatible architecture, the instruction consists of an instruction format control field, operator segment control fields and operator encodings, possibly with added reserved bits. The operators fall into four classes (function, data, route and composite operators), each controlling a different hardware unit.
Taking the add instruction as an example, the instruction forms before and after conversion are as follows:
SPARC assembly instruction: ADD %o2, %l2, %o1
Encoding format:
10 | Rd | Op3 | Rs1 | I=0 | -- | Rs2
Function: add the operands in registers %o2 and %l2, and store the result in register %o1.
The converted instruction form is as follows:
The MISC assembly instruction:
ADD RPORT1, RPORT2 || RPORT1 L1 || RPORT L2 || WPORT AUDD
Encoding format:
SYS | CBFF | CBCF1 | ··· | CBCFn | DAU | PATH1 | PATH2 | RFW1 | ···
where SYS is the reserved bits.
Function: one addition, two register file reads and one register file write are performed, controlled respectively by the addition operator DAU, the route operators PATH1 and PATH2, and the register file write operator RFW1. At the instruction interface, this instruction therefore expresses parallel control of the adder, register file read port 1, register file read port 2 and register file write port 1. This design philosophy is called "explicit hardware cell control (EHCC)".
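The conversion for this one instruction can be sketched end to end. The sketch encodes `add %o2, %l2, %o1`, taking the second source as register %l2 because the encoding shows i = 0 (register form). In SPARC integer register numbering %o1, %o2 and %l2 are r9, r10 and r18. The format-3 field positions are the standard SPARC V9 ones. The operator payloads are hypothetical placeholders, since the real MISC encodings of DAU, PATH1, PATH2 and RFW1 are not given here.

```python
from typing import Any, Dict, List, Tuple

SPARC_OP3_ADD = 0b000000  # op3 of ADD in SPARC format 3 (op = 0b10)


def decode_format3(word: int) -> Dict[str, int]:
    """Split a 32-bit SPARC format-3 instruction word into its fields."""
    return {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i": (word >> 13) & 0x1,
        "rs2": word & 0x1F,
    }


def add_to_operators(word: int) -> List[Tuple[str, Dict[str, Any]]]:
    """Map a register-form ADD onto the four operators named in the text (payloads hypothetical)."""
    f = decode_format3(word)
    if f["op"] != 0b10 or f["op3"] != SPARC_OP3_ADD or f["i"] != 0:
        raise ValueError("this sketch only handles ADD rd, rs1, rs2")
    return [
        ("PATH1", {"read_port": 1, "reg": f["rs1"]}),  # route operator: register file read port 1
        ("PATH2", {"read_port": 2, "reg": f["rs2"]}),  # route operator: register file read port 2
        ("DAU", {"op": "add"}),                        # function operator: the adder
        ("RFW1", {"write_port": 1, "reg": f["rd"]}),   # data operator: register file write port 1
    ]
```

The SPARC register numbers are window-relative, so in hardware they would still pass through the window address logic before reaching the physical register file. The point of the sketch is that the fields implicit in one SPARC opcode become four explicit, independently controlled operators.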
Fig. 6 shows the design flow for configurable components with compatibility.
If the compatibility targets are system A, system B, ..., system N, the design process of the configurable component is:
1. Derive the formal description of the component from each architecture, as follows:
System A → analysis of component-related instructions A → operator extraction A → determination of hardware resources A → formal description A;
System B → analysis of component-related instructions B → operator extraction B → determination of hardware resources B → formal description B;
······
System N → analysis of component-related instructions N → operator extraction N → determination of hardware resources N → formal description N;
The above processes can be executed in parallel.
2. Superimpose serially the formal component descriptions derived from the different architectures;
3. Optimize the design according to the superposition design rules of the theoretical model;
4. Finally, determine the hardware structure of the configurable component.
Fig. 7 shows the flow of compatibility target instruction analysis and operator extraction.
The process is as follows:
1. Analyze the functions of the source instructions, group the functions completed by the same kind of component, and re-encode them as operation codes of function operators;
2. Separate the source operands into route operators, corresponding to the read ports of the register file;
3. Separate the destination operand into the destination register field of a data operator;
4. According to function, reorganize the internal paths to form the data-source fields of the function operators and of the data operators, and finally determine the number of route operators and data operators;
5. Design operations that must control several components simultaneously as composite operators;
6. Assemble the operators and check whether the functions of all source instructions can be realized. If not, return to step 1; if so, operator extraction ends.
The model description and design rules used in component design are introduced below.
In the MISC architecture, the structure, function and usage interface of a component can be described by a theoretical model, here called the component design model. A component design model is defined as a five-tuple:
M=(OP,E,C,Ctrl,T)
The five elements of this design model are:
OP: the set of operation relations, reflecting the functions the component implements. Designing on the basis of the operation set OP yields the E, C, Ctrl and T below;
E: the set of resources, such as hardware components for data paths, storage, logic, comparison, conditions, flags and computation, and hardware components that support control behaviors such as repetition and judgment. The set E supports the realization of the OP requirements;
Ctrl: the set of control relations. While the OP requirements are being realized, the control relations Ctrl organize the architecture dynamically; Ctrl comprises control points and control logic. Given the resources E and time T, the set Ctrl supports the dynamic control that realizes the OP requirements. In particular, the control elements of MISC functional components comprise operators (OPERATOR), flags (IDS), and the control logic that generates controllable-node control signals from operators and flag signals;
T: the set of minimum time relations associated with resources, operations, control and optimization. T is a time unit, and the set T supports the OP requirements.
The set of connection relations C is partitioned according to the first four sets:
C=Cop+Ce+Ct+Cctrl
where Cop is the set of behavioral connection relations, Ce is the set of connection relations between hardware units, Ct is the set of time-composition connection relations, and Cctrl is the set of control-composition connection relations.
Based on the five-tuple design model above, any behavior or set of behaviors (denoted Bi) can be expressed in 4 aspects:
OP(Bi): the operation function of Bi, defining the process by which behavior Bi is carried out;
E(Bi): the resource function of Bi, defining the hardware resources that Bi uses during execution and their quantities;
CTRL(Bi): the control function of Bi, defining how Bi controls the hardware units during execution;
T(Bi): the time function of Bi, defining the time that the execution of Bi takes;
By the compositional and reduction properties of behaviors:
OP(Bi) = op1 & op2 & ... & opn, opi ∈ OP, & ∈ Cop
E(Bi) = e1 & e2 & ... & en, ei ∈ E, & ∈ Ce
CTRL(Bi) = ctrl1 & ctrl2 & ... & ctrln, ctrli ∈ CTRL, & ∈ Cctrl
T(Bi) = t1 & t2 & ... & tn, ti ∈ T, & ∈ Ct
Each component of the above formal description is explained below. For convenience, Cop, Ce, Ct and Cctrl are explained together with OP, E, T and CTRL, respectively.
1, minimum operation behavior set OP and annexation Cop thereof
1)OP={Opb,+}
OPb: first operational set.Definition unit is operating as minimum microoperation, is the mini-components that function is divided, and is indivisible, as the assignment of a register, and an add operation etc., its descriptor is μ OP.Any one operation OP can be decomposed into the combination of each different μ OP constantly, can repeat between the μ OP, but orthogonal between the different μ OP.In the designing a model of register file, unit is operating as assign operation (with " * " expression), is that by hardware cell connects the behavior that realizes, any register manipulation can be described with operation of assignment unit and stack (representing with "+") thereof.
2) Cop = {|, ||, =>, {*}}, denoting serial, parallel, conditional (selection) and repeated execution of operations, respectively;
"b1|b2": at any given moment only one of behaviors b1 and b2 is active, which is an exclusive "or" relation; b1, b2 ∈ OP. Note that "|" is used only to describe the relation between operations. When describing a sequence of operations, serial operations are distinguished by different time stamps.
"b1||b2": parallel execution. Behaviors b1 and b2 complete within the same time period; b1, b2 ∈ OP.
"b1=>b2 b3": conditional execution. Behavior b1 executes first. If its result is true, behavior b2 executes; otherwise behavior b3 executes. b1, b2, b3 ∈ OP.
"{b1*b2}": repetition. Behavior b1 executes first. If its result is true, behavior b2 executes; otherwise the repetition ends. The process repeats until b1 evaluates to false. b1, b2 ∈ OP.
2. The minimal hardware unit set E and its connection relation set Ce
1) E = {e1, e2, ..., en}, where each ei is a hardware unit.
Hardware units are of the following types:
(1) Path unit set P
Path units are the abstract data-source control elements that make up the register file, for example multiplexers. A path unit connected to a storage element is written as P, followed by the uppercase name of the storage element (or bus), followed by a group of line names in brackets []. The part after the separator "|" inside the brackets is the select control signal. For example, PE[a, b, c, d|Mr] denotes a 4-to-1 selector whose output goes to storage element E. Its four data sources are storage elements A, B, C and D, and its select control signal is Mr. The output line of a multiplexer connected to a storage element is named with the corresponding lowercase form (here pe). The output line of a multiplexer connected to a bus is named directly with the lowercase bus name.
(2) Storage unit set M
Storage units are the abstract data storage elements that make up memories and registers. Storage element (register) names are written in uppercase, e.g. A(La), B(Lb), C(Lc), R1(Lr1), R2(Lr2), R3(Lr3), ..., where the signal beginning with L inside the parentheses is the latch control signal;
(3) Arithmetic unit set AU: components that perform arithmetic functions;
(4) Logic unit set L: components that perform logic functions;
(5) Judgment unit set J: components that perform comparison and arbitration;
(6) Branch unit set BR: components that perform branch operations.
2) Ce = { , +}: juxtaposition and "+" denote sequential use and simultaneous use of resources, respectively.
ei ej: holds if and only if ei, ej ∈ E and there exists some k such that a line connects ei[out] and ej[in_k];
ei+ej: holds if and only if ei, ej ∈ E and there is no k such that a line connects ei[out] and ej[in_k];
Connections between storage elements are established by lines. The output line name (NET name) of a configurable component is defined as the lowercase form of the uppercase name of the corresponding storage element. For example, adder, fau, a, b, c and r1<15:0> denote the output lines of the adder ADDER, the floating-point adder FAU, registers A, B and C, and register R1<15:0>, respectively, where <15:0> gives the bit range. A line formed by combining several different signals is written with braces and commas, {,}. For example, {r1<15:8>, r0<7:0>} denotes a new line combining the output lines of registers R1 and R0, in which r1<15:8> forms the upper 8 bits and r0<7:0> forms the lower 8 bits.
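To illustrate the {,} notation, the following Python sketch models a line as an integer and reproduces {r1<15:8>, r0<7:0>}. The helper names `bits` and `concat` and the register contents are hypothetical. The sketch only mirrors the bit-range and concatenation semantics described above.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value<hi:lo>, the bit slice from bit hi down to bit lo."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def concat(*fields):
    """Concatenate (value, width) pairs, most significant field first,
    like the {a, b} notation for combined lines."""
    result = 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
    return result


r1 = 0xABCD  # hypothetical contents of register R1<15:0>
r0 = 0x1234  # hypothetical contents of register R0<15:0>

# {r1<15:8>, r0<7:0>}: upper byte from R1, lower byte from R0
combined = concat((bits(r1, 15, 8), 8), (bits(r0, 7, 0), 8))
print(hex(combined))
```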
3. The minimum time component set T and its connection relation set Ct
1) T = the set of positive numbers
2) Ct = {max, +}, denoting taking the maximum of times and adding times, respectively. If OP operations execute serially, their times add. If they execute in parallel, the combined time is the maximum of their times.
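The composition rule for Ct can be sketched as a small recursive evaluator in which serial composition adds delays and parallel composition takes the maximum. The classes `Op`, `Serial` and `Parallel` and the delay values are assumptions for illustration. They are not part of the design model's notation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Op:
    """Elementary operation with a delay in abstract time units."""
    name: str
    delay: float


@dataclass
class Serial:
    """Behaviors executed one after another."""
    parts: List["Behavior"]


@dataclass
class Parallel:
    """Behaviors executed in the same time period."""
    parts: List["Behavior"]


Behavior = Union[Op, Serial, Parallel]


def time_of(b: Behavior) -> float:
    """T(Bi): serial composition adds times, parallel takes the maximum."""
    if isinstance(b, Op):
        return b.delay
    if isinstance(b, Serial):
        return sum(time_of(p) for p in b.parts)
    if isinstance(b, Parallel):
        return max(time_of(p) for p in b.parts)
    raise TypeError(f"unknown behavior: {b!r}")


# Hypothetical behavior: read, then (add || flag update), then write back.
b = Serial([Op("read", 1.0),
            Parallel([Op("add", 3.0), Op("flag", 1.5)]),
            Op("write", 1.0)])
print(time_of(b))
```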
4. The minimum control component set CTRL and its connection relation set Cctrl
1) CTRL = {Operator, IDS, CtrlL}
(1) Operator: the set of operators of the configurable component. A configurable component can implement the functions of components from different architectures, so at its interface it is also the set of operations that realize those functions;
(2) IDS: the set of flags associated with the configurable component, i.e. the flags produced by the component's compatibility targets;
(3) CtrlL: the control logic for each controllable node. It includes both sequential circuits, such as counters and latches, and combinational logic, such as decoders. Its inputs are operators, flags, and system signals such as the clock and interrupts. Its outputs are the encodings on each controllable node.
Operator and flag signals are written in the corresponding uppercase form. A register controllable node in the control logic is represented by a pair of path and latch signals (M_Ri, L_Ri), whose generation logic is described in Boolean algebra. A path controllable node is represented by M_P, whose generation logic is also described in Boolean algebra.
2) Cctrl = { , +}, denoting sequential control and simultaneous control, respectively.
The following design rules apply when designing a configurable component:
1. Reconfigurable design resource rule (Fig. 8a): the resource requirement of a serial superposition of OP sets is the union of the resources each OP needs at its own time step.
2. Reconfigurable design connection rule (Fig. 8b): in a serial superposition, circuit descriptions realizing several functions are superposed, but only one function is active at any moment. Identical data sources can be merged, while different data sources are placed in parallel and the corresponding gating conditions are changed accordingly.
3. Reconfigurable design control rule (Fig. 8c): the control description after serial superposition of OPs is the union of the control descriptions of the individual OPs before superposition. The new control condition for a given operation is the OR of its old conditions.
4. Reconfigurable design timing rule (Fig. 8d): the critical path after serial superposition of OPs is the maximum, over the OPs, of each OP's critical path before superposition plus the delay of the switching circuitry added for that OP.
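Below is a minimal sketch of rules 1, 3 and 4. It assumes each OP can be summarized by its resource set, its per-node control conditions and its critical-path delay. Rule 2 (merging identical data sources) is not modelled. The timing rule is read as the maximum over OPs of (critical path + added switch delay). All names and numbers (`OpDesc`, `L_R5`, `WPORT1`, `WINADDR`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpDesc:
    """Hypothetical summary of one OP before superposition."""
    resources: set             # hardware units the OP needs
    controls: dict             # controllable node -> enabling condition (text)
    critical_path: float       # critical-path delay before superposition
    switch_delay: float = 0.0  # delay of the selector logic added for this OP


def superpose(ops):
    """Serially superpose several OPs (only one active at a time)."""
    # Resource rule (Fig. 8a): union of the resources of every OP.
    resources = set().union(*(op.resources for op in ops))
    # Control rule (Fig. 8c): a node's new condition is the OR of its old ones.
    grouped = {}
    for op in ops:
        for node, cond in op.controls.items():
            grouped.setdefault(node, []).append(cond)
    controls = {node: " | ".join(f"({c})" for c in conds)
                for node, conds in grouped.items()}
    # Timing rule (Fig. 8d), read as max over OPs of (critical path + switch delay).
    critical = max(op.critical_path + op.switch_delay for op in ops)
    return resources, controls, critical


random_wr = OpDesc({"RF", "WPORT1"}, {"L_R5": "RANDOMRFW1"}, 4.0, 0.5)
window_wr = OpDesc({"RF", "WINADDR"}, {"L_R5": "WINRFW"}, 4.5, 0.5)
res, ctl, cp = superpose([random_wr, window_wr])
print(sorted(res), ctl["L_R5"], cp)
```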
The design method for configurable components is introduced below, using the reconfigurable register file as an example.
A reconfigurable register file is defined as a register file that adapts when accessed by different types of instructions within a specified scope of use. It changes its own structure in response to the change of instructions, so that it exhibits the hardware characteristics matching the type of instruction accessing it.
The reconfigurable design of the register file is realized by a register-file design optimization method built on the theoretical model of reconfigurable register design. Register-file design optimization is defined as the realization of the register operation set (OPSET_R) on an optimal circuit (E_R, C_R, CTRL_R). It studies the optimization principles for superposing OP_R and the process of stepwise refinement, and its result is the set of design optimization rules, RULE_DESIGN. The reconfigurable register-file design method is a forward design method. Its core is determining the operator set through requirements analysis, which requires a sufficient understanding of the hardware structure.
Fig. 9 is the external block diagram of the reconfigurable register file. It is built by superposing hardware onto an ordinary random-access register file with 4 random read ports and 4 random write ports. It can perform three kinds of operations: the read and write operations of an ordinary random-access register file; the register-window operations of the SPARC-V9 general-purpose registers; and the moving register-window and register-rotation operations of the Itanium general-purpose registers. Its inputs are the data sources of the 4 write ports and the control mode selecting among random read/write, register-window operation, and moving-window/rotation operation. Its outputs are the data of the 4 read ports, a write-conflict flag, and window-overflow flags. A write conflict arises when two write operations have the same destination address. The window-overflow flags cover both overflow and underflow. Executing SAVE when the windows are full produces the overflow flag, and executing RESTORE when the windows are empty produces the underflow flag. After a window overflows or underflows, a trap handler controls the exchange of data between the register file and memory until the required operation completes.
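The two kinds of status output can be expressed as simple predicates. This is a behavioural sketch, not the circuit. The function names are hypothetical, and the overflow and underflow conditions use the CANSAVE/CANRESTORE counters defined later for the register-window mode.

```python
def write_conflict(dest_addrs):
    """dest_addrs: destination addresses of the 4 write ports this cycle,
    None for an idle port. True if two active writes hit the same register."""
    active = [a for a in dest_addrs if a is not None]
    return len(active) != len(set(active))


def window_flags(op, cansave, canrestore):
    """Return (overflow, underflow) for a SAVE or RESTORE request."""
    overflow = op == "SAVE" and cansave == 0        # windows full
    underflow = op == "RESTORE" and canrestore == 0  # windows empty
    return overflow, underflow


print(write_conflict([5, 17, None, 5]), window_flags("SAVE", 0, 6))
```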
Feature when Fig. 9 a is used as the random read-write register file for the restructural register file on the system interface.When as the random read-write register file, the restructural register file sees to have 4 read ports and 4 write ports on the system interface, and each read/write port can be visited 128 registers.Control to reading-writing port is controlled by 8 operators such as RANDOMPATH1, RANDOMPATH2, RANDOMPATH3, RANDOMPATH4, RANDOMRFW1, RANDOMRFW2, RANDOMRFW3, RANDOMRFW4.
RANDOMPATHi (I=1-4) operator is four route operators, and the read operation at random of 4 read ports of control is that example is described below with RANDOMPATH1:
1) operator form: comprise data source territory RSRANDOM1<6:0 〉, by first at random read port control from the register file that comprises 128 registers, select data to read;
2) assembler syntax: RANDRD1<Data Source 〉;
Wherein, Data Source is R0-R127.
3) operation is described:
Finish the gating of Data Source, and the result of gating write on the output bus of register file, a concrete operand gating process is: by route operator gated data from the register of determining, get selected data by the operand source control coding of function operator or data operator etc. from this data path again and operate.The data of gating should be the data at previous cycle stability under the control of route operator.State when the bus under the route operator keeps the last PATH to be called, and the state of bus can protected and recovery when interrupting.
4) use constraint: cooperate function operator, data operator and composition operators.
RANDOMRFWi (i = 1-4) are four data operators that control the four random write ports of the register file, respectively. RANDOMRFW1 is described below as an example:
1) Operator format
RDRANDO1<6:0> RSRFRANDW1<1:0>
where RDRANDO1<6:0> is the destination operand address, and RSRFRANDW1<1:0> is the data-source control signal. It selects among 4 data sources: read-port data PRD1, the functional-unit result bus AUDD, immediate data IMMD, and memory port MD0.
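The text gives the field widths of RANDOMRFW1 but not the bit order or the 2-bit code for each source. The sketch below assumes that the destination field sits above the source field, as written, and that codes 0-3 follow the order in which the sources are listed. Both assumptions are labelled in the code, and the function names are hypothetical.

```python
# ASSUMPTION: 2-bit source codes follow the order the sources are listed.
SOURCES = ("PRD1", "AUDD", "IMMD", "MD0")


def encode_randomrfw1(dest_reg: int, source: str) -> int:
    """Pack RDRANDO1<6:0> (destination R0-R127) above RSRFRANDW1<1:0>.
    ASSUMPTION: destination field in the high bits."""
    if not 0 <= dest_reg < 128:
        raise ValueError("destination must be R0-R127")
    return (dest_reg << 2) | SOURCES.index(source)


def decode_randomrfw1(word: int):
    """Inverse of encode_randomrfw1: return (destination, source name)."""
    return (word >> 2) & 0x7F, SOURCES[word & 0x3]


print(decode_randomrfw1(encode_randomrfw1(5, "IMMD")))
```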
2) Assembler syntax: RANDWR1 <destination register> <data source>
where the destination register is R0-R127 and the data source is PRD1, AUDD, IMMD or MD0.
3) Operation description
This single-cycle operator gates the data source and writes the gated result into a register in the current register window. The register is determined by the destination-register field of the operator.
4) Usage constraints
<data source> defines a "path", not a concrete user-visible register. A path operator must therefore be used together with RANDOMRFW1 to select the concrete register.
5) Exceptions
When two random write operators act simultaneously with the same destination-register field, a write-conflict exception is raised.
Feature when Fig. 9 b is used as register window for the restructural register file on the system interface, when as register window, the restructural register file is seen on the architecture interface, is a loop stack that is made of 8 windows, and its operation meets the standard of SPARC V9.
The SPARC-V9 standard is 64 bit processors in the SPARC series, by the SPARC architecture council of SPARCInternational in issue in 1993.Concerning the SPARC processor, whenever the visible general-purpose register of user all is 32, and wherein R0-R7 is global register (Globals), R[0] be complete 0, read-only; R8-R15 is output register (outs), and R16-R23 is local register (locals); R24-R31 is input register (ins).The number of the general-purpose register of SPARC is relevant with realization, can not wait from 64 to 528, corresponding two groups of global registers are with 3 to 32 registers group of comprising 16 registers relevant with machine, registers group overlaps register window, 64 of register lengths.The input register of each register window and output register are overlapping with two adjacent register windows respectively, the window slogan is that the input register of output register (to the number of physical registers NWINDOWS delivery in the register window) and current window of register window of CWP-1 (CWP is the current window pointer) is overlapping, the output register of current window and window slogan are the input register overlapping (to the number of physical registers NWINDOWS delivery in the register window) of the register window of CWP+1, and local register is unique to each register window.The window number that actual software can be used lacks 1 than hard-wired window number, because the output of last register window will be washed out valid data with the input of a oldest register window is overlapping.Invocation of procedure instruction (CALL and JMPL) does not change CWP, and process can be called and not change window.
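The sketch below makes the overlap concrete by mapping a visible register number and CWP to a physical index for 8 windows. The physical layout is an assumption: the globals come first, followed by 16 registers per window holding that window's ins and locals. This layout was chosen so that the outs of window w coincide with the ins of window w+1, as described above. The reconfigurable register file may arrange its storage differently, and `physical_index` is a hypothetical name.

```python
NWINDOWS = 8  # window count given for Fig. 9b


def physical_index(cwp: int, r: int, nwindows: int = NWINDOWS) -> int:
    """Map visible register r (0..31) in window cwp to a physical index.
    ASSUMED layout: physical 0-7 hold the globals; window w owns 16
    registers from 8 + 16*w (ins first, then locals); the outs of
    window w are the ins of window (w + 1) mod nwindows."""
    if not 0 <= r < 32:
        raise ValueError("r must be 0..31")
    if r < 8:                                   # globals R0-R7, shared
        return r
    if r < 16:                                  # outs R8-R15
        w, offset = (cwp + 1) % nwindows, r - 8
    elif r < 24:                                # locals R16-R23
        w, offset = cwp, 8 + (r - 16)
    else:                                       # ins R24-R31
        w, offset = cwp, r - 24
    return 8 + 16 * w + offset


# O3 (r11) of window 0 is the same storage as I3 (r27) of window 1.
print(physical_index(0, 11), physical_index(1, 27))
```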
The control interface of the register windows is obtained by analyzing the SPARC-V9 instructions, as follows.
The SPARC-V9 instruction set comprises 135 RISC instructions, each 32 bits long. There are 4 instruction formats. Within each format, instructions are further divided into instruction forms according to the values of their control fields, giving 31 forms in total. By function, SPARC instructions fall into the following classes:
- memory access
- memory synchronization
- integer arithmetic
- control transfer
- conditional move
- register-window management
- state-register access
- privileged-register access
- floating-point operate
- implementation-dependent
- reserved
Instructions involving the SPARC-V9 general-purpose registers fall into two broad classes:
The first class includes memory access, integer arithmetic, control transfer, conditional move and state-register access. These instructions use only the ordinary read/write capability of the register windows, i.e. reads and writes to the current active window (the 32 user-visible registers). Their operations on the register file are simple.
The second class comprises the register-window management instructions, which control the windows and their state. They are described below:
1. SAVE and RESTORE instructions
1) Assembler syntax:
save reg(rs1), reg_or_imm, reg(rd)
restore reg(rs1), reg_or_imm, reg(rd)
2) Instruction format
Bits:  31-30 | 29-25 | 24-19 | 18-14 | 13  | 12-5 | 4-0
Field: 10    | rd    | op3   | rs1   | i=0 | ---  | rs2
or:
Bits:  31-30 | 29-25 | 24-19 | 18-14 | 13  | 12-0
Field: 10    | rd    | op3   | rs1   | i=1 | simm13
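For reference, here is a Python sketch that packs the two SAVE/RESTORE formats. The op3 values (111100 for SAVE, 111101 for RESTORE) come from the SPARC-V9 manual, not from this text. The function name `encode_fmt3` is hypothetical.

```python
OP3_SAVE = 0b111100     # op3 for SAVE in the SPARC-V9 manual
OP3_RESTORE = 0b111101  # op3 for RESTORE in the SPARC-V9 manual


def encode_fmt3(op3, rd, rs1, rs2=0, simm13=None):
    """Encode an op=10 instruction: i=0 selects rs2, i=1 selects simm13."""
    for name, reg in (("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be 0..31")
    word = (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if simm13 is None:
        return word | rs2                        # i = 0, bits 12-5 unused
    if not -4096 <= simm13 <= 4095:
        raise ValueError("simm13 must fit in 13 signed bits")
    return word | (1 << 13) | (simm13 & 0x1FFF)  # i = 1


SP = 14  # %sp is %o6, i.e. r14
print(hex(encode_fmt3(OP3_SAVE, SP, SP, simm13=-96)))
```

With these values, `save %sp, -96, %sp` encodes to 0x9DE3BFA0, the familiar SPARC function prologue.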
3) Execution
The SAVE instruction provides a routine with a new register window in which to execute. The outs (OUT) of the old window become the ins (IN) of the new window. The OUT and LOCAL registers of the new window contain either 0 or values from the executing process, i.e. the process sees a clean window. The RESTORE instruction restores the register window saved by the last SAVE instruction executed by the current process. The ins of the old window become the outs of the new window, and the ins and locals of the new window contain the values of the previous window.
When no SPILL/FILL trap occurs, SAVE and RESTORE behave like an ADD instruction with one difference. Their source operands r(rs1) and/or r(rs2) are read from the old window (the window addressed by the original current window pointer CWP), and the sum is written to r(rd) in the new window (the window addressed by the new CWP). In addition, executing SAVE decrements CANSAVE (the number of windows that can be saved) and increments CANRESTORE (the number of windows that can be restored). Executing RESTORE decrements CANRESTORE and increments CANSAVE.
4) Exceptions
A. If CANSAVE = 0, executing SAVE causes a WINDOW_SPILL exception;
B. If CANSAVE ≠ 0 but the number of clean windows is 0, i.e. (CLEANWIN - CANRESTORE) = 0, executing SAVE causes a WINDOW_CLEAN exception;
C. If CANRESTORE = 0, executing RESTORE causes a WINDOW_FILL exception.
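The counter updates and exception conditions above can be sketched as a small state machine. CWP is advanced on SAVE and moved back on RESTORE, as stated in the DAU description of this section. OTHERWIN is ignored. The initial CLEANWIN value is an assumption, because the text does not give one. The class names are hypothetical.

```python
class WindowTrap(Exception):
    """Raised in place of the SPILL/CLEAN/FILL traps."""


class WindowState:
    """SAVE/RESTORE bookkeeping from the rules above (OTHERWIN ignored)."""

    def __init__(self, nwindows=8, cleanwin=None):
        self.nwindows = nwindows
        self.cwp = 0
        self.cansave = nwindows - 2   # reset value given in the DAU description
        self.canrestore = 0
        # ASSUMPTION: CLEANWIN's initial value is not given in the text.
        self.cleanwin = nwindows - 1 if cleanwin is None else cleanwin

    def save(self):
        if self.cansave == 0:
            raise WindowTrap("WINDOW_SPILL")
        if self.cleanwin - self.canrestore == 0:
            raise WindowTrap("WINDOW_CLEAN")
        self.cwp = (self.cwp + 1) % self.nwindows   # SAVE advances CWP
        self.cansave -= 1
        self.canrestore += 1

    def restore(self):
        if self.canrestore == 0:
            raise WindowTrap("WINDOW_FILL")
        self.cwp = (self.cwp - 1) % self.nwindows   # RESTORE moves CWP back
        self.canrestore -= 1
        self.cansave += 1


s = WindowState()
for _ in range(6):
    s.save()              # a seventh save() would raise WINDOW_SPILL
print(s.cwp, s.cansave, s.canrestore)
```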
2. SAVED and RESTORED instructions
1) Assembler syntax: SAVED, RESTORED
2) Instruction format
Bits:  31-30 | 29-25 | 24-19  | 18-0
Field: 10    | fcn   | 110001 | ---
3) Execution:
Executing SAVED increments CANSAVE. If OTHERWIN is 0, CANRESTORE is decremented; otherwise OTHERWIN is decremented. The SPILL trap handler can use SAVED to indicate that a window has been spilled successfully;
Executing RESTORED increments CANRESTORE. If OTHERWIN is 0, CANSAVE is decremented; otherwise OTHERWIN is decremented. In addition, if CLEANWIN is not equal to NWINDOWS, RESTORED increments CLEANWIN. The FILL trap handler can use RESTORED to indicate that a window has been filled successfully.
3. FLUSHW instruction
1) Assembler syntax: FLUSHW
2) Instruction format
Bits:  31-30 | 29-25 | 24-19 | 18-14 | 13  | 12-0
Field: 10    | ---   | op3   | ---   | i=0 | ---
3) Execution: if any register window other than the current window contains valid data, executing FLUSHW causes repeated SPILL traps. These continue until all valid windows other than the current window have been spilled to memory. The number of windows containing valid data is computed as NWINDOWS - 2 - CANSAVE. If the result is 0, FLUSHW has no effect and is equivalent to a no-op.
Based on the above instruction analysis, the operators of the reconfigurable register file for use as register windows are designed as follows. For simplicity, only the basic requirement of 3 register-file ports (two reads and one write) is considered.
1. Read operators: WINPATH1<4:0>; WINPATH2<4:0>
1) Operator format: contain the fields RSWIN1NO<4:0> and RSWIN2NO<4:0>, respectively. These fields control the two read ports and correspond to the RS1 and RS2 fields of the instruction;
2) Assembler syntax:
WINRD1 <source register>
WINRD2 <source register>
The register operands used as <source register> are encoded as follows:
Code   Operand  Code   Operand  Code   Operand  Code   Operand
00000  G0       01000  O0       10000  L0       11000  I0
00001  G1       01001  O1       10001  L1       11001  I1
00010  G2       01010  O2       10010  L2       11010  I2
00011  G3       01011  O3       10011  L3       11011  I3
00100  G4       01100  O4       10100  L4       11100  I4
00101  G5       01101  O5       10101  L5       11101  I5
00110  G6       01110  O6       10110  L6       11110  I6
00111  G7       01111  O7       10111  L7       11111  I7
where G0-G7 are the global registers (Globals), O0-O7 the output registers (Outs), L0-L7 the local registers (Locals), and I0-I7 the input registers (Ins).
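The table follows a regular pattern. Bits 4:3 select the group and bits 2:0 select the register within it. This also matches the SPARC r-register numbering (outs at 8-15, locals at 16-23, ins at 24-31). Here is a short decoder sketch; the function names are hypothetical.

```python
GROUPS = ("G", "O", "L", "I")  # globals, outs, locals, ins


def operand_name(code: int) -> str:
    """Decode a 5-bit <source register> code to its operand name."""
    if not 0 <= code < 32:
        raise ValueError("operand code is 5 bits")
    return f"{GROUPS[code >> 3]}{code & 0b111}"  # bits 4:3 group, 2:0 index


def operand_code(name: str) -> int:
    """Encode an operand name such as 'L2' back to its 5-bit code."""
    return (GROUPS.index(name[0]) << 3) | int(name[1:])


print(operand_name(0b01011), bin(operand_code("L2")))
```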
3) Operation description
The operator gates the data source and places the gated result on the output bus of the register file. A concrete operand-gating process is as follows. The path operator gates data out of the specified register. The operand-source control encoding of a function operator, data operator, etc. then takes the selected data from this data path for processing. The gated data should be the data that was stable in the previous cycle under control of the path operator. The bus driven by a path operator holds the state from the most recent PATH invocation, and the bus state can be saved and restored on interrupts.
4) Usage constraints
Because these are path operators, they should be used together with other function operators, data operators or composite operators.
2. Write operator: WINRFW<6:0>, which controls write operations in window mode.
1) Operator format:
RDWNO<4:0> RSRFW<1:0>
where RDWNO<4:0> is the destination register address, corresponding to the RD field of the instruction, and RSRFW<1:0> is the data-source control signal. It selects among 4 data sources: read-port data PRD2, the functional-unit result bus AUDD, immediate data IMMD, and memory port MD0.
2) Assembler syntax: WINWR <destination register> <data source>
where the destination register uses the same operand names and address encoding as the source register of the read operators, and the data source is PRD2, AUDD, IMMD or MD0.
3) Operation description
This single-cycle operator gates the data source and writes the gated result into a register in the current register window. The register is determined by the destination-register field of the operator.
4) Usage constraints
<data source> defines a "path", not a concrete user-visible register. A path operator must therefore be used together with WINRFW to select the concrete register.
5) Exceptions
G0 is always 0. When the destination operand is G0, the operation has no effect.
3. DAU<5:0> operator
1) Operator function
The operator performs 64-bit integer addition and subtraction, addition and subtraction with carry, SAVE and RESTORE. Only its use for SAVE and RESTORE is discussed here.
2) Assembler syntax
SAVE    reg rs1, reg_or_imm, reg rd
RESTORE reg rs1, reg_or_imm, reg rd
3) DAU<5:0> operator format:
OPDAU<3:0> RSDAUx<0> RSDAUx<0>
When OPDAU<3:0> = 1111, a SAVE operation is performed; when OPDAU<3:0> = 1101, a RESTORE operation is performed.
4) Execution:
DAU is a single-cycle operator. It first gates the operands and then performs a 64-bit addition according to the opcode. For SAVE and RESTORE, the source operands r(rs1) and/or r(rs2) are read from the old window (the window addressed by the original current window pointer CWP), and the sum is written to r(rd) in the new window (the window addressed by the new CWP). At the same time, the operator generates error flags or other flag signals, and the encoding determines whether the CCR condition-code register is modified.
The SAVE operation provides a routine with a new register window in which to execute; per the SPARC V9 architecture, CWP is incremented when SAVE takes effect. The RESTORE operation restores the register window saved by the last SAVE executed by the current process; per the SPARC V9 architecture, CWP is decremented when RESTORE takes effect. SAVE and RESTORE also modify the state registers CANSAVE and CANRESTORE. CANSAVE records the number of unused register windows following CWP. CANRESTORE records the number of windows preceding CWP that have been used by the current program.
SAVE decrements CANSAVE and increments CANRESTORE. The reset value of CANSAVE is the number of physical windows minus 2 (excluding the current window and the overlapping window). Executing SAVE when CANSAVE is 0 causes a window overflow. RESTORE increments CANSAVE and decrements CANRESTORE. The reset value of CANRESTORE is 0, and executing RESTORE when CANRESTORE is 0 produces a window underflow.
4. OPWIN<1:0> operator
1) Operator function: manages the window state, performing the SAVED, RESTORED and FLUSHW operations.
2) Assembler syntax:
SAVED
RESTORED
FLUSHWIN
3) OPWIN<1:0> operator format:
OPWIN<1:0>, encoded as follows:
OPWIN<1:0>  Operation   OPWIN<1:0>  Operation
00          SAVED       10          FLUSHW
01          RESTORED    11          Reserved
4) Execution:
Executing SAVED increments CANSAVE. If the state register OTHERWIN is 0, CANRESTORE is decremented; otherwise OTHERWIN is decremented. OTHERWIN is the number of windows that contain valid data belonging to an address space other than the current one.
Executing RESTORED increments CANRESTORE. If OTHERWIN is 0, CANSAVE is decremented; otherwise OTHERWIN is decremented. In addition, if the state register CLEANWIN is not equal to NWINDOWS (the total number of physically implemented register windows), RESTORED increments CLEANWIN. CLEANWIN indicates the number of register windows that SAVE can use without causing a CLEAN_WIN exception.
FLUSHW has two cases. When NWINDOWS - 2 - CANSAVE is not 0, a SPILL trap is generated and the trap handler takes control. After one window has been spilled, FLUSHW is re-executed, and this repeats until all register windows except the current active window have been spilled to memory. When NWINDOWS - 2 - CANSAVE is 0, FLUSHW is equivalent to a no-op (NOP).
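The OPWIN updates, including the repeated FLUSHW/SPILL sequence, can be sketched as follows. The class and method names are hypothetical. The spill handler's memory transfer is not modelled; each trap simply ends with a SAVED update. The loop stops once NWINDOWS - 2 - CANSAVE reaches zero.

```python
class WinCounters:
    """Window-state registers updated by OPWIN<1:0> (hypothetical model)."""

    def __init__(self, nwindows=8, cansave=6, canrestore=0,
                 otherwin=0, cleanwin=7):
        self.nwindows = nwindows
        self.cansave, self.canrestore = cansave, canrestore
        self.otherwin, self.cleanwin = otherwin, cleanwin

    def saved(self):                      # OPWIN = 00
        self.cansave += 1
        if self.otherwin == 0:
            self.canrestore -= 1
        else:
            self.otherwin -= 1

    def restored(self):                   # OPWIN = 01
        self.canrestore += 1
        if self.otherwin == 0:
            self.cansave -= 1
        else:
            self.otherwin -= 1
        if self.cleanwin != self.nwindows:
            self.cleanwin += 1

    def flushw(self):                     # OPWIN = 10
        """Re-execute FLUSHW until no other valid window remains;
        returns the number of SPILL traps taken."""
        traps = 0
        while self.nwindows - 2 - self.cansave > 0:
            traps += 1                    # SPILL trap taken
            self.saved()                  # handler reports a successful spill
        return traps


c = WinCounters(cansave=2, canrestore=4)
print(c.flushw(), c.cansave, c.canrestore)
```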
Feature when Fig. 9 c is used as mobile register window and spin register for the restructural register file on the system interface.When as mobile register window and spin register, its operation meets the working specification of Itanium architecture.
The Itanium general-purpose register comprises 128 register GR0-GR127, to the program of all authorities all as seen, each register is 65, most significant digit is NAT (Not a thing) position, be used for the predicted anomaly sign, represent that when NAT is 1 register comprises a delay abnormality mark, whether effective and concrete execution is relevant for the interior data of register this moment, restructural register file of the present invention is not supported this function, so register width only is 64.
It is static general-purpose register territory that register in the Itanium general-purpose register is divided into two subclass: GRO-GR31, and GR0 is complete 0, and is read-only; GR32-GR127 is storehouse general-purpose register territory.Static register GR0-GR31 to all processes as seen, and corresponding to each process a corresponding mobile register window (shifting window) is arranged in the stack register territory, the size of window can be by software definition, between 0-96, change, automatic exchange parameter when overlapping CALL and RETuRN operation by register between the window, thus visit avoided to storer.When process is switched, static register must carry out SAVE and RESTORE operation according to the software convention, and the switching of moving window is finished automatically by hardware in the stack register, does not need explicit software intervention, and the rename application programs of register is sightless.The moving window size is decided by SOF and two parameters of SOL, and SOF and SOL are set by instruction, and SOF is the size of moving window, initial value is 96, SOL is the number (comprising input register and local register) of local register in the window, and initial value is 0, both poor of the number of output register.When carrying out the CALL instruction, the physical address of the GR32 of current active window becomes the address of GR32 in the last window and the SOL sum of a last window, and the big or small SOF of new window is the output register territory of a last window, and the SOL of new window and SOR are 0.Recover original SOF and SOL when carrying out the RETURN instruction.Actual physical stack register number is relevant with realization, but is necessary for 16 even-multiple, and minimum is 96.
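The frame bookkeeping on CALL and RETURN can be sketched with a base index `bof` into the physical stacked registers. This sketch assumes that the physical registers are used circularly. It models the saved caller state as a Python list in place of the PFS register, and it does not model spilling to or filling from memory. All names are hypothetical.

```python
N_STACKED = 96  # physical stacked registers; 96 is the stated minimum


class RegisterStack:
    """bof is the physical stacked index that GR32 of the current frame uses."""

    def __init__(self, n_stacked=N_STACKED):
        self.n = n_stacked
        self.bof, self.sof, self.sol, self.sor = 0, 96, 0, 0
        self.saved = []  # caller (bof, sof, sol, sor); stands in for PFS

    def phys(self, gr):
        """Physical stacked index for GR32 .. GR32+SOF-1."""
        if not 32 <= gr < 32 + self.sof:
            raise ValueError(f"GR{gr} is outside the current frame")
        return (self.bof + gr - 32) % self.n

    def call(self):
        self.saved.append((self.bof, self.sof, self.sol, self.sor))
        self.bof = (self.bof + self.sol) % self.n  # new GR32 = old GR32 + SOL
        self.sof = self.sof - self.sol             # callee sees caller's outputs
        self.sol = self.sor = 0

    def ret(self):
        self.bof, self.sof, self.sol, self.sor = self.saved.pop()


rs = RegisterStack()
rs.sof, rs.sol = 10, 7      # caller frame: 7 inputs+locals, 3 outputs
first_out = rs.phys(39)     # GR39 is the caller's first output register
rs.call()
print(first_out, rs.phys(32), rs.sof)
```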
Part of the stacked registers can be defined by software as rotating registers, which are used to accelerate loop processing. When registers rotate, the physical register actually accessed is computed as follows:
Physical register number = (register number<6:0> given by the instruction + RRB) mod (size of the rotating region)
where RRB is the 7-bit rotating register base register. Its initial value is 0, and it is decremented by 1 after each iteration. The rotating region starts at GR32 and has size 8*SOR<3:0>. SOR is set by software, with initial value 0 and maximum 12, so the largest rotating region is 96 registers. An instruction can change the size of the rotating region in the register stack only when RRB is 0. Normally, software either ensures that the rotating region does not overlap the output region of the active window, or sets RRB to 0 before setting up the output parameter registers.
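Below is a sketch of the rotation formula. It assumes that the formula is applied to the offset from GR32 (the start of the rotating region) and that registers outside the region are not renamed. RRB is kept as a plain integer. Python's `%` already wraps negative values, which models decrementing RRB after each iteration. The function name is hypothetical.

```python
def rotating_phys(reg: int, rrb: int, sor: int) -> int:
    """Renamed register number for GR `reg` with base RRB and size 8*SOR.
    ASSUMPTION: the modulo applies to the offset from GR32."""
    size = 8 * sor
    if size == 0 or not 32 <= reg < 32 + size:
        return reg                      # outside the rotating region
    return 32 + (reg - 32 + rrb) % size


# Iteration with RRB = 0 writes GR32; after RRB is decremented,
# GR33 names the same physical register.
print(rotating_phys(32, 0, 2), rotating_phys(33, -1, 2))
```

Because RRB decreases by one per iteration, a value written to GR32 in one iteration is read as GR33 in the next. This is what makes rotation useful for software-pipelined loops.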
The Itanium instructions related to the general-purpose registers are analyzed as follows.
The Itanium processor uses the IA-64 architecture, in which instructions execute in units of instruction groups. An instruction group may consist of one or any number of instruction bundles. Each 128-bit bundle contains three 41-bit instruction slots and a 5-bit template field. Instructions are 41 bits long and fall into 6 types: integer ALU, non-ALU integer, memory, floating-point, branch, and extended. There are more than 110 instruction formats. Execution starts at a given bundle address and instruction slot and includes all instruction slots and bundles, in increasing order, up to the first stop or taken branch. The IA-64 architecture allows independent instructions from several bundles to be issued together, and several bundles can be issued in one clock cycle. The general-register-related instructions of the IA-64 instruction set also fall into two broad classes:
The first class reads and writes the static registers and the current moving register window (FRAME). The ADD instruction is an example; its format is:
Bits:  40-37 | 36 | 35-34 | 33 | 32-29 | 28-27 | 26-20 | 19-13 | 12-6 | 5-0
Field: 8     | -  | x2a   | ve | x4    | x2b   | r3    | r2    | r1   | qp
The second class controls the register stack and the rotating registers, and is described below:
1. Allocate Stack Frame (alloc)
1) Assembler syntax:
(qp) alloc r1=ar.pfs,i,l,o,r
2) Instruction format:
Field | 1 | (blank) | x3 | (blank) | sor | sol | sof | r1 | qp
Bits | 40:37 | 36 | 35:33 | 32:31 | 30:27 | 26:20 | 19:13 | 12:6 | 5:0
3) Operation: a new stacked register frame is allocated on the general register stack (GRS), and the Previous Function State (PFS) register is copied to general register r1. The change in frame size takes effect immediately: the write to r1 and all subsequent operations take place in the new frame. i, l, o and r give the sizes of the input, local, output and rotating regions respectively. For the new frame, SOF (size of frame) is i + l + o and SOL (size of locals) is i + l; input and local registers are not physically distinguished. The rotating region is no larger than SOF, and its size is a multiple of 8.
4) Exceptions: if an alloc instruction attempts to change the SOR (size of rotating region) field while the RRB registers are not 0, a Reserved Register/Field fault is raised; if SOF is greater than 96, or SOR is greater than SOF, an Illegal Operation fault is raised. If there are not enough registers to allocate the frame, the processor waits for the pending STORE operations (register spills) to complete and raises the related exceptions.
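The frame-size arithmetic and fault checks of alloc can be summarised in a few lines. The following is a minimal Python sketch of the rules as stated above, not a model of real hardware; the function name `alloc_frame`, the order in which the two faults are checked, and the choice to reject a rotating size that is not a multiple of 8 with `ValueError` are all assumptions, and the register-shortage stall is not modelled.

```python
from typing import Optional, Tuple

MAX_SOF = 96  # largest frame allowed by the text (SOF > 96 faults)


def alloc_frame(i: int, l: int, o: int, r: int,
                rrb_is_zero: bool = True,
                sor_changes: bool = False) -> Tuple[int, int, int, Optional[str]]:
    """Model 'alloc r1=ar.pfs,i,l,o,r'; return (sof, sol, sor_field, fault).

    fault is None when the allocation is legal.
    """
    if r % 8 != 0:
        # Assumption: rejected at assembly time, since the rotating size
        # can only be encoded in units of 8 registers.
        raise ValueError("rotating size must be a multiple of 8")
    sof = i + l + o          # size of frame
    sol = i + l              # size of locals (inputs are not distinguished)
    sor_field = r >> 3       # rotating size in units of 8 registers
    if sor_changes and not rrb_is_zero:
        fault = "Reserved Register/Field"
    elif sof > MAX_SOF or r > sof:
        fault = "Illegal Operation"
    else:
        fault = None
    return sof, sol, sor_field, fault
```

For example, `alloc_frame(2, 3, 4, 8)` yields SOF = 9, SOL = 5 and an encoded rotating field of 1.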
2. Branch (br)
1) Assembler syntax:
(qp) br.btype.bwh.ph.dh target25
(qp) br.btype.bwh.ph.dh b1=target25
br.btype.bwh.ph.dh target25
br.ph.dh target25
(qp) br.btype.bwh.ph.dh b2
(qp) br.btype.bwh.ph.dh b1=b2
(qp) br.ph.dh b2
2) Instruction format:
A. IP-relative branch:
Field | 4 | s | d | wh | imm20b | p | (blank) | btype | qp
Bits | 40:37 | 36 | 35 | 34:33 | 32:13 | 12 | 11:9 | 8:6 | 5:0
B. Indirect branch:
Field | 0 | s | d | wh | x6 | (blank) | b2 | p | (blank) | btype | qp
Bits | 40:37 | 36 | 35 | 34:33 | 32:27 | 26:16 | 15:13 | 12 | 11:9 | 8:6 | 5:0
3) Operation: the branch condition is evaluated, and either the branch is taken or execution continues with the following instructions. For an IP-relative branch, target25 in the assembly source is the branch target label, and the encoded displacement is imm21 = (target25 - IP) >> 4; for an indirect branch, the target address is BR[b2]. The branch types and their functions are shown in the following table:
Branch type | Function | Branch condition | Target address
cond or none | Conditional branch | Qualifying predicate | IP-relative or indirect
call | Conditional procedure call | Qualifying predicate | IP-relative or indirect
ret | Conditional procedure return | Qualifying predicate | Indirect
ia | Invoke the IA-32 instruction set | Unconditional | Indirect
cloop | Counted loop branch | Loop count | IP-relative
ctop, cexit | Modulo-scheduled counted loop | Loop count and epilog count | IP-relative
wtop, wexit | Modulo-scheduled while loop | Qualifying predicate and epilog count | IP-relative
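The IP-relative target rule imm21 = (target25 - IP) >> 4 can be checked with a small encoder and decoder. This sketch assumes 16-byte-aligned bundle addresses and places the sign bit of imm21 in the s field (instruction bit 36) and its low 20 bits in imm20b (bits 32:13), following the IP-relative format above; the helper names are hypothetical. Because the displacement counts 16-byte bundles, 21 bits reach roughly ±16 MB around IP.

```python
from typing import Tuple


def encode_ip_rel(ip: int, target: int) -> Tuple[int, int]:
    """Return (s, imm20b) for an IP-relative branch from bundle ip to target."""
    if ip % 16 or target % 16:
        raise ValueError("ip and target must be 16-byte bundle addresses")
    imm21 = (target - ip) >> 4                  # signed displacement in bundles
    if not -(1 << 20) <= imm21 < (1 << 20):
        raise ValueError("target outside the 21-bit displacement range")
    imm21 &= (1 << 21) - 1                      # 21-bit two's complement
    return imm21 >> 20, imm21 & ((1 << 20) - 1)  # s -> bit 36, imm20b -> bits 32:13


def decode_ip_rel(ip: int, s: int, imm20b: int) -> int:
    """Rebuild the target address from the s and imm20b fields."""
    imm21 = (s << 20) | imm20b
    if s:
        imm21 -= 1 << 21                        # sign-extend
    return ip + (imm21 << 4)
```

A backward branch of one bundle, for instance, encodes as s = 1 with imm20b all ones.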
Each branch type behaves as follows:
(1) cond: if the qualifying predicate (qp) is 1, the branch is taken; otherwise it is not.
(2) call: if qp is 1, the branch is taken and at the same time: CFM (current frame marker), EC (epilog count) and the current privilege level are saved into the PFS (previous function state) register; the caller's stacked frame is saved, and the callee is automatically allocated a new frame whose size equals the caller's output region; the RRB fields in CFM are cleared to 0; and the return link value is written to BR b1.
(3) ret: if qp is 1, the branch is taken and at the same time: CFM, EC and the current privilege level are restored from the PFS register, and the caller's stacked frame is restored.
(4) ctop and cexit: the procedure is shown in Fig. 9d. When a ctop or cexit operation is valid, it proceeds as follows:
If the loop count LC is not 0:
LC is decremented by 1, the epilog count EC is unchanged, RRB is decremented by 1, and the registers rotate;
If LC equals 0:
If EC is greater than 1: LC is unchanged, EC is decremented by 1, RRB is decremented by 1, and the registers rotate;
If EC equals 1: LC is unchanged, EC is decremented by 1, RRB is decremented by 1, the registers rotate, and the loop then exits;
If EC equals 0: LC, EC and RRB are unchanged, and the loop exits.
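The case list above can be exercised as a small state machine. This is a hedged Python sketch: the LC/EC/RRB updates follow the text, while the taken condition (LC != 0 or EC > 1) is taken from the Intel IA-64 definition of ctop rather than from this document. RRB is kept as a plain integer here on the assumption that the wrap-around happens in the rotating address generator; `LoopState` and `ctop_step` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    lc: int   # loop count LC
    ec: int   # epilog count EC
    rrb: int  # rotating register base (unreduced; wrap happens at address generation)


def ctop_step(s: LoopState) -> bool:
    """Apply one ctop update in place; return True if the loop branch is taken.

    cexit makes the same updates but branches in the opposite sense.
    """
    if s.lc != 0:
        s.lc -= 1
        s.rrb -= 1      # registers rotate
        return True
    if s.ec > 1:
        s.ec -= 1
        s.rrb -= 1
        return True
    if s.ec == 1:
        s.ec -= 1
        s.rrb -= 1      # last rotation, then the loop exits
        return False
    return False        # EC == 0: LC, EC and RRB unchanged, loop exits


# Four source iterations (LC = 3) in a two-stage pipeline (EC = 2).
state = LoopState(lc=3, ec=2, rrb=0)
kernel_runs = 1
while ctop_step(state):
    kernel_runs += 1
print(kernel_runs, state)
```

With LC = 3 and EC = 2 the kernel runs 4 + 2 - 1 = 5 times, and RRB is decremented once per ctop that rotates.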
(5) wtop and wexit: the procedure is shown in Fig. 9e and proceeds as follows:
If the predicate register PR[qp] is not 0:
EC is unchanged, RRB is decremented by 1, and the registers rotate;
If PR[qp] equals 0:
If EC is greater than 1: EC is decremented by 1, RRB is decremented by 1, and the registers rotate;
If EC equals 1: EC is decremented by 1, RRB is decremented by 1, the registers rotate, and the loop then exits;
If EC equals 0: EC and RRB are unchanged, and the loop exits.
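The while-loop variant differs only in testing PR[qp] instead of LC. As before, this is a sketch: the updates follow the list above, while the taken condition (PR[qp] set or EC > 1) is assumed from the Intel IA-64 wtop definition, and `wtop_step` is a hypothetical name.

```python
from typing import Tuple


def wtop_step(pr_qp: bool, ec: int, rrb: int) -> Tuple[int, int, bool]:
    """Return (ec, rrb, taken) after one wtop evaluation.

    wexit makes the same updates but branches in the opposite sense.
    """
    if pr_qp:
        return ec, rrb - 1, True        # EC unchanged, registers rotate
    if ec > 1:
        return ec - 1, rrb - 1, True
    if ec == 1:
        return 0, rrb - 1, False        # last rotation, loop exits
    return ec, rrb, False               # EC == 0: nothing changes
```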
(6) cloop: when LC is not 0, LC is decremented by 1 and the branch is taken.
(7) ia: unconditional branch (switches to the IA-32 instruction set).
3. Clear RRB (clrrrb)
1) Assembler syntax:
clrrrb
clrrrb.pr
2) Instruction format:
Field | 0 | (blank) | x6 | (blank) | qp
Bits | 40:37 | 36:33 | 32:27 | 26:6 | 5:0
3) Operation: the ALL form (clrrrb) clears every RRB (for the general register file, the floating-point register file and the predicate register file) to 0; the PRED form (clrrrb.pr) clears only the RRB of the predicate register file.
4) Exceptions: this instruction must be the last instruction in an instruction group; otherwise an Illegal Operation fault is raised.
4. Cover Stack Frame (cover)
1) Assembler syntax: cover
2) Instruction format:
Field | 0 | (blank) | x6 | (blank) | qp
Bits | 40:37 | 36:33 | 32:27 | 26:6 | 5:0
3) Operation: a new stacked frame of size 0 is allocated; it contains none of the registers of the previous frame, and RRB is reset.
4) Exceptions: this instruction must be the last instruction in an instruction group; otherwise an Illegal Operation fault is raised.
5. Flush Register Stack (flushrs)
1) Assembler syntax: flushrs
2) Instruction format:
Field | 0 | (blank) | x3 | x2 | x4 | (blank) | qp
Bits | 40:37 | 36 | 35:33 | 32:31 | 30:27 | 26:6 | 5:0
3) Operation: all stacked registers in the dirty partition (including the registers of all earlier procedure frames that have not yet been saved to the backing store) are written to the backing store in memory.
4) Exceptions: this instruction must be the first instruction in an instruction group and must be placed in slot 0 or slot 1 of the bundle; otherwise the result is undefined.
6. Load Register Stack (loadrs)
1) Assembler syntax: loadrs
2) Instruction format:
Field | 0 | (blank) | x3 | x2 | x4 | (blank) | qp
Bits | 40:37 | 36 | 35:33 | 32:31 | 30:27 | 26:6 | 5:0
3) Operation: this instruction ensures that a specified number of bytes in memory immediately below the current BSP pointer are loaded into the dirty partition of the stacked registers, and that all other stacked registers are marked invalid without being saved to the backing store. The amount of data loaded is set by the RSC.loadrs field; when this field is 0, all registers outside the current frame are marked invalid.
4) Exceptions: this instruction must be the first instruction in an instruction group and must be placed in slot 0 or slot 1 of the bundle; otherwise the result is undefined.
Based on the above instruction analysis, the operators are designed as follows. For convenience, only the basic requirement of 3 register ports (two reads and one write) is considered; the operator design method for register files with more ports is the same.
1. Read operators: ROTPATH1<6:0> and ROTPATH2<6:0>
1) Operator format:
They contain the fields RSROT1NO<6:0> and RSROT2NO<6:0> respectively, which control the two read ports.
2) Assembler syntax:
ROTRD1 <source-register>
ROTRD2 <source-register>
where source-register is any register in R0-R127.
3) Operation:
The operator gates the data source and places the gated result on the output bus of the register file. A concrete operand is selected as follows: the route operator gates data from the specified register onto a data path, and a function operator, data operator or similar then takes the selected data from that path via its operand-source control code and operates on it. Data gated under control of the route operator must be the data that was stable in the previous cycle. The bus driven under a route operator keeps the state it had when the last PATH was invoked, and the bus state can be saved and restored on an interrupt.
4) Usage constraint:
A route operator must be paired with a function operator, data operator or composite operator.
2. Write operator: ROTRFW<8:0>, which controls write operations under the current window state.
1) Operator format:
RDRTO<6:0> | RSRFW<1:0>
where RDRTO<6:0> is the destination operand address and RSRFW<1:0> is the data-source control signal, which selects among 4 data sources: read port data PRD3, the functional unit result bus AUDD, the immediate IMMD and the memory port MD0.
2) Assembler syntax:
ROTWR <destination-register> <data-source>
where destination-register is any register in R0-R127, and data-source is one of PRD3, AUDD, IMMD and MD0.
3) Operation:
This single-cycle operator gates the data source and writes the gated result into a register of the current register window; the register is given by the destination register field of the operator.
4) Usage constraint:
Because <data-source> names a path rather than a specific user-visible register, a ROTRFW operator must be accompanied by a route operator that selects the specific register.
5) Exceptions:
R0 is always 0; when the destination operand is R0, the operation has no effect.
Note: the registers R0-R127 used above as source and destination operands are encoded sequentially with 7-bit addresses.
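A write operator word and its R0 rule can be illustrated as follows. This Python sketch assumes that RDRTO<6:0> occupies bits 8:2 and RSRFW<1:0> bits 1:0 (the printed field order), and that the four sources are coded 0-3 in the order PRD3, AUDD, IMMD, MD0; the text gives neither the bit layout nor the codes, so both are hypothetical, as are the function names.

```python
# Hypothetical RSRFW code assignment: the text names the four sources but
# not their codes, so their listing order is assumed here.
SOURCES = ("PRD3", "AUDD", "IMMD", "MD0")


def encode_rotrfw(rd: int, source: str) -> int:
    """Pack 'ROTWR <rd> <source>' as RDRTO<6:0> in bits 8:2, RSRFW<1:0> in bits 1:0."""
    if not 0 <= rd < 128:
        raise ValueError("destination must be one of R0-R127")
    return (rd << 2) | SOURCES.index(source)


def apply_rotrfw(regs: list, word: int, paths: dict) -> None:
    """Write the gated data path into regs[RDRTO]; a write to R0 has no effect."""
    rd, src = word >> 2, word & 0b11
    if rd != 0:                          # R0 always reads as 0
        regs[rd] = paths[SOURCES[src]]


regs = [0] * 128
paths = {"PRD3": 11, "AUDD": 22, "IMMD": 33, "MD0": 44}
apply_rotrfw(regs, encode_rotrfw(5, "IMMD"), paths)
apply_rotrfw(regs, encode_rotrfw(0, "AUDD"), paths)
```

Here R5 receives the immediate path, while the write aimed at R0 is discarded.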
3. Stacked register frame allocation operator: Allocframe<17:0>
1) Function: sets the frame size, the size of the local region and the size of the rotating region. The rotating-region register is 4 bits wide; the actual rotating-region size is its value shifted left by 3.
2) Operator format:
Sor<3:0> | sol<6:0> | Sof<6:0>
3) Assembler syntax:
ALLOC #i,#l,#o,#r
// Note: sol = i + l, sof = i + l + o, and the rotating size is sor << 3 (that is, sor = r >> 3).
4) Operation:
This single-cycle operator sets the frame size and the rotating region; executing it modifies the SOR, SOL and SOF registers.
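The packing of the 18-bit Allocframe operator can be sketched as below. The bit positions (SOR<3:0> in bits 17:14, SOL<6:0> in bits 13:7, SOF<6:0> in bits 6:0) are an assumption drawn from the printed field order, and the function names are hypothetical.

```python
from typing import Tuple


def encode_allocframe(i: int, l: int, o: int, r: int) -> int:
    """Pack 'ALLOC #i,#l,#o,#r' into Allocframe<17:0> (assumed layout)."""
    sol = i + l
    sof = i + l + o
    sor = r >> 3                       # rotating size is kept in units of 8
    if r % 8 or sor > 0xF or sof > 0x7F:
        raise ValueError("value does not fit the operator fields")
    return (sor << 14) | (sol << 7) | sof


def decode_allocframe(word: int) -> Tuple[int, int, int]:
    """Return (sof, sol, rotating_size) from an Allocframe<17:0> word."""
    sof = word & 0x7F
    sol = (word >> 7) & 0x7F
    rotating_size = ((word >> 14) & 0xF) << 3   # actual size = SOR << 3
    return sof, sol, rotating_size
```

`ALLOC #2,#3,#4,#8`, for instance, packs SOR = 1, SOL = 5 and SOF = 9 into one word and decodes back to a rotating size of 8.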
4. Branch control operator: BRANCH<3:0>
1) Function: controls branch operations for counted loops and while loops, as well as procedure call and return.
2) Operator format:
BTYPE<2:0> | RDBRANCH<0>
BTYPE<2:0> is the type control field, encoded as follows:
BTYPE<2:0> | Operation
000 | CTOP
001 | CEXIT
010 | WTOP
011 | WEXIT
100 | CLOOP
101 | CALL
110 | RETURN
111 | Reserved
The one-bit address-source control field (RDBRANCH<0> in the format above, referred to here as RSTYPE<0>) selects the target address source: IMMD when RSTYPE<0> is 0, SMDI when it is 1.
3) Operation:
For a CALL operation, the BOF register (BOF is the physical address of the first register of the current window) is modified, and the new SOF region is automatically placed in the original SOF - SOL region, so that the window moves.
For a RETURN operation, the previous BOF, SOL and SOF are restored, and the window moves in the opposite direction.
4) Assembler syntax:
CTOP IMMD;
CEXIT IMMD;
WTOP IMMD;
WEXIT IMMD;
CLOOP IMMD;
CALL IMMD;
CALL REG;
RETURN REG;
5. Stack management operator: OPSTK<1:0>
1) Function: manages the register stack state by performing the clrrrb, cover, flushrs and loadrs operations.
2) Operator encoding:
OPSTK<1:0> | Operation
00 | CLRRRB
01 | COVER
10 | FLUSHRS
11 | LOADRS
3) Assembler syntax:
CLRRRB;
COVER;
FLUSHRS;
LOADRS;
4) Operation:
When CLRRRB is valid, the RRB register is set to 0;
When COVER is valid, the SOF, SOL, SOR and RRB registers are all cleared to 0;
When FLUSHRS is valid, STORE operations are performed if BSP is not equal to BSPSTORE;
When LOADRS is valid, the values of all registers between BSP - Number_of_Bytes and BSP are loaded into the register stack and marked DIRTY.
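The first three OPSTK operations can be modelled on a small state record. This is a hedged Python sketch: it assumes 8-byte register slots, ignores the NaT-collection words that the Itanium register stack engine interleaves in the backing store, treats the `store` callback as a hypothetical stand-in for the memory port, and leaves LOADRS unmodelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StackState:
    sof: int
    sol: int
    sor: int
    rrb: int
    bsp: int        # backing-store address of the current frame
    bspstore: int   # next backing-store address to be written


OPSTK = {0b00: "CLRRRB", 0b01: "COVER", 0b10: "FLUSHRS", 0b11: "LOADRS"}


def opstk(state: StackState, code: int,
          store: Callable[[int], None] = lambda addr: None) -> None:
    """Apply one OPSTK<1:0> operation to the stack state in place."""
    op = OPSTK[code]
    if op == "CLRRRB":
        state.rrb = 0
    elif op == "COVER":
        state.sof = state.sol = state.sor = state.rrb = 0
    elif op == "FLUSHRS":
        # One 8-byte STORE per dirty register until BSPSTORE catches up with BSP.
        while state.bspstore < state.bsp:
            store(state.bspstore)
            state.bspstore += 8
    else:
        raise NotImplementedError("LOADRS is not modelled in this sketch")
```

With BSP 64 bytes above BSPSTORE, a FLUSHRS issues eight stores and leaves BSPSTORE equal to BSP.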
Fig. 10 is a schematic diagram of a register window structure in which the global registers are separated from the window registers. The register window components include window register file address translation component 101, window register file 102, global register file 103 and data output selection component 104.
If the number of system-visible registers in a register window is $2^a$ and the number of global registers is $2^b$ ($b < a$), the global register file contains $2^b$ registers and the window register file contains $2^m$ physical registers, where $m = b + 1 + k$ ($k$ a nonnegative integer) for fixed windows and $2^m \ge 2^a - 2^b$ for stacked windows.
When a write is performed, the register address in the instruction (the register source operand in the route operator or the register destination operand in the data operator, $a$ bits wide) is converted by window register file address translation component 101 into a physical register address ($m$ bits wide) that respects the window-overlap property, and this address accesses window register file 102; at the same time, the low-order bits [b-1:0] of the instruction address, which correspond to the global register number, access global register file 103 directly. At any given moment a write can take effect in only one of the global register file and the window register file, as decided by the write-address enable signal. This signal is a logical combination of the write-operator enable signal and the high-order bits [a-1:b] of the register address in the instruction: when the write operator is enabled and the high-order bits are all 0, the write to the global register file takes effect; when the write operator is enabled and the high-order bits are not all 0, the write to the window register file takes effect. The global register file and the window register file share the same write data source.
When a read is performed, bits [b-1:0] of the register address in the instruction access global register file 103 directly, while the address is also translated by component 101, on the same principle as for writes, to access the window register file. Data output selection component 104 then chooses the correct data between the outputs of the two separate register files according to whether the high-order bits [a-1:b] of the register address are all 0: if they are all 0, the read of the global register file is valid and its data is output; otherwise the read of the window register file is valid and its data is output.
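The selection rule shared by reads and writes depends only on whether address bits [a-1:b] are all zero. The following Python sketch expresses that decode for arbitrary a and b; the function names are hypothetical, and the window-file address translation itself is left to component 101.

```python
from typing import Tuple


def read_source(addr: int, a: int, b: int) -> str:
    """'global' if address bits [a-1:b] are all 0, else 'window' (selector 104)."""
    if not 0 <= addr < (1 << a):
        raise ValueError("address wider than a bits")
    return "global" if addr >> b == 0 else "window"


def write_enables(write_en: bool, rd: int, b: int) -> Tuple[bool, bool]:
    """Return (global_we, window_we) from the write-operator enable and RD[a-1:b]."""
    high_bits_zero = (rd >> b) == 0
    return write_en and high_bits_zero, write_en and not high_bits_zero
```

With a = 5 and b = 3 (the fixed-window example) addresses 0-7 go to the global file; with a = 7 and b = 5 (the Itanium example) the boundary moves to 32.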
The above structure applies both to fixed register windows and to stacked register windows. The difference between the two is that in a fixed-window design the number of registers in each window and the sizes of the input, local and output regions are fixed, whereas in a stacked-window design the number of registers in each window and the sizes of the local and output regions are programmable. What they have in common is that both contain a global register region (also called the static register region) and a window register region (also called the stack region). The global registers are visible to any procedure, while each window in the window register region is visible only to one particular procedure; the windows overlap one another to form a circular stack, and the input region of a new window contains the valid data of the output region of the old window. Switching windows reduces memory accesses and so improves execution efficiency.
Both kinds of register window can be implemented with the components of Fig. 10; they differ mainly in how the read/write address translation component of the window register file generates addresses. Figs. 10a and 10b describe the window register file address translation component in the fixed-window design, and Figs. 10c and 10d describe it in the stacked-window design.
The component in Fig. 10a corresponds to window register file address translation component 101 in Fig. 10 and performs window register file address translation for fixed register windows. Because the size of a fixed window and its input, local and output regions are all fixed, the physical address of a register is determined uniquely by the current window pointer CWP. The window register file address translation component in the fixed-window design comprises:
CWP (current window pointer) generation component 1011. Its inputs are the reset value 0 and the signals that control changes of CWP: the reset signal RST, the SAVE enable signal and the RESTORE enable signal, where SAVE and RESTORE are produced by logically combining the enable signal EDAU (active low) of the DAU operator with the function field of the DAU operator. Its outputs are two copies of CWP with different timing, CWP1 and CWP2, used to generate the read address and the write address respectively. Two copies are needed because SAVE and RESTORE use different pointer timing for reads and writes: the source operands come from the old window (before the SAVE or RESTORE completes), while the destination operand goes to the new window (after it completes). In the SPARC architecture definitions, CWP changes in one of two ways: SPARC V8 decrements CWP on SAVE and increments it on RESTORE, while SPARC V9 increments CWP on SAVE and decrements it on RESTORE. A concrete implementation of 1011 is described further with Fig. 10b.
Address preprocessing component 1012, which subtracts the constant $2^b$ from each register address in the instruction.
CWP address extension component 1013, which shifts CWP logically left, extending it to $m$ bits (a shift of $b + 1$ bits). For SPARC V9 window address translation a complement operation must also be added, so that CWP increments when SAVE is valid while the input registers of window N+1 still overlap the output registers of window N.
Adding unit 1014. The CWP1 output of 1011, after extension by 1013, is added by 1014 to the read address preprocessed by 1012, giving the final physical address for reads of the window register file; the CWP2 output of 1011, after extension by 1013, is added by 1014 to the write address preprocessed by 1012, giving the final physical address for writes of the window register file.
Fig. 10b describes Fig. 10a further, taking as an example the window register file address translation component of a fixed register window set of 8 windows, with $a = 5$, $b = 3$, $k = 3$ and $m = b + 1 + k = 7$. Each fixed window contains 32 ($2^a$) registers: 8 ($2^b$) global registers, 8 input registers, 8 local registers and 8 output registers. There are 8 ($2^k$) windows, and the total number of physical registers is 136 ($2^b + 2^m$). The window operations follow the SPARC V9 standard; the window register file address translation component is implemented as follows.
For simplicity, only the case of two reads and one write executed in parallel is discussed, so three addresses, WINRP1, WINRP2 and WINWP (two read, one write), must be produced simultaneously. The read/write addresses in the instruction come from the register source operand fields of the path operator and the register destination operand field of the data operator; each is 5 bits wide, corresponding to the 32 registers of a window. During reads and writes, the low three bits of each 5-bit register address access the global register group (8 registers) directly, while component 101 of Fig. 10 converts the 5-bit address into a 7-bit physical register address that follows the window-overlap rule and accesses window register file 102 of Fig. 10 (128 registers).
CWP generation component 1011' contains three 3-bit registers (3 bits cover 8 windows) holding CWP, CWP-1 and CWP+1. Following the SPARC V9 definition, the incremented value is selected when SAVE is valid, the decremented value when RESTORE is valid, and the current CWP when neither is performed. The IMMD input (0 in a preferred embodiment) is used for reset: when the reset signal RST is valid, IMMD is selected and CWP is reset to 0. Because the source operands of SAVE and RESTORE come from the old window while the destination operand goes to the new window, the CWP generation component outputs both the value after the CWP register latches and the value before it: the latched value CWP1 is used for reads, and the pre-latch value CWP2 is used for writes. CWP address extension component 1013' turns both outputs into 7-bit physical base addresses by appending four 0 bits after CWP; because SPARC V9 defines the input registers of window n to overlap the output registers of window n-1, a complement operation (invert and add one) is applied before the extension. The 5-bit register addresses RS1, RS2 and RD, after pretreatment component 1012' subtracts the constant 8, are each added to the output of 1013' (components 1014'), giving the final window register file physical addresses: two read addresses WINRP1 and WINRP2 and one write address WINWP.
In this way the 5-bit register addresses given by the instruction are mapped onto the global register file GR0-GR7 and the window register file RF00-RF7f (hexadecimal indices), and window overlap is realized. The registers actually accessed by each window are:
Window 0: GR0-GR7, RF00-RF17;
Window 1: GR0-GR7, RF70-RF07 (wrapping from RF7f to RF00);
Window 2: GR0-GR7, RF60-RF77;
Window 3: GR0-GR7, RF50-RF67;
Window 4: GR0-GR7, RF40-RF57;
Window 5: GR0-GR7, RF30-RF47;
Window 6: GR0-GR7, RF20-RF37;
Window 7: GR0-GR7, RF10-RF27.
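The fixed-window translation of Fig. 10b (negate CWP, append four zero bits, subtract 8 from the instruction address, add, wrap at 128) is small enough to check against the table above. This is a Python sketch of the arithmetic only, not of the latch timing of CWP1/CWP2; `window_phys` is a hypothetical name.

```python
A, B, K = 5, 3, 3          # Fig. 10b: 32 visible registers, 8 globals, 8 windows
M = B + 1 + K              # 7-bit window register file address (128 entries)


def window_phys(reg: int, cwp: int) -> int:
    """Translate SPARC register r8-r31 of window cwp to a window-file index RF00-RF7f."""
    if not (1 << B) <= reg < (1 << A):
        raise ValueError("r0-r7 are served by the global register file")
    base = ((-cwp) % (1 << K)) << (B + 1)       # 1013': negate CWP, append four 0 bits
    return (base + reg - (1 << B)) % (1 << M)   # 1012' subtracts 8, 1014' adds, wraps at 128


for w in range(1 << K):
    print(w, hex(window_phys(8, w)), hex(window_phys(31, w)))
```

The printed first and last entries per window reproduce the list above, and the SPARC V9 overlap can be confirmed directly: the ins r24-r31 of window n+1 land on the same RF entries as the outs r8-r15 of window n.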
Fig. 10c shows another window register file read/write address translation component based on the global/window register separation design of Fig. 10. Because the number of registers in each window of this window register file can be set by software, it is also called a stacked (moving) register window. The registers seen through a stacked window again comprise two parts: global registers (also called static registers), which are visible to every procedure, and window registers, which differ for each procedure's window. The physical start address and size of each window are determined by three parameters, BOF, SOF and SOL: BOF is the physical address of the first register of the current window, SOF is the size of the window, and SOL is the size of its local region. SOF and SOL can be set by software, with SOL ≤ SOF, whereas BOF is modified by hardware, generally as $BOF_n = BOF_{n-1} + SOL_{n-1}$, where $BOF_n$ is the BOF of the n-th window and $SOL_{n-1}$ is the SOL of the (n-1)-th window. The windows may differ in size, but they overlap to form a circular stack, and the overlap region is the output region OUTS of the current window. Because stacked windows do not distinguish input and output regions in hardware, the actual output region size is SOF - SOL. The windows overlap as follows: on a CALL operation, the output region of the current window automatically becomes the SOF of the new window, and the sum of the parent window's SOL and BOF becomes the new window's BOF; on an ALLOC (frame size allocation) operation, SOF and SOL can be enlarged or reduced as the instruction requires; on a RETURN operation, the BOF, SOF and SOL from before the CALL are restored.
The component in Fig. 10c, which also corresponds to window register file address translation component 101 in Fig. 10, performs window register file address translation for stacked register windows. It comprises:
Preprocessing component 1015, which subtracts the number of global registers from the register address (read or write) in the instruction;
BOF (physical address of the first register of the current window) generation component 1016. Its inputs are the local region size SOL of the current window and the control signals CALL and RETURN (operation enables formed by logically combining the function field of the BRANCH operator with its enable signal EBRANCH); its output is the BOF of the new window. It modifies BOF as the instructions define: BOF resets to 0; when CALL is valid, BOF is automatically added to the SOL of the current window to form the BOF of the new window; when RETURN is valid, BOF reverts to the BOF of the previous window.
Adding unit 1017, which adds the output of 1015 to the output of 1016 to obtain the final physical address (read or write) used to access the window register file of the stacked register window.
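The data path of components 1015 and 1017, together with the BOF update rule, can be sketched in a few lines. The defaults (32 global registers, m = 7) are the Itanium parameters used in the example that follows; the function names are hypothetical, and only the arithmetic is modelled, not the saving of BOF for RETURN.

```python
def stacked_phys(reg: int, bof: int, n_global: int = 32, m: int = 7) -> int:
    """1015 subtracts the global count, 1017 adds BOF; the result wraps at 2**m."""
    return (reg - n_global + bof) % (1 << m)


def bof_after_call(bof: int, sol: int, m: int = 7) -> int:
    """BOF_n = BOF_{n-1} + SOL_{n-1}, wrapping around the circular window file."""
    return (bof + sol) % (1 << m)


# Caller frame: BOF = 0, SOL = 7, SOF = 10, so GR39-GR41 are its outputs.
caller_out = stacked_phys(39, bof=0)
callee_bof = bof_after_call(0, sol=7)
callee_in = stacked_phys(32, bof=callee_bof)
```

The caller's first output register GR39 and the callee's GR32 resolve to the same physical entry, which is how parameters pass without memory traffic.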
Fig. 10d shows, as a further illustration of Fig. 10c, the window register file address translation component for Itanium stacked register windows implemented according to this method.
According to the Itanium architecture definition, the Itanium general register file contains 128 system-visible registers, divided into two subsets: GR0-GR31 form the static general register region, and GR32-GR127 form the stacked general register region. Itanium general register address translation is of two kinds: stacked-frame address translation and register rotation address translation. Register rotation is performed on top of the stacked frame and is confined to the local region (SOL) of the frame belonging to the current procedure (rotation under stacked frames is explained further with Fig. 11c). With stacked frames, the static registers GR0-GR31 are visible to all procedures, and each procedure has its own frame in the stacked region. The frame size is defined by software and ranges from 0 to 96; because frames overlap, parameters are exchanged automatically on CALL and RETURN, avoiding memory accesses. The frame size is given by two parameters, SOF and SOL: SOF is the frame size, with initial value 96, and SOL is the number of local registers in the frame (including the input registers); their difference is the number of output registers. On a CALL operation, the physical address $BOF_n$ of GR32 in the new active frame becomes the sum of $BOF_{n-1}$, the address of GR32 in the previous frame, and $SOL_{n-1}$, the local region size of the previous frame, and the output region of the previous frame ($SOF_{n-1} - SOL_{n-1}$) automatically becomes $SOF_n$. On an ALLOC operation, SOF, SOL and SOR are set together under instruction control. On a RETURN instruction, the SOF and SOL from before the last CALL are restored.
The Itanium general registers above can be implemented with the components of Fig. 10, with stacked-window address translation component 101 realized by the structure of Fig. 10c, where $a = 7$, $b = 5$ and $m = 7$. Because the static registers are visible to every procedure, the number of registers a frame can address ranges from a minimum of 32 ($2^b$) to a maximum of 128 ($2^a$), and the total number of physical registers is 160 ($2^b + 2^m$); the circuit is shown in Fig. 10d. For simplicity, only the case of two reads and one write executed in parallel is discussed (three physical addresses, two read and one write, must then be produced simultaneously).
The read/write addresses in the instruction (RS1<6:0>, RS2<6:0> and RD<6:0>) come from the register source operand fields of the path operator and the register destination operand field of the data operator respectively. During register reads and writes, the low 5 bits of RS1, RS2 and RD access the static register file directly, while each 7-bit address is also converted by the window register file address translation component of Fig. 10d into a 7-bit physical register address that follows the window-overlap rule and accesses the window register file (128 registers). A write-address enable signal decides whether the write to the global register file or the write to the window register file takes effect. It is formed from the high bits RD<6:5> of the write address and the write-operator enable signal: when the write operator is enabled and RD<6:5> = 00, the write to the global register file takes effect; when the write operator is enabled and RD<6:5> ≠ 00, the write to the window register file takes effect. A read-valid signal controls the data output selector to output either the read data of the global register file or that of the window register file. It is formed from the high address bits RS1<6:5> and RS2<6:5>: when the two high bits of RSi (i = 1, 2) are 0, the read data of the global register file is valid; otherwise the read data of the window register file is valid.
The window register file address translation component of Fig. 10d contains pretreatment component 1015', which subtracts 32; adding unit 1017', made up of three 7-bit adders; and BOF generation component 1016', made up of a 7-bit adder and a circular stack of N registers. BOF generation component 1016' resets BOF and modifies it according to the instructions: when CALL is valid, $BOF_n = BOF_{n-1} + SOL_{n-1}$; when RETURN is valid, BOF reverts to the BOF of the previous window. Component 1016' is a preferred structure for these functions, in which the number of registers in the circular stack can be chosen as required. In the figure, MUX is a multiplexer, LAT a latch and ADDER an adder. When a CALL (save) operation is performed, signal 10161 makes the multiplexer select the sum of BOF and the local region SOL (SOL is set by the ALLOCFRAME operator extracted from the Itanium instruction, and the result of ALLOC is held in the relevant register), refreshing the top-of-stack register BOF, while signals 10161 and 10162 together perform a push. When RETURN is valid, signals 10161 and 10162 together perform a pop, restoring the previous value of BOF. Signals 10161 and 10162 are logical combinations of the CALL and RETURN signals: on a CALL operation the multiplexers select the left data source and all latches open, performing one push; on a RETURN operation the multiplexers select the right data source and all latches open, performing one pop.
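The push/pop behaviour of BOF generation component 1016' can be mimicked with a list of latches. In this Python sketch, `lat[0]` plays the top-of-stack register that drives BOF, CALL shifts every latch down with the adder feeding the top (the left MUX input), and RETURN shifts every latch up (the right MUX input). The class name is hypothetical, and what the bottom latch receives on a pop is not specified in the text, so the sketch simply keeps its old value; calls nested deeper than N overwrite the oldest saved BOF here.

```python
class BofGenerator:
    """Sketch of BOF generator 1016': an m-bit adder feeding a depth-N latch stack."""

    def __init__(self, depth: int, m: int = 7):
        self.mask = (1 << m) - 1
        self.lat = [0] * depth          # reset value of BOF is 0

    @property
    def bof(self) -> int:
        return self.lat[0]              # top-of-stack register drives BOF

    def call(self, sol: int) -> None:
        # Every MUX takes its left input (the adder for the top latch,
        # the latch above for the rest) and all latches open: a push.
        self.lat = [(self.lat[0] + sol) & self.mask] + self.lat[:-1]

    def ret(self) -> None:
        # Every MUX takes its right input (the latch below) and all
        # latches open: a pop. The bottom latch keeps its old value (assumption).
        self.lat = self.lat[1:] + self.lat[-1:]
```

Two nested calls with SOL = 7 and SOL = 5 move BOF to 7 and then 12, and two returns bring it back to 7 and 0.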
Because Itanium defines the stacked region of the general registers as starting at GR32, the 7-bit register addresses supplied by the instruction can be mapped onto the global (static) register file SR0-SR31 and onto the window register file WINRF (RF00-RF7f), with the windows overlapping as required.
Figure 11 is a block diagram of lookup-table-based rotating register address generation. Register rotation is a register file control technique that arose with advances in compiler optimization; it supports modulo scheduling for software pipelining and removes name dependences between loop iterations. A rotating register file provides a register renaming mechanism: in a software-pipelined loop, successive writes to the register named in an instruction actually go to different registers, which preserves the correct semantics. Each rotating register file has an associated rotating register base (RRB). The register number specified in an instruction is added to the value of RRB, and this sum, taken modulo the size of the rotating region, is used as the actual register address. Special branch operations in modulo scheduling decrement RRB at the start of each new iteration, so that the same operation in different iterations is allocated a different register to store its result.
Suppose the register file has 2^n physical registers and the rotating region multiple is SOR (Size Of Rotating), so that the actual size of the rotating region is SOR*2^m. SOR is set by software and takes values from 1 to s, where s is a natural number with s*2^m < 2^n; m is defined by the architecture. When m = 0, SOR is itself the size of the rotating region.
The unit shown in Figure 11 generates addresses under register rotation and comprises three parts:
A rotating register base (RRB) generation unit 111, which resets RRB and, as the instruction requires, clears RRB to 0 or decrements it by 1. Its inputs are the control signal CLR (which controls clearing) and the rotation valid signal ROTATING (which controls decrementing); its output is the rotation base RRB for the current iteration. When CLR is valid, RRB is set to 0. CLR can come from the reset signal RESET or from a clear operation in the instruction (for example the CLRRRB operation controlled by the OPSTK operator extracted from the ITANIUM instruction). ROTATING is formed by logically combining the function field of the operator that controls the special branch operations (for example the BRANCH operator, extracted from the ITANIUM instruction, that controls the CTOP, CEXIT, WTOP and WEXIT operations) with that operator's enable signal.
An adder unit 112, which adds each register address in the instruction to RRB, thereby rotating the register address;
A lookup unit 113, whose table uses the high bits [n-1:m] of the output of adder unit 112 as row labels and all possible values of SOR as column labels; each entry is its row label modulo its column label. The table thus performs the rotation for rotating regions of different sizes. The lookup result is concatenated with the low bits [m-1:0] of the output of adder unit 112 to form the actual physical address of the rotating register. The assignment of rows and columns may be changed to suit the physical implementation.
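Why a table indexed only by the high bits is enough can be checked with a short derivation. It is added here for clarity and is not part of the original text; A denotes the output of adder unit 112, h its high bits [n-1:m] and l its low bits [m-1:0], with n, m and SOR as defined above.

```latex
\begin{aligned}
A &= h\,2^{m} + l, \qquad 0 \le l < 2^{m}, \quad h = A[n-1:m],\\
h &= q\,\mathrm{SOR} + r, \qquad 0 \le r < \mathrm{SOR},\\
A &= q\,\bigl(\mathrm{SOR}\,2^{m}\bigr) + \bigl(r\,2^{m} + l\bigr), \qquad 0 \le r\,2^{m} + l \le \mathrm{SOR}\,2^{m} - 1,\\
\Rightarrow\quad A \bmod \bigl(\mathrm{SOR}\,2^{m}\bigr) &= \bigl(h \bmod \mathrm{SOR}\bigr)\,2^{m} + l .
\end{aligned}
```

The low m bits pass through unchanged, so only the (n-m)-bit high part needs a table with one column per value of SOR.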
Figure 11 a is an example of above-mentioned spin register, n=7 in this structure, m=3, s=12.This be one by 128 (2 n) register file that individual register is formed, the rotation territory can be 8 (2 m) 1-12 register rotary indicator doubly generate the structural drawing of parts.For convenience's sake, establishing the register file port number is 3 (two read one writes).This structure comprises a RRB who is made up of 7 digit counters and generates parts 111 ', and clear 0 by CLR signal controlling RRB when carrying out RRB and empty operation, RRB subtracts 1 when carrying out the rotation branch operation.In the preferred structure of Figure 111 ', MUX21_7 is 7 a gate, and DEC_7 is 71 device that subtracts, and DFF_7 is 7 a trigger.The latch signal 1111 of RRB is the logical OR relation of CLR and ROTATING.When resetting or carrying out clear 0 when operation of RRB, select definite value 1 by the CLR signal, trigger DFF preserves the output valve 0 that subtracts 1 device, and RRB is put 0; When the register rotation took place, the gate acquiescence was selected the value of feedback of RRB, and trigger DFF preserves 1 value that subtracts of RRB, and RRB successively decreases.
The structure also includes an adder unit 112' and a lookup unit 113'. Adder unit 112' adds each of the instruction's three register addresses RS1, RS2 and RD (two read, one write) to RRB. The high 4 bits [6:3] of each sum, together with the 4-bit value of SOR, form the input to lookup circuit 113'. Each output entry supplies the high 4 bits [6:3] of a rotating register physical address (two read, one write) and is concatenated with the low three bits [2:0] of the corresponding output of 112' to give a new 7-bit address (two read, one write). These are the rotated physical addresses corresponding to the register addresses in the instruction.
Figure 11 b is the gauge outfit and the list item of lookup table circuit 113 ' among Figure 11 c, and wherein row mark 1131 is the binary representation of SOR (1-12), and rower 1132 is high four the full arrangements of 112 ' parts OPADD, and list item 1133 is that rower is to row target delivery value.Rower and row mark can be comparatively speaking, the implementation of form also can be ROM or register, a kind of preferred hardware implementation is to store with 12 64 bit registers to ask the mould value, when resetting, data in the register are reset to the data of each row among Figure 11 b, when the SOR value is determined in the ALLOC operation, at SOR[3:0] control under a gating column data (be kept in the register of a 64bit) corresponding with current SOR, when the register rotary manipulation that causes when branch instruction is effective, only high 4 with the address of tabling look-up are index, asking in the mould value of 16 4bit at these row selected, and obtains corresponding list item.The 3rd merging of this list item and the address of tabling look-up promptly becomes the postrotational physical address of register.When adopting this method to carry out register rotation address translation, only select the time-delay cost of one gate can finish the modular arithmetic of asking of rotating the address with one 16.
Figure 11 c is the circuit diagram of the spin register address translation parts in the moving window of realizing that Figure 11 b is combined with Figure 10 d, and the purpose that designs this structure is in order to realize the respective operations of Itanium processor general-purpose register.As mentioned above, the Itanium general-purpose register piles up the architecture interface to comprise a size is that 32 static register territory (GR0-GR31) and size are 96 storehouse territory (GR32-GR127).The register in storehouse territory is made up of the programmable mobile register window of size, the adjunct register window formation loop stack that overlaps each other.
The Itanium general registers support both moving register windows and register rotation, which serve different purposes: the former exchanges data between procedures through overlapping windows, reducing memory accesses on procedure switches, while the latter supports modulo scheduling for software pipelining in hardware.
These operations work as follows. On a CALL, the physical address BOF_n of GR32 in the current active window becomes the sum of BOF_{n-1}, the address of GR32 in the previous window, and SOL_{n-1}, the size of the previous window's local region; the previous window's output region (SOF_{n-1} - SOL_{n-1}) automatically becomes SOF_n of the new window. On an ALLOC, the three values SOF, SOL and SOR are set together under instruction control. On a RETURN, SOF and SOL are restored to their values before the most recent CALL. When a software-pipelined loop branch (CTOP, CEXIT, WTOP, WEXIT) executes, the registers rotate; rotation takes place within the moving register window, and the size of the rotating region is SOR*8. Rotation is possible only in the stacked (window) region; the rotating region is 1 to 12 times 8 registers and starts at GR32.
The Itanium general registers can be implemented with the unit shown in Figure 10; the moving register window circuit has been described with reference to Figure 10d. Because register rotation takes place only in the window register file (the stacked region), adding rotation to the design of Figure 10d requires changing only the address conversion unit of the window register file; access to the global register file and control of the data output selector are unchanged. Figure 11c shows the resulting window register file address conversion unit, with rotation added to Figure 10d. For ease of discussion, only a three-address structure with two reads and one write is presented; additional read/write addresses are converted on the same principle.
In Figure 11 c, the physical address of window registers heap produces the path two.In article one path, read/write address in the instruction (RS1<6:0 〉, RS2<6:0, RD<6:0) through after pretreatment component 1015 ' deducts definite value 32 separately, directly go to rotation address selection control assembly 114, select this value and BOF to generate the BOF value addition of parts 1016 ' output by control signal 1141 controls, obtain final physical address, the physical address of this moment only is the address of carrying out mobile register window operation, when register does not rotate, can select this path for use.In the second path, read/write address in the instruction (RS1<6:0 〉, RS2<6:0 〉, RD<6:0 〉) through pretreatment component 1015, after deducting definite value 32 separately, generate the RRB value addition of parts 111 ' output respectively with RRB, high 4 and SOR with the output valve of adding unit 112 ' are index accesses lookup table circuit 113 ', low 3 amalgamations of output valve of the list item that obtains and parts 112 ' of tabling look-up form input rotation address, rotation addresses alternative pack 114, select this circuit-switched data and BOF to generate the BOF value addition of parts 1016 ' output by control signal 1141 controls, obtain final physical address, this address is the register physical address when carrying out the register rotation in the mobile register window.
Unit 114 is the rotation address selection control unit. Its control condition is the rotation valid signal ROTATING, formed from the function field and the enable signal of the branch operator BRANCH extracted from the Itanium instruction. When rotation is not valid, the first path above produces the physical address; when rotation is valid, the second path does. Because the Itanium moving register windows and register rotation are implemented with the separate static and window register structure shown in Figure 10, registers GR0-GR31 are mapped onto the global register file GPR (GPR00-GPR31), GR32-GR127 are mapped onto the window register file WINRF (RF00-RF7f), and the reset value of BOF is 0, corresponding to the first register of WINRF.
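Putting the two paths together, the following sketch maps a 7-bit Itanium GR number to a (file, index) pair. It assumes 7-bit arithmetic throughout and computes the 113' lookup directly as a modulo instead of through a table. The function name is hypothetical, and `rotating` stands for the ROTATING signal that drives selector 114.

```python
from typing import Tuple


def itanium_gr_physical(addr: int, bof: int, rrb: int, sor: int,
                        rotating: bool) -> Tuple[str, int]:
    """Return ('GPR', index) or ('WINRF', index) for a 7-bit GR number."""
    if addr < 32:
        return ("GPR", addr)                   # GR0-GR31: global file, low 5 bits
    offset = (addr - 32) & 0x7F                # preprocessing unit 1015'
    if rotating:                               # path 2, chosen by selector 114
        s = (offset + rrb) & 0x7F              # adder unit 112'
        # lookup 113': high bits mod SOR, low 3 bits unchanged
        offset = (((s >> 3) % sor) << 3) | (s & 0b111)
    return ("WINRF", (offset + bof) & 0x7F)    # add BOF from unit 1016'
```

With BOF = 10, RRB = 1 and SOR = 2 (a 16-register rotating region), GR47 lands on RF10 because (15 + 1) mod 16 = 0.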
Figure 12 is a structural diagram of a reconfigurable register file. A reconfigurable register file comprises at least three parts:
An address conversion and address selection unit 121, which converts the register addresses in the instruction into register file physical addresses that meet the system design requirements and, from the register physical addresses produced by the different functions, selects the currently valid operation address for accessing the register file. Under the reconfigurable design resource rule, the numbers of read addresses and write addresses output by unit 121 are, respectively, the union of the read address counts and the union of the write address counts required by the individual reconfigured elements; that is, the register file needs as many read and write ports as the union of the ports required by each function.
A data input selection unit 122, which selects the currently valid write data from the register write data produced by the different functions and, together with the register write address, carries out the write to the register file.
A register file unit 123. Under the reconfigurable design resource rule, the number of registers required is the union of the numbers of registers needed by each function, and the number of read and write ports is likewise the union of the port counts needed by each function.
Control signal 124 is the operating mode selection signal: in a given operating mode, it selects the input data and the read/write addresses of that mode. It can be formed from the enable signals of the register operators of the different function classes, or by adding an operating mode control field to the instruction. When the timing is consistent, the signal that controls address selection is the same as the signal that controls data selection.
Figure 12 a is the restructural register file structure block diagram with register window, mobile register window and two kinds of mode of operations of spin register.For for simplicity, every kind of mode of operation only considers that all two read a primary demand of writing.Wherein, stationary window comprises 32 registers on the architecture interface, and the number of global register, input register, local register and output register all is 8; Moving window comprises 128 registers on the architecture interface, wherein static (overall situation) the number of registers is 32, and can realize register rotary manipulation on the basis of mobile register window, the register rotary manipulation can only carry out in the zone of non-static register.The method for designing that this register file adopts global register shown in Figure 10 to separate with window registers comprises as lower member:
Register address conversion and alternative pack 121 ', comprise: stationary window address translation parts 1211, the register address 125 that is used for will instructing under the stationary window mode of operation is converted to the physical address of window registers heap, and concrete change-over circuit has detailed description in Figure 10 b; Moving window and register rotation address converting member 1212, the register address 126 that is used for will instructing under moving window and register rotary work pattern is converted to the physical address of window registers heap, concrete change-over circuit is seen Figure 11 c, it should be noted that owing to the R32-R127 of the physical register in the window registers heap corresponding to register address in the instruction, therefore the initial physical addresses BOF reset values of R32 is 0, corresponding to first physical register of window registers heap; Address selection parts 1213, be used for selecting effective address access window register file WINRF (RF00-RF127) between register window and mobile register window and two kinds of mode of operations of spin register, the inner structure of address selection parts 1213 will further specify in Figure 12 c;
Data input alternative pack 122 ', be used between register window and mobile register window and two kinds of mode of operations of spin register, selecting valid data to write global register heap GPR (GR00-GR31) or window registers heap WINRF (RF00-RF127), because every kind of pattern has only a write port, therefore these parts can be reduced to the selection of importing data to two in this example, a kind of combination that is preferably designed for the enable signal of operator of selecting signal 124 ' (with read-write operator enable signals all in the generic operation or relation, operator enable signal low level is effective), be used for showing that current operation is register window operation or the operation of mobile register window;
Register file 123 ', owing to adopt global register heap shown in Figure 10 and window registers heap separated structures, so register file 123 ' comprises global register heap 1231, window registers heap 1232 and data output alternative pack 1233.
The global register file 1231 contains 32 64-bit registers with 2 write ports and 4 read ports. The fixed window mode needs 8 global registers and the moving window and rotating register mode needs 32, so under the reconfigurable design resource rule the total is 32 registers. There are 6 ports (4 read, 2 write) because the read/write addresses of the global register file have different widths in the two modes. For fixed register window operation compatible with SPARC V9, the global register file addresses are the low bits of each read/write address in address signal 125: WRS1[2:0], WRS2[2:0] and WRD[2:0]. For moving register window operation compatible with Itanium, they are the low bits of each read/write address in address signal 126: SWRS1[4:0], SWRS2[4:0] and SWRD[4:0]. The internal structure of the global register file is described further with Figure 12b.
A window register file 1232 containing 128 64-bit registers, which physically implements the SPARC window registers and the Itanium stacked (GRS) region. Both modes need 3 ports (two read, one write), so under the reconfigurable design resource rule the register file has the union of the two, which is again 3 ports (two read, one write);
A data output selection unit 1233, which, with separate global and window registers, selects the correct read data between the global register file and the window register file. Its internal structure is described further with Figure 12c.
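The sizes above can be reproduced from the sizing rule stated in claim 8 and the port union rule. The sketch below assumes that the smallest admissible k is chosen (the claim only requires k to be a nonnegative integer). It models shared ports as the larger need of each kind, which fits the window file; the global file in this example instead keeps separate ports per mode because the address widths differ, giving 4 read and 2 write ports. The function names are hypothetical.

```python
from typing import Tuple


def size_reconfigurable_file(a1: int, b1: int, a2: int, b2: int) -> Tuple[int, int]:
    """Register counts per the sizing rule of claim 8.

    Fixed windows: 2**a1 visible registers, 2**b1 of them global.
    Moving windows: 2**a2 visible registers, 2**b2 of them global.
    Returns (global file size, window file size)."""
    b = max(b1, b2)
    k = 0  # assumption: the smallest k >= 0 that satisfies the bound
    while 2 ** (b1 + 1 + k) < 2 ** a2 - 2 ** b2:
        k += 1
    return 2 ** b, 2 ** (b1 + 1 + k)


def shared_ports(*modes: Tuple[int, int]) -> Tuple[int, int]:
    """(reads, writes) when all modes share the same ports."""
    return max(r for r, _ in modes), max(w for _, w in modes)
```

With a1 = 5, b1 = 3 (SPARC-style windows) and a2 = 7, b2 = 5 (Itanium-style), the rule gives 32 global and 128 window registers, matching units 1231 and 1232.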
Figure 12 b is the structural drawing of global register heap 1231 among Figure 12 a.During as the stationary window register file, two read 3 global register addresses (3) of writing acts on register GR0-GR7; During as mobile register window and spin register, two read 3 static register addresses (5) of writing acts on register GR0-GR31.Wherein GR0-GR7 is overlapping resource, its preferred structure sees 12311 ', the input data of register are in two kinds of patterns, according to the reconfigurable design control law, latch controlled condition and be that two kinds of patterns latch controlled condition or relation, and the latch signal of control register GR8-GR31 is only relevant with a kind of mode of operation of mobile register window, and its basic structure is shown in Figure 123 12 '.
Figure 12 c is the structural drawing of address selection parts 1213 and data output alternative pack 1233 among Figure 12 a.The address selection parts are realized the selection of reading an address and a write address to two respectively, WRPi (i=1 among the figure, 2) and SWRPi (i=1,2) represent respectively the rotation of stationary window and moving window and register window registers heap read the address, WWP and SWWP represent the write address of the window registers heap of stationary window and moving window and register rotation respectively, and alternative condition is each self-corresponding operator enable signal of writing.Data output alternative pack is realized the selection corresponding to two sense datas of two sense datas of the global register heap of a certain mode of operation of register file and window registers heap respectively, alternative condition is whether the enable signal and the address high position of operator is 0, (signal 12331 and signal 12332 correspond respectively to the selection signal under stationary window and the moving window pattern), GPRRPORTi among the figure (i=1-4) is the data of the output port of global register heap, GPRRPORT1 wherein, output data when GPRRPORT2 represents that global register is GR0-GR7, GPRRPORT3, output data when GPRRPORT4 represents that global register is GR0-GR31; RFRPORTi (i=1-2) is the data of the output port of window registers heap; The final output data of WRPORTi (i=1,2) expression stationary window, the final output data of SWPORTi (i=1,2) expression moving window and register rotation.
Figure 12 d is the restructural register file structure block diagram with register window, mobile register window and spin register, three kinds of mode of operations of random read-write register file.Random read-write pattern wherein comprises the executed in parallel of 4 read operations and 4 write operations.
The basic structure of this register file is similar to Figure 12a; the main difference is the added random read/write addresses 127 (4 read addresses and 4 write addresses, 7 bits each). Under the reconfigurable design resource rule, window register file 1232' now needs 8 ports (four read, four write), and the address selection unit 1213', the data input unit 122'', the data output unit 1233' and the address/data selection control signal 124'' must be modified accordingly. The structure of address selection unit 1213' is shown in Figure 12e. Its 14 input addresses (two read and one write for the fixed register windows, two read and one write for the moving register windows, and four read and four write for the random read/write register file) are merged by address selection into 8 read/write addresses (four read, four write, the union of the read/write address counts required by the modes), which drive the four read ports and the four write ports of the window register file. The selection condition 124'' is preferably the logical OR of the enable signals of the random read/write operators (operator enables are active low; the signal indicates that a random read/write operation is valid). When no random read/write operation is valid, the left-hand data path in the figure is selected by default. In the figure, RANDRPi (i = 1-4) are the random read addresses, RANDWPi (i = 1-4) are the random write addresses, and RPi and WPi (i = 1-4) are the read and write addresses that finally drive the window register file. The structures of data input selection unit 122'' and data output unit 1233' are shown in Figure 12f. There, RANDDi (i = 1-4) are the input data sources of the random writes, WIND is the input data source of the fixed windows, SWIND is the input data source of the moving register windows and register rotation, WPiD (i = 1-4) are the data sources for the corresponding write ports of the window register file, and RFRPORTi (i = 1-4) are the output data of the corresponding read ports of the window register file.
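A behavioural sketch of how address selection unit 1213' might merge the 14 candidate addresses onto the 4 read and 4 write ports follows. It assumes that the "left-hand path" taken when no random operation is valid is the fixed/moving window choice, and it marks ports a narrower mode leaves idle with `None`. Both of these points and all names are illustrative assumptions, not taken from Figure 12e.

```python
from typing import List, Optional, Tuple

# (read addresses, write addresses); None marks an idle port
Ports = Tuple[List[Optional[int]], List[Optional[int]]]


def pad(addrs: List[Optional[int]], width: int) -> List[Optional[int]]:
    """Fill the unused ports of a narrower mode with None."""
    return list(addrs) + [None] * (width - len(addrs))


def select_window_ports(fixed: Ports, moving: Ports, rand: Ports,
                        moving_mode: bool, random_valid: bool) -> Ports:
    """Merge the 14 candidate addresses onto 4 read and 4 write ports."""
    if random_valid:                 # selection signal 124'' asserted
        reads, writes = rand
    else:                            # default path: fixed or moving window
        reads, writes = moving if moving_mode else fixed
    return pad(reads, 4), pad(writes, 4)
```

In this model the window modes occupy only ports 1-2 for reading and port 1 for writing, so the extra ports cost only the selectors in front of them.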
By Figure 12 d~Figure 12 f as seen, when increasing a kind of new function---in the time of the random read-write operation, by the reconfigurable design of register file, the hardware resource of increase is very little, and according to reconfigurable design sequential rule, the growth of delaying time on the critical path only is the time that address and data are selected.Its effect is to realize fixedly register window and the mobile register window of Itanium and the read-write operation of spin register of SPARC, and can increase user-defined register manipulation (this sentences the example that is simply operated as of random read-write).
Although preferred embodiments of the invention have been shown and described in detail, it will be understood that various changes and modifications may be made to the invention without departing from the scope of the appended claims.

Claims (12)

1. A reconfigurable register file having at least two operating modes, characterized in that it comprises a register file address conversion and selection unit, a register file, and a data input selection unit, wherein the register read/write addresses in an instruction are converted by the register file address conversion and selection unit into physical register file addresses of the corresponding mode to control reading and writing of the register file, and the input data of the corresponding mode are written into the register file through the data input selection unit.
2. The reconfigurable register file of claim 1, characterized in that the register file address conversion and selection unit comprises a window register file address conversion and selection unit, the register file is divided into a global register file and a window register file, and the reconfigurable register file further comprises an output data selection unit, wherein the low-order bits of the register read/write addresses in the instruction directly control reading and writing of the global register file while, at the same time, the register read/write addresses in the instruction are converted by the window register file address conversion mechanism into physical window register file addresses that control reading and writing of the window register file; the data output by the data input selection unit are written either into the global register file or into the window register file under the control of a global register file write address enable signal and a window register file write address enable signal; and the output data selection unit selects the correct output data between the output data of the global register file and the output data of the window register file.
3. The reconfigurable register file of claim 2, characterized in that the window register file address conversion and selection unit comprises at least a fixed window address conversion unit, a moving window and register rotation address conversion unit, and an address selection unit for selecting one of the output of the fixed window address conversion unit and the output of the moving window and register rotation address conversion unit; and the data input selection unit selects valid data to be written into the global register file or the window register file between two operating modes, namely register windows, and moving register windows with rotating registers.
4. The reconfigurable register file of claim 3, characterized in that it further comprises random read/write addresses, and accordingly the register address conversion and selection unit and the data input selection unit additionally select among the random addresses and among the random input data, respectively.
5. The reconfigurable register file of claim 3 or 4, characterized in that the fixed window register file address conversion unit comprises: a current window pointer (CWP) generation unit; an address preprocessing unit that subtracts the number of global registers from each register address in the instruction; a CWP address extension unit that extends the CWP to the number of bits required for reading and writing the window register file; and an adder unit that adds the output of the preprocessing unit to the output of the CWP address extension unit to obtain the final physical address for the window register file read/write operation.
6. The reconfigurable register file of claim 3 or 4, characterized in that the moving window address conversion unit comprises: a preprocessing unit that subtracts the number of global registers from the register address in the instruction; a generation unit for the physical address of the first register of the current moving window (BOF); and an adder unit that adds the output of the preprocessing unit to the output of the BOF generation unit to obtain the final physical address in the window register file.
7. The reconfigurable register file of claim 3 or 4, characterized in that the rotating register address generation unit comprises: a rotating register base (RRB) generation unit that resets RRB and, as the instruction requires, clears RRB to 0 or decrements it by 1; an adder unit that adds each register address in the instruction to RRB, thereby rotating the register address; and a lookup unit for register rotation in rotating regions of different sizes; wherein the lookup result is combined with the low-order bits of the adder unit output to form the actual physical address of the rotating register.
8. The reconfigurable register file of claim 3 or 4, characterized in that, if the fixed register windows have 2^a1 architecturally visible registers, of which 2^b1 (b1 < a1) are global, and the moving register windows have 2^a2 architecturally visible registers, of which 2^b2 (b2 < a2) are global, then the global register file of the reconfigurable register file contains 2^b registers, where b is the larger of b1 and b2, and the window register file contains 2^m physical registers, where m satisfies both m = b1 + 1 + k, with k a nonnegative integer, and 2^m ≥ 2^a2 - 2^b2, so that either fixed window operations or moving window operations can be performed as needed.
9. The reconfigurable register file of any one of claims 1-4, characterized in that the write ports and the read ports of the register file are, respectively, the union of the write ports and the union of the read ports required by the at least two operating modes.
10. A method of operating a reconfigurable register file in a microprocessor of a compatible architecture, characterized in that it comprises the steps of:
feeding certain low-order bits of the register addresses in the instruction operators into the global register file, according to the operating mode of the register file to be made compatible with;
feeding the register addresses in the instruction operators into the address conversion unit, according to the operating mode of the register file to be made compatible with, so that the corresponding address conversion subunit generates the address for accessing the window register file;
when a register is to be written, selecting the appropriate input data according to the operating mode of the register file to be made compatible with, and writing the data into the global register file or the window register file according to the corresponding write enable signal;
when a register is to be read, outputting the data read from the global register file if the high-order address bits are all zero, and outputting the data read from the window register file if they are not all zero.
11. An operator extraction method for compatibility purposes, characterized by comprising the following steps:
(1) analyzing the functions of the target instruction set to be made compatible;
(2) according to the result of the functional analysis of the target instruction set, grouping together the functions performed by the same basic component and re-encoding them as the operation codes of function operators; separating out source operands as route operators, corresponding to the read ports of the register file; separating out destination operands as the destination-register fields of data operators; and designing operations that must control several components simultaneously as composite operators;
(3) determining the internal data paths based at least in part on the result of the functional analysis of the target instruction set;
(4) determining the number of route operators and data operators according to the internal data paths;
(5) determining the data-source fields of the function operators and of the data operators.
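Step (2) of claim 11 can be sketched as a grouping pass over an analyzed instruction set. The record format `InstrFunction`, the helper `extract_operators`, and the rule for counting route and data operators (widest instruction's source and destination operands) are hypothetical simplifications; steps (3) and (5) are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstrFunction:
    """One target instruction after functional analysis (hypothetical format)."""
    name: str
    components: Tuple[str, ...]  # basic components the instruction must drive
    sources: Tuple[str, ...]     # source register operands
    dest: Optional[str]          # destination register operand, if any


def extract_operators(isa: List[InstrFunction]) -> dict:
    function_ops: Dict[str, Dict[str, int]] = {}
    composite_ops: List[str] = []
    for ins in isa:
        if len(ins.components) > 1:
            # Must control several components at once: composite operator.
            composite_ops.append(ins.name)
        else:
            # Group by basic component and re-encode as that component's opcodes.
            table = function_ops.setdefault(ins.components[0], {})
            table[ins.name] = len(table)
    return {
        "function_ops": function_ops,
        "composite_ops": composite_ops,
        # Assumption: one route operator per register-file read port.
        "route_ops": max((len(i.sources) for i in isa), default=0),
        # Assumption: one data operator per destination register written at once.
        "data_ops": max((1 if i.dest else 0 for i in isa), default=0),
    }


isa = [
    InstrFunction("add", ("ALU",), ("rs1", "rs2"), "rd"),
    InstrFunction("sub", ("ALU",), ("rs1", "rs2"), "rd"),
    InstrFunction("sll", ("SHIFTER",), ("rs1", "rs2"), "rd"),
    InstrFunction("ld", ("ALU", "MEM"), ("rs1",), "rd"),
]
print(extract_operators(isa))
```

With this toy instruction set, `add` and `sub` become opcodes 0 and 1 of the ALU function operator, `sll` becomes opcode 0 of the shifter, and `ld` becomes a composite operator.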
12. A method for designing configurable components with compatibility, characterized by comprising the following steps:
(1) carrying out hardware design separately for the operator set of each compatibility target, and determining the hardware resources, connection relations, control relations, and timing relations that satisfy the combined functions of each operator set;
(2) formally describing the component designs obtained for each compatibility target;
(3) performing an optimized superposition of the formal descriptions of the components;
Operations (OPs) of the same basic component whose function sets are completely identical are superposed directly; when different function sets are valid at different times (serial superposition), the superposition follows these rules:
<Resource rule> The resources required for the serial superposition of an OP set are the union of the resources needed to complete all OPs at their different times;
<Connection rule> In serial superposition (the circuit descriptions implementing several functions are superposed, but only one function is valid at any given time), identical data sources may be merged, while different data sources are arranged in parallel and the corresponding gating conditions are changed accordingly;
<Control rule> The control description after serial superposition is the union of the control descriptions of the OPs before superposition; the new control condition for the same operation is the logical OR of the old conditions;
<Timing rule> The critical path after serial superposition is the maximum, over all OPs, of each OP's critical path before superposition plus the delay of the switching circuit added for that OP.
(4) converting the above formal description into a circuit design.
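The four serial-superposition rules of claim 12 can be sketched over a simple formal description. The `OpDesc` representation, the helper `serial_superpose`, and the condition strings are hypothetical; the timing rule is read here as $\max_i (T_i + \Delta_i)$, where $T_i$ is OP $i$'s critical path and $\Delta_i$ the delay of the switching circuit added for it, in any consistent time unit.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class OpDesc:
    """Formal description of one OP (hypothetical representation)."""
    resources: Set[str]
    connections: Dict[str, str]   # sink -> data source driving it
    controls: Dict[str, str]      # operation -> enabling condition
    critical_path: float          # before superposition
    switch_delay: float           # switching circuit added for this OP


def serial_superpose(ops: Dict[str, OpDesc]) -> dict:
    resources: Set[str] = set()
    connections: Dict[str, Dict[str, Set[str]]] = {}
    controls: Dict[str, Set[str]] = {}
    for name, op in ops.items():
        resources |= op.resources  # resource rule: union
        for sink, src in op.connections.items():
            # Connection rule: identical sources merge into one entry; different
            # sources sit in parallel, each gated by the OPs that select it.
            connections.setdefault(sink, {}).setdefault(src, set()).add(name)
        for operation, cond in op.controls.items():
            controls.setdefault(operation, set()).add(cond)  # control rule: OR
    # Timing rule (as interpreted above): max over OPs of path + added switch delay.
    critical_path = max(op.critical_path + op.switch_delay for op in ops.values())
    return {
        "resources": resources,
        "connections": connections,
        "controls": {k: " | ".join(sorted(v)) for k, v in controls.items()},
        "critical_path": critical_path,
    }


a = OpDesc({"adder", "regfile"}, {"adder.in0": "regfile.r0", "wb": "alu.out"},
           {"regfile.we": "modeA_wr"}, 4.0, 0.5)
b = OpDesc({"adder", "shifter"}, {"adder.in0": "shifter.out", "wb": "alu.out"},
           {"regfile.we": "modeB_wr"}, 3.8, 0.9)
print(serial_superpose({"A": a, "B": b}))
```

In the example, the shared `wb <- alu.out` connection is merged, the two drivers of `adder.in0` become parallel inputs of a selector gated by OP, and the write enable becomes `modeA_wr | modeB_wr`.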
CN 02126222 2002-07-15 2002-07-15 Register stack capable of being reconfigured and its design method Expired - Lifetime CN1228711C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02126222 CN1228711C (en) 2002-07-15 2002-07-15 Register stack capable of being reconfigured and its design method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 02126222 CN1228711C (en) 2002-07-15 2002-07-15 Register stack capable of being reconfigured and its design method

Publications (2)

Publication Number Publication Date
CN1469236A true CN1469236A (en) 2004-01-21
CN1228711C CN1228711C (en) 2005-11-23

Family

ID=34143261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02126222 Expired - Lifetime CN1228711C (en) 2002-07-15 2002-07-15 Register stack capable of being reconfigured and its design method

Country Status (1)

Country Link
CN (1) CN1228711C (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100412789C (en) * 2004-06-25 2008-08-20 富士通株式会社 Reconfigurable processor and semiconductor device
CN103390070A (en) * 2012-05-07 2013-11-13 北京大学深圳研究生院 Reconfigurable operator array structure
CN114008603A (en) * 2020-07-28 2022-02-01 深圳市汇顶科技股份有限公司 RISC processor with dedicated data path for dedicated registers
CN114008604A (en) * 2020-07-28 2022-02-01 深圳市汇顶科技股份有限公司 RISC processor with special purpose register
CN115292053A (en) * 2022-09-30 2022-11-04 苏州速显微电子科技有限公司 CPU, GPU and NPU unified scheduling method of mobile terminal CNN
CN116841618A (en) * 2023-07-04 2023-10-03 上海耀芯电子科技有限公司 Instruction compression method and system, decompression method and system of TTA processor

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
CN100412789C (en) * 2004-06-25 2008-08-20 富士通株式会社 Reconfigurable processor and semiconductor device
CN103390070A (en) * 2012-05-07 2013-11-13 北京大学深圳研究生院 Reconfigurable operator array structure
CN114008603A (en) * 2020-07-28 2022-02-01 深圳市汇顶科技股份有限公司 RISC processor with dedicated data path for dedicated registers
CN114008604A (en) * 2020-07-28 2022-02-01 深圳市汇顶科技股份有限公司 RISC processor with special purpose register
CN115292053A (en) * 2022-09-30 2022-11-04 苏州速显微电子科技有限公司 CPU, GPU and NPU unified scheduling method of mobile terminal CNN
CN115292053B (en) * 2022-09-30 2023-01-06 苏州速显微电子科技有限公司 CPU, GPU and NPU unified scheduling method of mobile terminal CNN
CN116841618A (en) * 2023-07-04 2023-10-03 上海耀芯电子科技有限公司 Instruction compression method and system, decompression method and system of TTA processor
CN116841618B (en) * 2023-07-04 2024-02-02 上海耀芯电子科技有限公司 Instruction compression method and system, decompression method and system of TTA processor

Also Published As

Publication number Publication date
CN1228711C (en) 2005-11-23

Similar Documents

Publication Publication Date Title
CN1135468C (en) Digital signal processing integrated circuit architecture
CN1103961C (en) Coprocessor data access control
CN1246772C (en) Processor
CN1244051C (en) Storing stack operands in registers
CN1584824A (en) Microprocessor frame based on CISC structure and instruction realizing style
CN1186718C (en) Microcontroller instruction set
CN1308818C (en) Dynamic optimizing target code translator for structure simulation and translating method
CN1625731A (en) Configurable data processor with multi-length instruction set architecture
CN1126030C (en) Data processing device
CN1860441A (en) Efficient high performance data operation element for use in a reconfigurable logic environment
CN1605058A (en) Interface architecture for embedded field programmable gate array cores
CN1226323A (en) Data processing apparatus registers
CN1472646A (en) Adaptable compiling device with optimization
CN1890630A (en) A data processing apparatus and method for moving data between registers and memory
CN1107983A (en) System and method for processing datums
CN1306697A (en) Processing circuit and processing method of variable length coding and decoding
CN1484787A (en) Hardware instruction translation within a processor pipeline
CN1269052C (en) Constant reducing processor capable of supporting shortening code length
CN1469241A (en) Processor, program transformation apparatus and transformation method and computer program
CN1103959C (en) Register addressing in a data processing apparatus
CN1228711C (en) Register stack capable of being reconfigured and its design method
CN1137421C (en) Programmable controller
CN1993673A (en) Data processor, data processing program and recording miduem recording the data processing program
CN1104679C (en) Data processing condition code flags
CN1523491A (en) A digital signal processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING DUOSI TECHNOLOGY INDUSTRIAL PARK CO., LTD

Free format text: FORMER OWNER: BEIJING NANSIDA TECHNOLOGY DEVELOPMENT CO., LTD.

Effective date: 20071026

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20071026

Address after: 100091, No. 3, Building 189, New Complex Building, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee after: DUOSI SCIENCE AND TECHNOLOGY I

Address before: 100083, Integrated Circuit Center, No. 30 Xueyuan Road, Haidian District, Beijing

Patentee before: BEIJING NANSIDA TECHNOLOGY DEVELOPMENT CO.,LTD.

ASS Succession or assignment of patent right

Owner name: BEIJING WISDOM TECHNOLOGY DEVELOPMENT CO., LTD. BE

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20100412

Address after: 100091, No. 3, Building 189, New Complex Building, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee after: DUOSI SCIENCE AND TECHNOLOGY I

Patentee after: BEIJING DUOSI TECHNOLOGY DEVELOPMENT Co.,Ltd.

Patentee after: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

Address before: 100091, No. 3, Building 189, New Complex Building, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee before: DUOSI SCIENCE AND TECHNOLOGY I

PP01 Preservation of patent right

Effective date of registration: 20121018

Granted publication date: 20051123

RINS Preservation of patent right or utility model and its discharge
DD01 Delivery of document by public notice

Addressee: Zhou Lijia, Enforcement Division, People's Court of Haidian District

Document name: Notice of preservation procedure

PD01 Discharge of preservation of patent

Date of cancellation: 20130418

Granted publication date: 20051123

RINS Preservation of patent right or utility model and its discharge
ASS Succession or assignment of patent right

Owner name: NANSI SCIENCE AND TECHNOLOGY DEVELOPMENT CO LTD, B

Free format text: FORMER OWNER: DUOSI SCIENCE + TECHNOLOGY INDUSTRY FIELD CO., LTD., BEIJING

Effective date: 20141010

Owner name: DUOSI SCIENCE + TECHNOLOGY INDUSTRY FIELD CO., LTD

Free format text: FORMER OWNER: BEIJING WISDOM TECHNOLOGY DEVELOPMENT CO., LTD. BEIJING T-MACRO NETWORK TECHNOLOGY CO., LTD.

Effective date: 20141010

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20141010

Address after: 100091, 1st Floor, New Building 189, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee after: BEIJING NANSIDA TECHNOLOGY DEVELOPMENT CO.,LTD.

Patentee after: DUOSI SCIENCE AND TECHNOLOGY I

Patentee after: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

Address before: 100091, No. 3, Building 189, New Complex Building, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee before: DUOSI SCIENCE AND TECHNOLOGY I

Patentee before: BEIJING DUOSI TECHNOLOGY DEVELOPMENT Co.,Ltd.

Patentee before: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160225

Address after: 100095, Room 108, Building G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee after: Beijing Duosi security chip technology Co.,Ltd.

Address before: 100091, 1st Floor, New Building 189, Maintenance Group 3, Hongshankou, Haidian District, Beijing

Patentee before: BEIJING NANSIDA TECHNOLOGY DEVELOPMENT CO.,LTD.

Patentee before: DUOSI SCIENCE AND TECHNOLOGY I

Patentee before: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

DD01 Delivery of document by public notice

Addressee: Zhou Yan

Document name: Notification of Passing Examination on Formalities

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160713

Address after: 100195, Room 106, Building G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee after: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

Address before: 100195, Room 108, Building G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee before: Beijing Duosi security chip technology Co.,Ltd.

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160721

Address after: 100195, Room 109, Block G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee after: BEIJING DUOSI TECHNOLOGY SERVICE CO.,LTD.

Address before: 100195, Room 106, Building G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee before: BEIJING TIANHONGYI NETWORK TECHNOLOGY Co.,Ltd.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 100036, Room 731, 7/F, Building 2, No. 2 Wanshou Road West Street, Haidian District, Beijing

Patentee after: BEIJING DUOSI TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 100195, Room 109, Block G, Jingxinyuan, No. 25 Beiwucun Road, Haidian District, Beijing

Patentee before: BEIJING DUOSI TECHNOLOGY SERVICE CO.,LTD.

CX01 Expiry of patent term
CX01 Expiry of patent term

Granted publication date: 20051123