Summary of the invention
The object of the invention is therefore to solve the above-mentioned problems in achieving compatibility.
To achieve this object, the invention provides an operator extraction method for compatibility purposes, characterized in that it comprises the following steps:
(1) analyzing the functions of the compatibility target instruction set;
(2) according to the result of that functional analysis, grouping together the functions performed by components of the same type and re-encoding them as the opcodes of function operators; separating source operands out as route operators, corresponding to the read ports of the register file; separating destination operands out as the destination register fields of data operators; and designing operations that must control several components simultaneously as composite operators;
(3) determining the internal data paths, at least in part, according to the result of the functional analysis of the compatibility target instruction set;
(4) determining the number of route operators and data operators according to the internal paths;
(5) determining the data source fields of the function operators and the data source fields of the data operators.
To achieve this object, the invention also provides a method of designing a compatible reconfigurable component, characterized in that it comprises the following steps:
(1) carrying out a hardware design separately for the operator set of each compatibility target, determining in each case the hardware resources, connection relations, control relations and timing relations that satisfy the functions of that operator set;
(2) formally describing the component designs derived for the different compatibility targets;
(3) superposing the formal descriptions of the components in an optimized manner;
To superpose the operations (OPs) of components of the same type whose different function sets are never effective at the same time (serial superposition), the following rules are applied:
<Resource rule> The resources required by the serial superposition of an OP set are the union of the resources each OP requires at its own time.
<Connection rule> In serial superposition (the circuit descriptions of several functions are superposed, but only one function is effective at any instant), identical data sources may be merged and different data sources are arranged in parallel, with the corresponding gating control conditions changed accordingly.
<Control rule> The control description of the serially superposed OPs is the union of the control descriptions of the individual OPs before superposition; the new control condition for a given operation is the logical OR of the old conditions.
<Timing rule> The critical path after serial superposition is the maximum of the critical paths before superposition plus the sum of the delays of the switching circuits added for each OP.
(4) converting the above formal description into a circuit design.
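The four superposition rules above can be sketched as a small merge function. This is a minimal sketch, not the patented procedure: the `OpDesign` structure, its field names and the example signal names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpDesign:
    """Hypothetical formal description of one same-type component operation (OP)."""
    resources: frozenset   # hardware resources the OP needs
    sources: dict          # input port -> set of data sources feeding that port
    control: dict          # control signal -> set of conditions that assert it
    critical_path: float   # critical-path delay before superposition
    switch_delay: float    # delay of the switching circuit this OP adds


def superpose(ops):
    """Serial superposition: only one OP is effective at any instant."""
    # Resource rule: union of the resources of all OPs.
    resources = frozenset().union(*(op.resources for op in ops))
    sources, control = {}, {}
    for op in ops:
        # Connection rule: identical sources merge, different ones sit in parallel.
        for port, srcs in op.sources.items():
            sources.setdefault(port, set()).update(srcs)
        # Control rule: the new condition is the OR (here: set union) of the old ones.
        for signal, conds in op.control.items():
            control.setdefault(signal, set()).update(conds)
    # Timing rule: max of the old critical paths plus the sum of added switch delays.
    critical = max(op.critical_path for op in ops) + sum(op.switch_delay for op in ops)
    return OpDesign(resources, sources, control, critical, 0.0)
```

In this sketch a set of conditions stands for their logical OR, and a port with more than one source implies a multiplexer in front of it.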
To achieve this object, the invention also provides a reconfigurable register file having at least two operating modes, characterized in that it comprises a register file address conversion and selection unit, a register file, and a data input selection unit, wherein the register read/write addresses in an instruction are converted by the address conversion and selection unit into the physical register file addresses of the corresponding mode, which control reading and writing of the register file, and the input data of the corresponding mode are written into the register file through the data input selection unit.
To achieve this object, the invention also provides, in a microprocessor of a compatible architecture, a method of operating a reconfigurable register file, characterized in that it comprises the steps of:
Inputting some low-order bits of the register address in the instruction operator to the global register file, according to the operating mode of the register file to be emulated;
Inputting the register address in the instruction operator to the corresponding address conversion sub-unit of the address conversion unit, according to the operating mode of the register file to be emulated, so as to generate the address for accessing the window register file;
When a register is to be written, selecting the appropriate data input according to the operating mode of the register file to be emulated, and writing the data to the global register file or the window register file according to the corresponding write enable signal;
When a register is to be read, outputting the data read from the global register file if the high-order address bits are all zero, and outputting the data read from the window register file if they are not all zero.
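The read and write selection described in these steps can be sketched as follows. This is only an illustrative model: the class name, the parameters `a` and `b` (total and global address widths) and the `translate` callback standing in for the address conversion sub-unit are all assumptions, not part of the source.

```python
class SplitRegisterFile:
    """Global file addressed by the low b bits; window file addressed via a translator."""

    def __init__(self, a, b, window_regs, translate):
        self.a, self.b = a, b
        self.global_file = [0] * (1 << b)
        self.window_file = [0] * window_regs
        self.translate = translate  # hypothetical: instruction address -> window address

    def _is_global(self, addr):
        return (addr >> self.b) == 0  # high-order bits [a-1:b] are all zero

    def _window_index(self, addr):
        return self.translate(addr) % len(self.window_file)

    def write(self, addr, value):
        # Exactly one of the two write enables is active.
        if self._is_global(addr):
            self.global_file[addr & ((1 << self.b) - 1)] = value
        else:
            self.window_file[self._window_index(addr)] = value

    def read(self, addr):
        # Both files are read in parallel; the high-order bits select the output.
        from_global = self.global_file[addr & ((1 << self.b) - 1)]
        from_window = self.window_file[self._window_index(addr)]
        return from_global if self._is_global(addr) else from_window
```

For example, with `a=5` and `b=3`, address 3 goes to the global file while address 8 goes to the window file.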
The overall design philosophy of the compatible microprocessor architecture of the invention is to combine optimizing compilation techniques with reconfigurable processor design, using reconfigurable hardware to support reconfigurable instruction design and thereby achieve compatible and efficient execution of target instruction sets.
To this end, the invention provides a design method for reconfigurable architectures. Architecture reconfigurability means that the system can be reorganized according to different compatibility target processor architectures so as to realize the functions of those processors. The design method comprises:
1) a reconfigurable instruction set;
2) a reconfigurable memory model;
3) a reconfigurable interrupt and flag model;
4) a reconfigurable register file;
5) a reconfigurable pipeline.
Reconfigurability of the instruction set is the basis of instruction-level compatibility. Reconfigurability of the memory model, the interrupt and flag model, the register file and the pipeline provides the hardware support for instruction-level reconfiguration. Instruction set reconfiguration is a necessary step in meeting different application requirements, and hardware reconfiguration is realized through hardware superposition and optimization based on a theoretical model. At the architecture interface, instruction set reconfiguration appears as different ways of combining operators, and hardware reconfiguration appears as different types of operators. The optimizing compiler ensures that, when an architecture of a particular type is being emulated, only operators of the same type take part in instruction translation.
When the architecture of the invention is used in a microprocessor design, instructions of other architectures can be executed efficiently on the reconfigurable hardware after the optimizing compiler has performed binary code translation and code compaction; at the same time, application-oriented instruction design based on this architecture becomes possible.
The invention relates in particular to a compatible instruction architecture based on the idea of explicit hardware cell control (EHCC), characterized in that:
1. The instruction set consists of instruction formats and operator sets.
2. An instruction format comprises at least three parts: the format control field CBFF, the operator section control field CONTROL, and the operator field. The format control field indicates which instruction format is in use. The operator section control fields correspond one-to-one with the operator slots; each one determines that the code in its operator slot is the number of a specific operator within the operator set associated with that slot.
3. An operator is the smallest unit of instruction control. In the implementation it is the control code of a controllable hardware node; at the architecture interface it is the image of a hardware unit that can perform a certain function. Executing an operator completes one operation with a definite function. According to how they act, operators fall into four classes: function operators, data operators, route operators and composite operators. Their structure and function are as follows. A function operator comprises a function control field, a source operand control field and an operand width control field, and controls a functional unit (a hardware unit, made up of controllable data path nodes and controllable execution unit nodes, that can complete one functional operation). A data operator comprises a source operand control field, a destination operand control field and a segment control field, and controls a data unit (a hardware unit, made up of controllable data path nodes and controllable storage nodes, that can complete one data storage operation). A route operator comprises a source operand control field and controls a routing unit (a hardware unit, made up solely of input data paths and a switching circuit, that can complete one switching operation). A composite operator comprises a function control field and controls a composite unit (a hardware unit, made up of several indivisible functional units or data units, that can complete one specific function). "Indivisible" means that controlling only some of the controllable nodes in the unit produces no meaningful action; an action with a definite meaning results only when all controllable nodes in the unit are controlled.
4. According to design purpose, operator sets fall into two classes. One class is application-oriented and is designed freely in order to solve application problems efficiently. The other class serves compatibility and is designed according to the functions of the compatibility target instruction set; there is a definite correspondence between the codes of these operators and the target instruction set, that is, any instruction in the target instruction set can be expressed as one operator of the set or as an assembly of several operators.
5. An operator may include delay, ordering and replacement fields: the delay field indicates by how many cycles execution of the operator may be delayed, the ordering field indicates the order in which operators execute, and the replacement field indicates the number of times the operator is to be repeated.
The compatible instruction architecture with explicit hardware cell control described above provides a new route to instruction-level compatibility. The optimizing compiler converts the compatibility target instruction set directly into combinations of operators that correspond to hardware units; the machine-dependent part of the optimizing compiler then optimizes and compacts the code (assembly, delay, ordering, replacement), forming an operator stream of variable length macro instruction words (VLMIW) that executes efficiently on the compatible reconfigurable components. The operator-based instruction design method also makes it convenient for users to design new instructions for their applications.
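As an illustration of how delay, ordering and repetition fields might shape an operator stream, the sketch below expands a list of operators into a per-cycle schedule. The `Operator` structure, the field names and the choice to issue repetitions in consecutive cycles are assumptions made for this example; the source does not define the expansion at this level of detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    kind: str        # "function", "data", "route" or "composite"
    delay: int = 0   # cycles to wait before the first execution
    order: int = 0   # position among operators executed in the same cycle
    repeat: int = 1  # number of times the operator executes


def expand(operators, issue_cycle=0):
    """Expand compact operators into {cycle: [operator names in execution order]}."""
    schedule = defaultdict(list)
    for op in operators:
        for i in range(op.repeat):  # assumed: repetitions occupy consecutive cycles
            schedule[issue_cycle + op.delay + i].append(op)
    return {
        cycle: [op.name for op in sorted(ops, key=lambda o: o.order)]
        for cycle, ops in sorted(schedule.items())
    }
```

Under these assumptions, one compact instruction word can describe work that spreads over several cycles, which is the kind of code compaction the passage refers to.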
A significant advantage of the invention is that the architecture interface reflects not only the set of functions the hardware can currently perform but also the full state of the hardware resources, including the number, usage characteristics and connection relations of the various hardware units, and that a method is provided by which the user can design the sequence that carries out an instruction's function on the basis of the hardware characteristics. In such a structure the hardware base, including functional units, registers, data paths and control units, is exposed directly to the user, and the instruction interface gives the user direct means of control, so that the user can accomplish the desired function most efficiently by controlling the hardware units directly. As a result, the basic instructions of the instruction set and their semantic, pragmatic and syntax rules are fixed once the hardware design is fixed; every component of the instruction set can be realized directly in hardware, and execution efficiency is high.
Another significant advantage of the invention is that compatibility with several target instruction sets is easy to achieve: the operators extracted from different compatibility target instruction sets can be distributed over different operator slots and combined, through the macro processing rules (delay, assembly, ordering, replacement), into instructions for the different compatibility target architectures.
Another significant advantage of the invention is that the instruction format can be extended horizontally, which favors the parallel execution of a large number of parallel processing units.
Another significant advantage of the invention is that the granularity of an operator is smaller than that of an ordinary instruction, which makes it easy to construct new application-oriented instructions; the construction is flexible and highly adaptable.
Another significant advantage of the invention is that the way operators are combined can be changed, that is, the time at which an operator executes can be changed by means such as delay, assembly, ordering and replacement, which can significantly reduce the space occupied by instruction code.
Another principal object of the invention is to disclose an operator extraction method for compatibility purposes, characterized in that it comprises the following steps:
(1) Analyze the functions of the compatibility target instruction set.
(2) Group together the functions performed by components of the same type and re-encode them as the opcodes of function operators; separate source operands out as route operators, corresponding to the read ports of the register file; separate destination operands out as the destination register fields of data operators; design operations that must control several components simultaneously as composite operators.
(3) Design the internal data paths.
(4) Determine the number of route operators and data operators.
(5) Determine the data source fields of the function operators and the data source fields of the data operators.
The operator design method of the invention is a forward design: the operator design is derived from the application requirements (the compatibility target instructions) and in turn provides the basis for the design of the reconfigurable hardware.
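Step (2) of the extraction method, grouping functions by the component that performs them and re-encoding each group as function-operator opcodes, might look like the following sketch. The input representation (pairs of mnemonic and component name) and the sequential opcode numbering are assumptions chosen for illustration only.

```python
from collections import defaultdict


def extract_function_operators(instructions):
    """Group target-instruction functions by component and assign opcodes per group.

    `instructions` is an iterable of (mnemonic, component) pairs, a hypothetical
    summary of the functional analysis in step (1).
    """
    by_component = defaultdict(list)
    for mnemonic, component in instructions:
        if mnemonic not in by_component[component]:  # keep first occurrence only
            by_component[component].append(mnemonic)
    # Each component gets its own opcode space, numbered in order of appearance.
    return {
        component: {name: code for code, name in enumerate(functions)}
        for component, functions in by_component.items()
    }
```

Source operands, destination operands and multi-component operations would be handled separately as route, data and composite operators; they are omitted here to keep the sketch short.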
A significant advantage of the invention is that the operators are designed from the compatibility target instructions, which guarantees that the effect of executing an operator or a combination of operators is consistent with the compatibility target instruction.
Another significant advantage of the invention is that operators designed from the compatibility target instructions can be recombined, so that new instructions can be constructed.
Another significant advantage of the invention is that the number of operators, and hence the number of hardware units, is variable; hardware units can easily be replicated to increase the parallelism and efficiency of instruction execution, which gives the architecture better scalability.
Another principal object of the invention is to provide a register window structure in which global registers and window registers are separated, and a method of controlling such a register file. Let the number of system-visible registers in the register window be $2^a$; the register window structure is characterized in that it comprises:
A global register file comprising $2^b$ registers, where b is a natural number and $b < a$; the read/write address width of the global register file is b bits;
A window register file comprising $2^m$ physical registers, where m is a natural number; for fixed windows, $m = b + 1 + k$ must hold, where k is a natural number and $2^k$ is the number of fixed windows; for sliding windows, $2^m \ge 2^a - 2^b$ must hold; the read/write address width of the window register file is m bits;
A window register file address conversion unit which, according to the instruction function, converts the address of width a in the instruction into the m-bit physical address used to access the window register file;
An output data selection unit which selects the correct data to output from between the output data of the global register file and the output data of the window register file.
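The sizing constraints on a, b, m and k can be checked with a few lines of code. This is a minimal sketch; the function name and the convention of passing `k=None` for sliding windows are assumptions of the example.

```python
def window_config_ok(a, b, m, k=None):
    """Check the register-window sizing constraints.

    a: address width of the system-visible registers (2**a visible registers)
    b: address width of the global register file (2**b global registers)
    m: address width of the window register file (2**m physical registers)
    k: log2 of the number of fixed windows, or None for sliding windows
    """
    if not (0 <= b < a):
        return False
    if k is not None:                        # fixed windows: m = b + 1 + k
        return m == b + 1 + k
    return (1 << m) >= (1 << a) - (1 << b)   # sliding windows: 2^m >= 2^a - 2^b
```

For instance, with 32 visible registers (a=5) of which 8 are global (b=3), eight fixed windows (k=3) require m=7, that is, 128 physical window registers.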
The control method of the register file is characterized in that it comprises the following steps:
(1) The reset value of all registers in the register file, of the current window pointer CWP that controls fixed windows, and of the bottom-of-stack pointer BOF that controls sliding windows is 0;
(2) According to the instruction decoding result, determine the operating mode and perform address separation and conversion. Register windows fall into two classes, fixed windows (the numbers of input, output and local registers in a window are fixed) and sliding windows (the numbers of input, output and local registers in a window can be set by software); the principles for each are as follows:
(21) Fixed-window address calculation
(211) Global register file read/write address
Register file physical address = low-order bits [b-1:0] of the register address in the instruction
(212) Window register file physical address
(2121) When the CWP modification mode is mode one described in step 411,
Physical address = (register address in the instruction $- 2^b$) $+ \{CWP, 0\}_m$
where $\{CWP, 0\}_m$ denotes the m-bit address obtained by arithmetically left-shifting the effective value of CWP.
(2122) When the CWP modification mode is mode two described in step 411,
Physical address = (register address in the instruction $- 2^b$) $+ \{CWP_{\text{comp}}, 0\}_m$
where $\{CWP_{\text{comp}}, 0\}_m$ denotes the m-bit address obtained by taking the complement of the effective value of CWP and then arithmetically left-shifting it.
(22) Sliding window
Global register file physical address = low-order bits [b-1:0] of the register address in the instruction
Window register file physical address = (register address in the instruction $- 2^b$) $+ BOF$
where BOF is the physical address of the first register of the current sliding window.
(3) Perform the register file read or write operation
(31) When a register file write is valid, the write address enable signals ensure that only one of the write to the global register file and the write to the window register file is effective. The conditions are as follows:
(311) If the register write is valid and bits [a-1:b] of the register address in the instruction are all 0, the write address enable signal for the global register file is valid and the write address enable signal for the window register file is invalid;
(312) If the register write is valid and bits [a-1:b] of the register address in the instruction are not all 0, the write address enable signal for the window register file is valid and the write address enable signal for the global register file is invalid;
(32) When a register file read is valid, the global register file and the window register file are read simultaneously, and the valid read data is finally selected for output according to whether the high-order bits of the read address are 0:
(321) If bits [a-1:b] of the register address in the instruction are all 0, the data read from the global register file is output;
(322) If bits [a-1:b] of the register address in the instruction are not all 0, the data read from the window register file is output;
(4) According to the instruction and the initial value of the counter, modify the current window pointer CWP or the current bottom-of-stack pointer BOF:
(41) CWP modification for fixed windows
(411) Mode one
(4111) When a SAVE operation is valid, $CWP_{n+1} = CWP_n - 1$
(4112) When a RESTORE operation is valid, the CWP in effect before the most recent SAVE operation is restored, $CWP_{n+1} = CWP_n + 1$
(411) Mode two
(4111) When a SAVE operation is valid, $CWP_{n+1} = CWP_n + 1$
(4112) When a RESTORE operation is valid, the CWP in effect before the most recent SAVE operation is restored, $CWP_{n+1} = CWP_n - 1$
(42) BOF modification for sliding windows
(421) When a CALL operation is valid, $BOF_{n+1} = BOF_n + SOL_n$
(422) When a RETURN operation is valid, the BOF in effect before the most recent CALL operation is restored, $BOF_{n+1} = BOF_n - SOL_{n-1}$
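Steps (2) and (4) of the control method can be modeled in software as below. This is a sketch under stated assumptions, not the circuit itself: the shift amount of b+1 bits for $\{CWP, 0\}_m$ is inferred from $m = b + 1 + k$, the complement in mode two is taken to be a two's complement over k bits, and SOL (which this section does not define) is treated as a per-call step size supplied by the caller. The class names and default parameters are hypothetical.

```python
class FixedWindowFile:
    """Fixed-window address translation; defaults (b=3, k=3) are illustrative."""

    def __init__(self, b=3, k=3, mode=1):
        self.b, self.k, self.m = b, k, b + 1 + k   # m = b + 1 + k
        self.mode = mode                           # 1 or 2, the two CWP update modes
        self.cwp = 0                               # reset value is 0

    def physical(self, addr):
        mask_k = (1 << self.k) - 1
        # Mode two uses the complement of CWP (assumed: two's complement on k bits).
        cwp = self.cwp if self.mode == 1 else (-self.cwp) & mask_k
        shifted = cwp << (self.b + 1)              # {CWP, 0}_m, assumed shift of b+1
        # Plain m-bit unsigned addition; truncation replaces any modulo circuit.
        return (addr - (1 << self.b) + shifted) & ((1 << self.m) - 1)

    def save(self):
        step = -1 if self.mode == 1 else 1
        self.cwp = (self.cwp + step) & ((1 << self.k) - 1)

    def restore(self):
        step = 1 if self.mode == 1 else -1
        self.cwp = (self.cwp + step) & ((1 << self.k) - 1)


class SlidingWindowFile:
    """Sliding-window address translation driven by the bottom-of-stack pointer BOF."""

    def __init__(self, b=3, m=6):
        self.b, self.mask = b, (1 << m) - 1
        self.bof = 0                               # reset value is 0
        self._sol = []                             # step sizes of outstanding CALLs

    def physical(self, addr):
        return (addr - (1 << self.b) + self.bof) & self.mask

    def call(self, sol):
        self._sol.append(sol)
        self.bof = (self.bof + sol) & self.mask

    def ret(self):
        self.bof = (self.bof - self._sol.pop()) & self.mask
```

With a=5 and b=3, the instruction addresses 8 to 15 before a SAVE and 24 to 31 after it resolve to the same physical registers in this model, which is how adjacent windows share registers; both update modes give the same mapping under the assumptions above.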
An original advantage of the invention is that the global registers and window registers are addressed independently, which simplifies the register window design when the window register file contains $2^m$ registers: the window register address pointer can be computed directly by unsigned addition, with no modulo operation required, which simplifies the circuit design.
Another original advantage of the invention is that the method applies to both fixed-window and sliding-window designs and therefore has a wide range of application.
Another principal object of the invention is to provide a lookup-table-based rotating register address generation component and a control method for it. Let the register file comprise $2^n$ physical registers (n a natural number), and let the size of the rotating region be SOR times $2^m$, where m is a non-negative integer with $m \le n$, SOR = 1 to s, and s is a natural number with $s \le 2^{n-m}$. The address generation component is characterized in that it comprises:
A rotation register base (RRB) control unit: the RRB register is n bits wide and its reset value is 0; each time an iteration completes, the RRB value is decremented by one, and the RRB can also be cleared under instruction control;
A one-stage adder unit, which adds the register address in the instruction (n bits wide) to the RRB register;
A hardware lookup table circuit built from registers or ROM; the row index of the table ranges over all values of width n-m bits, the column index is the SOR field, and each entry is the row index modulo the column index.
The control method is characterized in that it comprises the following steps:
(1) Reset
The reset value of the RRB register is 0; when the lookup table circuit is built from registers, the reset values of the table entries are the modulo values;
(2) The rotating region multiple SOR in the instruction (values 1 to s; shifting this value left by m bits gives the size of the rotating region) selects the corresponding column of table entries;
(3) The instruction address is added to the output of the RRB; the low m bits [m-1:0] of the sum are used directly as the low m bits [m-1:0] of the rotated address, and the high n-m bits [n-1:m] are input to the lookup table circuit;
(4) A table lookup is performed using the high n-m bits [n-1:m] output in step 3 and SOR as indices; the entry obtained is used as the high n-m bits [n-1:m] of the rotated address and is combined with the low m bits [m-1:0] of the sum from step 3 to form the final physical address of the rotating register;
(5) Under instruction control, the RRB is decremented or cleared to 0:
(51) when the software-pipelined loop branch flag is valid, the RRB is automatically decremented by 1;
(52) when a clear-RRB instruction is valid, the RRB is cleared to 0.
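The lookup-table address generation in steps (2) to (4) can be sketched as below. The function names and the table layout (a list of rows, with column SOR stored at index SOR-1) are assumptions of this example; the arithmetic follows the steps above.

```python
def build_table(n, m, s):
    """Rows: every (n-m)-bit value; columns: SOR = 1..s; entry = row mod SOR."""
    rows = 1 << (n - m)
    return [[row % sor for sor in range(1, s + 1)] for row in range(rows)]


def rotate_address(addr, rrb, sor, n, m, table):
    """Return the physical address of a rotating register without a modulo circuit."""
    total = (addr + rrb) & ((1 << n) - 1)   # step 3: n-bit adder
    low = total & ((1 << m) - 1)            # low m bits pass straight through
    high = total >> m                       # high n-m bits index the table
    return (table[high][sor - 1] << m) | low  # step 4: recombine


def decrement_rrb(rrb, n):
    """Step 51: RRB is decremented by one, wrapping within n bits."""
    return (rrb - 1) & ((1 << n) - 1)
```

The table works because a sum $x = h \cdot 2^m + l$ satisfies $x \bmod (SOR \cdot 2^m) = (h \bmod SOR) \cdot 2^m + l$, so only the small quantity $h \bmod SOR$ has to be precomputed.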
Register rotation was first proposed by B. R. Rau et al. in 1989 in their work on the Cydra 5 supercomputer; its purpose is to support modulo scheduling for software pipelining. The general formula for register rotation is:
(register address + RRB) mod rotating region
Compared with the usual method of obtaining the physical address of a rotating register with a modulo operation, the circuit of the invention is simple in structure and efficient. A slight change to the above formula:
(register address + RRB $- 2^n$) mod rotating region $+ 2^n$
makes it possible to guarantee, within the same register file, that $2^n$ registers are static and do not rotate. The address generation component is similar to the one described above, with only one subtraction and one addition of the constant $2^n$ added; the control method is exactly the same.
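The modified formula can be illustrated with a short reference model. This sketch uses an explicit modulo for clarity rather than the lookup table, and it assumes that addresses below the static count bypass rotation unchanged; the function and parameter names are hypothetical, with `static` playing the role of the constant $2^n$ in the formula.

```python
def rotate_with_static(addr, rrb, region, static):
    """Reference model: the first `static` registers never rotate.

    Implements (addr + rrb - static) mod region + static for rotating registers.
    """
    if addr < static:          # assumed: static registers are passed through
        return addr
    return (addr + rrb - static) % region + static
```

The result always stays in the range from `static` to `static + region - 1`, so rotating accesses can never land on a static register.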
An original advantage of the invention is that register rotation based on a lookup table avoids the modulo operation in computing the rotated register address, which greatly simplifies the circuit and allows the rotated address to be obtained quickly.
Another original advantage of the invention is that the rotating region can be any multiple of $2^m$, so the method is general.
Another object of the invention is to disclose a method of designing a compatible reconfigurable component, characterized in that it comprises the following steps:
(1) carrying out a hardware design separately for the operator set of each compatibility target, determining in each case the hardware resources, connection relations, control relations and timing relations that satisfy the functions of that operator set;
(2) formally describing the component designs derived for the different compatibility targets;
(3) superposing the formal descriptions of the components;
(4) converting the above formal description into a circuit design.
The structure of a reconfigurable component obtained through the superposition design rules and cluster analysis techniques can be reconfigured according to operator streams of different types; the operator combinations that drive this hardware reconfiguration can support the functions of instruction sets of different architectures. Hardware resources are saved while instruction-level compatibility is achieved.
A significant advantage of the invention is that designs of compatible reconfigurable components can be given a consistent, standardized description.
Another significant advantage of the invention is that unified design rules can be established and applied widely to the design of components of the same type.
Another significant advantage of the invention is that it provides the shared resources that reconfigurable component design requires, and allocates, organizes and controls those resources.
Another principal object of the invention is to provide a compatible reconfigurable component, in particular a reconfigurable register file, characterized in that it comprises:
A register file comprising at least two read ports and one write port;
An address selection unit, whose inputs are the address pointers used by register file operations of the different functions and whose outputs are the address pointers valid for the current operation; the number of output addresses is the maximum, over the inputs, of the number of addresses needed to complete a single function; and
An input data selection unit, whose inputs are the write data used by register file operations of the different functions and whose outputs are the write data valid for the current operation; the number of output data items is the maximum, over the inputs, of the number of data items needed to complete a single function.
The reconfigurable register file of the invention adds an address selection unit and an input data selection unit to an ordinary register file in order to control the register file addresses and input data required by the different functions, so that the same register file can satisfy the functional requirements of different operators.
An original advantage of the invention is that the hardware needed to reconfigure the function of the register file is only the union of the hardware needed to realize each function, plus the two selection units, so the hardware overhead is small.
Another original advantage of the invention is that inputs to the address selection and data selection units can be added or removed according to the compatibility targets, which gives good extensibility.
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Figs. 1a to 1d compare four approaches to achieving compatibility, described as follows:
The same problem, processed on processors of different architectures, yields the same result. The basic method is to describe the problem in a high-level language; through operating system scheduling and compilation by an optimizing compiler it is converted into instructions of a particular architecture (hereinafter "system"), which are then executed on hardware based on that instruction set, that is, a processor chip (hereinafter "hardware").
When compatibility is not considered, there are N paths from the same problem to the same result, namely:
Problem → system A → hardware A → result;
Problem → system B → hardware B → result;
······
Problem → system N → hardware N → result.
In each of the figures below, hardware B is the processor designed according to the respective compatibility approach.
The compatible design approach typified by AMD is shown in Fig. 1a; its instruction execution path is:
Problem → system A → hardware B → result.
This is pure compatibility: no new architecture is designed, and new hardware is instead designed directly to the architecture of another processor so as to meet the requirements of that architecture. Working in this way, largely through reverse engineering, avoids most of the architecture design and development effort, but a model that merely follows in the footsteps of others lacks innovation.
The instruction execution paths of microprocessors typified by Itanium comprise two paths, as shown in Fig. 1b:
Problem → system B → hardware B (comprising hardware A) → result;
Problem → system A → hardware B (comprising hardware A) → result;
That is, different hardware is integrated on the same chip to execute the instructions of system A and system B respectively, and a jump instruction switches between the two. The first instruction execution path is based on the new architecture B; the second serves compatibility. Because hardware B itself integrates hardware A, the instructions of system A can execute directly on hardware B. This approach is suitable only for product designs within the same company.
The instruction execution paths of microprocessors typified by Transmeta comprise two paths, as shown in Fig. 1c:
Problem → system B → hardware B → result;
Problem → system A → system B → hardware B → result;
The first instruction execution path is based on the new architecture B; the second serves compatibility. The problem is first converted into instructions of system A, software then translates the instructions of system A into those of system B, and the final result is decoded and executed by hardware B. An architecture designed in this way can retain its own characteristics while remaining compatible with system A, that is, system B can be designed independently.
The drawback of this approach is that translating code entirely in software is costly.
The instruction execution paths of the MISC compatible architecture may comprise N paths, as shown in Fig. 1d:
Problem → system B → reconfigurable hardware B → result;
Problem → system A → system B → reconfigurable hardware B → result;
Problem → system C → system B → reconfigurable hardware B → result;
······
Problem → system N → system B → reconfigurable hardware B → result;
The first instruction execution path is based on the new architecture B, and the following N-1 instruction execution paths are used for compatibility. Their characteristics are:
1. Hardware B is designed with reference to the instruction sets and specifications of system A, system B, ..., system N, and the hardware itself is reconfigurable. Compatibility is achieved with the support of compatible components that have reconfigurable characteristics.
2. Compatibility with multiple different architectures is easy to achieve.
The instruction execution mode of a MISC compatible architecture designed in the manner shown in Fig. 1d is shown in Fig. 2. An instruction of the compatibility target (source instruction) is converted by an instruction conversion program into a sequence of assembled operators, under the constraints of a given instruction format, that a processor of the MISC architecture can recognize. After code compaction, such instructions are decoded and executed by hardware with reconfigurable characteristics.
The instruction interface of the MISC compatible architecture is described below with reference to Fig. 3a and Fig. 3b.
Unlike the instruction set of an ordinary processor, the smallest executable element of the MISC architecture is the operator (function operators, data operators, route operators and composition operators), and each operator corresponds to one definite operation. An instruction is defined as the set of operations executed at a particular moment. The operator set and the operator arrangement rule (instruction format) are the two key elements of this instruction set.
As shown in Fig. 3a, the general form of the instruction format comprises four parts: SYS, CBFF, CONTROL and OPERATOR. SYS is the reserved bits, CBFF is the instruction format control field, CONTROL is the operator control field, and OPERATOR is the operator encoding field. OPERATOR is divided into several operator segments, denoted Opi, and each operator segment can hold one operator from a definite operator set. For each operator segment there is a corresponding sub-format control field in CONTROL (denoted CBCFi); the encoding of CBCFi uniquely determines which operator is assembled in operator segment Opi. The decoder accepts one instruction word at a time and decodes and executes it according to the rules laid down by the encoding.
Operators are assembled into instruction words according to certain rules. During execution, however, because of instruction composition strategies such as delay and replacement, the length of the instruction actually executed in each cycle is variable.
The concept and structure of operators are described below.
Controllable nodes that are functionally related are grouped together into a unit, which becomes the smallest unit directly controlled by an instruction; the encoded representation of this unit is called an operator.
A controllable node is a device in the hardware that can be directly controlled by an instruction. In a register file, a controllable node can be an individual read/write register or a register read/write port. The function of an instruction is to control the set of controllable nodes in the controlled components above so as to realize a particular semantic function.
According to the hardware unit they control, operators are divided into function operators, data operators, route operators and composition operators. An instruction controls each hardware unit through these four classes of operators.
The operator set is defined as follows: OP=FOP+DOP+ROP+COP
The function operator set FOP is defined as:
FOP={(fopi, tfopi)}, where fopi is a function operator and tfopi is the execution period of that function operator.
<Property>: if (fopi, tfopi) belongs to FOP, then:
for accri ∈ ACCR, if the value of accri at time t is VALUE0, it becomes VALUE1 at time t+tfopi, where VALUE1 is the result of operating on the contents of the data source registers identified by fopi with the functional component identified by fopi. fopi is then said to be related to accri.
A function operator is the encoded representation of the control of a functional unit (shown in Fig. 4a). A functional unit (FUNCTIONAL CELL) is a hardware unit, made up of data path controllable nodes and execution component controllable nodes, that can complete one functional operation.
Take as an example the DAU operator, which implements 32/64-bit signed integer arithmetic operations. The relevant controllable nodes comprise two data source selectors, the data width and the operation type. The operator fields are as follows:
OPDAU<3:0> | DAUW<0> | RSDAUAx<1:0> | RSDAUAy<1:0> |
The fields of the operator represent, respectively, the operation code (add, subtract, absolute value, complement, etc.), the operand width (32/64 bits), and the control codes for the two operand sources.
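The field layout above can be illustrated with a short packing routine. This is only a sketch: the field widths come from the DAU operator layout, but the placement of OPDAU in the most significant bits, the helper names pack_fields and unpack_fields, and the example field values are assumptions made for illustration and are not given in this description.

```python
# Field widths are taken from the DAU operator layout; placing OPDAU in the
# most significant bits is an assumption made for this sketch.
DAU_FIELDS = (("OPDAU", 4), ("DAUW", 1), ("RSDAUAx", 2), ("RSDAUAy", 2))


def pack_fields(layout, values):
    """Pack named field values into one integer, first field most significant."""
    word = 0
    for name, width in layout:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_fields(layout, word):
    """Inverse of pack_fields: split an integer back into named fields."""
    values = {}
    for name, width in reversed(layout):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# Hypothetical encoding: operation code 1, width bit set, sources 2 and 3.
example = {"OPDAU": 1, "DAUW": 1, "RSDAUAx": 2, "RSDAUAy": 3}
dau_word = pack_fields(DAU_FIELDS, example)
```

Unpacking dau_word with the same layout returns the original field values, which is the property a decoder for this operator segment would rely on.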
The data operator set is defined as: DOP={(dopi, tdopi)}, where dopi is a data operator and tdopi is the execution period of that data operator.
<Property>: if (dopi, tdopi) belongs to DOP, then:
for comri ∈ COMR, if the value of comri at time t is VALUE0, it becomes VALUE1 at time t+tdopi, where VALUE1 is the value in the data source register identified by dopi; dopi is then said to be related to comri; or:
for accri ∈ ACCR, if the value of accri at time t is VALUE0, it becomes VALUE1 at time t+tdopi, where VALUE1 is the value of the functional component identified by dopi; dopi is then said to be related to accri.
A data operator is the encoded representation of the control of a data unit (shown in Fig. 4b). A data unit (DATA CELL) is a hardware unit, made up of data path controllable nodes and storage controllable nodes, that can complete one data storage operation. A data unit can be a register of fixed or variable width, and can be a single register or a register file. Physically, a data operator corresponds to a register storage component in the hardware model (a single register or a write port of a register file) together with its data source control.
Take as an example the data operator MTN1RA, which controls a write port of a register group. This operator has three fields, in the following format:
MTN1RA(8) | |
MTN1NO<2:0> | RSMTN1RA<2:0> | MFLD<1:0> |
Here MTN1NO is the register code (MR0-MR7), RSMTN1RA is the data source code of the first input port, and MFLD is the register section code (high half, low half, full word, or invalid).
The route operator set is defined as: ROP={(ropi, tropi)}, where ropi is a route operator and tropi is the execution time of that route operator (the execution time of a route operator is generally less than one cycle).
<Property>: if (ropi, tropi) belongs to ROP, then:
for comri ∈ COMR, if the value of comri at time t is VALUE0, it becomes VALUE1 at time t+tropi, where VALUE1 is the value in the data source register identified by ropi; ropi is then said to be related to comri; or:
for accri ∈ ACCR, if the value of accri at time t is VALUE0, it becomes VALUE1 at time t+tropi, where VALUE1 is the value of the functional component identified by ropi; ropi is then said to be related to accri. The route operator corresponds to route control in the hardware model.
A route operator is the encoded representation of the control of a path unit (shown in Fig. 4c). A path unit (ROUTE CELL) is a routing operation unit made up solely of input data paths.
Take as an example the route operator PATH1, which implements data source selection for the four buses BUS0, BUS1, BUS2 and BUS3. The instruction fields are as follows:
PATH(12) | |
RSBUS0<2:0> | RSBUS1<2:0> | RSBUS2<2:0> | RSBUS3<2:0> |
Each RS field is the control code of the data source selector for the corresponding bus BUS0, BUS1, BUS2 or BUS3.
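The effect of the four RS fields can be sketched as four independent selectors, one per bus. The sketch below assumes that RSBUS0 occupies the most significant three bits of the 12-bit word and that each 3-bit code indexes one of eight candidate data sources; neither the bit order nor the meaning of the individual codes is specified here, and the function names are hypothetical.

```python
def decode_path(word):
    """Split a 12-bit PATH word into the selectors for BUS0..BUS3.

    Assumption: RSBUS0 is held in bits 11..9 and RSBUS3 in bits 2..0.
    """
    if not 0 <= word < (1 << 12):
        raise ValueError("PATH word must fit in 12 bits")
    return [(word >> shift) & 0b111 for shift in (9, 6, 3, 0)]


def drive_buses(word, sources):
    """Return the values placed on BUS0..BUS3 for the given PATH word.

    `sources` holds the eight candidate data source values; which hardware
    source each index stands for is an illustrative assumption.
    """
    if len(sources) != 8:
        raise ValueError("a 3-bit selector addresses exactly 8 sources")
    return [sources[select] for select in decode_path(word)]


# Octal notation makes the four 3-bit selectors directly readable: 7, 0, 5, 2.
selectors = decode_path(0o7052)
buses = drive_buses(0o7052, [10, 11, 12, 13, 14, 15, 16, 17])
```

All four buses are driven by one operator in the same cycle, which is why a single route operator suffices to set up the operand paths for the function and data operators that follow.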
The composition operator set is defined as: COP={(copi, tcopi)}, where copi is a composition operator and tcopi is the execution period of that composition operator.
<Property>: if (copi, tcopi) belongs to COP, then:
for comri, comrj, ... ∈ COMR, if the values of comri, comrj, ... at time t are VALUE0i, VALUE0j, ..., they become VALUE1i, VALUE1j, ... at time t+tcopi, where VALUE1i, VALUE1j, ... are the results of the operation identified by copi. copi is then said to be related to comri, comrj, ....
A composition operator is the encoded representation of the control of a combined unit (shown in Fig. 4d). A combined unit (COMBINED CELL) is a hardware unit, made up of several indivisible functional units or data units, that can complete one specific function. "Indivisible" means that controlling only some of the controllable nodes in the unit produces no meaningful action; an action with a definite meaning is produced only when all controllable nodes in the unit are controlled. For example, the stack operation operator MSTACK controls such a combined unit, whose controllable nodes include all the registers on the stack, the stack pointer source selector and the stack pointer register.
Taking SPARC as an example, the instruction forms before and after instruction conversion are compared below with reference to Fig. 5. In the figure, the SPARC-V9 instruction format is 32 bits and comprises source operands, a destination operand, an instruction function control field and an instruction format control field. After conversion to the MISC compatible architecture, the instruction consists of an instruction format control field, operator segment control fields and operator codes, and reserved bits can be added. The operators fall into four classes (function operators, data operators, route operators and composition operators), which correspond to the control of different hardware units.
Taking the add instruction as an example, the instruction forms before and after conversion are as follows:
SPARC assembly instruction: ADD %o2, %l2, %o1
Function: add the operands in registers %o2 and %l2 and store the result in register %o1.
The instruction form after conversion is as follows:
MISC assembly instruction:
ADD RPORT1,RPORT2 || RPORT1 L1 || RPORT L2 || WPORT AUDD
The encoded form:
SYS | CBFF | CBCF1 | ··· | CBCFn | DAU | PATH1 | PATH2 | RFW1 | ··· |
where SYS is the reserved bits.
Function: one add operation, two register file read operations and one register file write operation are performed, controlled respectively by the addition operator DAU, the route operators PATH1 and PATH2, and the register file write operator RFW1. At the instruction interface, this instruction thus expresses parallel control of the adder, register file read port 1, register file read port 2 and register file write port 1. This design idea is called "explicit hardware cell control (EHCC)".
Fig. 6 shows the design flow of a compatible configurable component.
If the compatibility targets are system A, system B, ..., system N, the design process of the configurable component is:
1. Derive the formal description of the component from each system, as follows:
System A → component-related instruction analysis A → operator extraction A → determination of hardware resources A → formal description A;
System B → component-related instruction analysis B → operator extraction B → determination of hardware resources B → formal description B;
······
System N → component-related instruction analysis N → operator extraction N → determination of hardware resources N → formal description N;
The above processes can be executed in parallel.
2. Serially superpose (serial stack) the formal descriptions of the component derived from the different systems;
3. Optimize the design according to the superposition design rules of the theoretical model;
4. Finally determine the hardware structure of the configurable component.
Fig. 7 shows the flow of compatibility target instruction analysis and operator extraction.
The process is as follows:
1. Analyze the functions of the source instructions, group together the functions completed by the same kind of component, and re-encode them as the operation code of a function operator;
2. Separate the source operands into route operators, corresponding to the read ports of the register file;
3. Separate the destination operand into the destination register field of a data operator;
4. Reorganize the internal paths according to function, form the data source fields of the function operators and of the data operators, and finally determine the number of route operators and data operators;
5. Design any operation that must control several components simultaneously as a composition operator;
6. Assemble the operators and check whether the functions of all source instructions can be realized; if not, return to step 1; if so, operator extraction ends.
The model description and design rules used in component design are introduced below.
In the MISC architecture, the structure, function and usage interface of a component can be described by a theoretical model (here called the component design model). A component design model is defined as a five-tuple:
M=(OP,E,C,Ctrl,T)
The five elements of this design model are:
OP: the set of operation relations, reflecting the functions the component realizes. Designing on the basis of the operation set OP yields E, C, Ctrl and T below;
E: the set of resources, such as hardware components for data paths, storage, logic, comparison, condition, flags and arithmetic, and hardware components that support control behaviors such as repetition and decision. The set E supports the realization of the OP requirements;
Ctrl: the set of control relations. While the OP requirements are being realized, the architecture is organized dynamically through the control relations Ctrl, which comprise control points and control logic. The set Ctrl supports dynamic control for realizing the OP requirements under given resources E and time T. Specifically, the control elements of a MISC functional component comprise the operators OPERATOR, the flags IDS, and the control logic that generates the controllable node control signals from the operators and flag signals;
T: the set of minimum time relations associated with the performance of resources, operations, control and optimization; T is expressed in time units, and the set T supports the OP requirements.
The connection relation set C is partitioned according to the other four sets:
C=Cop+Ce+Ct+Cctrl
where Cop is the set of behavior connection relations, Ce is the set of connection relations of hardware units, Ct is the set of time component connection relations, and Cctrl is the set of control component connection relations.
Based on the above five-tuple design model, any behavior or set of behaviors (denoted Bi) can be expressed from four aspects:
OP(Bi): the function description of Bi, defining the process by which behavior Bi is carried out;
E(Bi): the resource function of Bi, defining the hardware resources used while behavior Bi is carried out and their quantity;
CTRL(Bi): the control function of Bi, defining the control exerted on hardware units while behavior Bi is carried out;
T(Bi): the time function of Bi, defining the time taken to carry out behavior Bi.
According to the generative and reductive properties of behaviors:
OP(Bi)=op1&op2&...&opn, opi∈OP, &∈Cop
E(Bi)=e1&e2&...&en, ei∈E, &∈Ce
CTRL(Bi)=ctrl1&ctrl2&...&ctrln, ctrli∈CTRL, &∈Cctrl
T(Bi)=t1&t2&...&tn, ti∈T, &∈Ct
The formal description of each component is explained below. For convenience, Cop, Ce, Ct and Cctrl are explained together with OP, E, T and CTRL respectively.
1, Minimum operation behavior set OP and its connection relation Cop
1) OP={OPb,+}
OPb: the set of elementary operations. An elementary operation is defined as the smallest micro-operation, the smallest and indivisible unit into which a function is divided, such as the assignment of one register or one add operation; it is denoted μOP. Any operation OP can be decomposed into a combination of μOPs at different times. μOPs may be repeated, but different μOPs are mutually orthogonal. In the design model of the register file, the elementary operation is the assignment operation (denoted "*"), a behavior realized by connecting hardware units; any register operation can be described by elementary assignment operations and their superposition (denoted "+").
2) Cop={|, ||, =>, {*}}, denoting serial, parallel, selection and repetition of operations, respectively;
"b1|b2": at any particular moment only one of behavior b1 and behavior b2 is valid; this is a mutually exclusive "or" relation. Note that the symbol "|" is used only when describing the relation between operations; when an operation sequence is described, serial operations are distinguished by different time stamps. b1, b2 ∈ OP.
"b1||b2" denotes parallel execution: at a particular moment, behavior b1 and behavior b2 complete their operations in the same time period. b1, b2 ∈ OP.
"b1=>b2 b3" denotes conditional execution: behavior b1 is executed first; if the result of b1 is true, behavior b2 is executed, otherwise behavior b3 is executed. b1, b2, b3 ∈ OP.
"{b1*b2}" denotes repetition: behavior b1 is executed first; if the result of b1 is true, behavior b2 is executed, otherwise execution ends. This process is repeated until the result of b1 is false. b1, b2 ∈ OP.
2, Minimum hardware unit set E and its connection relation set Ce
1) E={e1, e2, ..., en}, where ei is a hardware unit.
The types of hardware unit include:
(1) Path unit set P
An abstract data source control element of a register. Taking a multiplexer (MUX) as an example: when it is connected to a storage element, it is written as P followed by the (capitalized) name of the storage element (or bus) and a group of line names enclosed in brackets []; the part of the bracket after the symbol "|" is the selection control signal. For example, PE[a, b, c, d|Mr] denotes a four-to-one selection control whose data destination is storage element E, whose four data sources are storage elements A, B, C and D, and whose selection control signal is Mr. The output line name of a multiplexer connected to a storage element can be written in the corresponding lowercase form pe; the output line name of a multiplexer connected to a bus can be written directly as the lowercase form of the bus name.
(2) Storage unit set M
An abstract data storage element of a memory or register. The name of a storage element (register) is written in capital letters, for example A(La), B(Lb), C(Lc), R1(Lr1), R2(Lr2), R3(Lr3), ...; the signal beginning with L in the parentheses is the latch control signal;
(3) Arithmetic operation unit set AU: components that complete a given arithmetic function;
(4) Logic operation unit set L: components that complete a given logic function;
(5) Judging unit set J: components that complete comparison and decision functions;
(6) Branch unit set BR: components that complete branch operations;
2) Ce={ ,+}, denoting sequential use and simultaneous use of resources, respectively.
ei ej: if and only if ei, ej ∈ E and there exists a k such that there is a connection between ei[out] and ej[in_k];
ei+ej: if and only if ei, ej ∈ E and there exists no k such that there is a connection between ei[out] and ej[in_k];
Connections between storage elements are established by lines. The output line name (NET name) of a configurable component is defined as the lowercase form of the capitalized name of the storage element. For example, adder, fau, a, b, c and r1<15:0> denote the output line names of the adder ADDER, the floating-point adder FAU, and the registers A, B, C and R1<15:0> respectively, where <15:0> denotes the bit width. When a line is formed by combining several different signals, it is written with braces { , }; for example, {r1<15:8>, r0<7:0>} denotes a new line formed by combining the output lines of the two registers R1 and R0, where r1<15:8> is its upper 8 bits and r0<7:0> is its lower 8 bits.
3, Minimum time component set T and its connection relation Ct
1) T = the set of positive numbers
2) Ct={max,+}, denoting taking the maximum of times and adding times, respectively. If the OP operation relation is serial execution, the time relation is the sum of the times; if the OP operation relation is parallel execution, the time relation is the maximum of the times.
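The time rule can be made concrete with a small evaluator. The sketch below assumes a behavior is written as a nested tuple whose first element is either "serial" or "parallel"; this representation and the name exec_time are illustrative choices, not part of the model itself.

```python
def exec_time(behavior):
    """Compute T(B) for a behavior built from serial and parallel composition.

    A behavior is either a positive number (the time of one elementary
    operation) or a tuple ("serial" | "parallel", sub_behavior, ...).
    """
    if isinstance(behavior, (int, float)):
        if behavior <= 0:
            raise ValueError("T contains only positive numbers")
        return behavior
    kind, *parts = behavior
    times = [exec_time(part) for part in parts]
    if kind == "serial":
        return sum(times)   # serial execution: times add
    if kind == "parallel":
        return max(times)   # parallel execution: the longest branch dominates
    raise ValueError(f"unknown composition {kind!r}")


# Two time units, then two operations in parallel (3 and 1), then one more unit.
total = exec_time(("serial", 2, ("parallel", 3, 1), 1))
```

Here the parallel pair contributes max(3, 1) = 3, so the whole behavior takes 2 + 3 + 1 = 6 time units.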
4, Minimum control component set CTRL and its connection relation Cctrl
1)CTRL={Operator,IDS,CtrlL}
(1) Operator: the set of operation operators of the configurable component. Because a configurable component can realize the functions of the corresponding components of different architectures, on the usage interface it is likewise the set of operations that realize those functions;
(2) IDS: the set of flags related to the configurable component, that is, the set of flags generated by the compatibility targets of the configurable component;
(3) CtrlL: the control logic for each controllable node. The control logic includes both sequential circuits such as counters and latches and combinational logic such as decoders. Its inputs are the operators, the flags and system signals such as the clock and interrupt signals; its outputs are the codes on each controllable node.
Operators and flag signals are all written in the corresponding capital-letter form. A register controllable node in the control logic is represented by a pair of path and latch signals (M_Ri, L_Ri), whose generation logic is described by Boolean algebra; a path controllable node is represented by M_P, whose generation logic is likewise described by Boolean algebra.
2) Cctrl={ ,+}, denoting sequential control and simultaneous control, respectively.
The following design rules apply when designing a configurable component:
1. Reconfigurable design resource rule (Fig. 8a): the resources required for the serial superposition (serial stack) of an OP set are the union of the resources each OP needs at its own time.
2. Reconfigurable design connection rule (Fig. 8b): in serial superposition (the circuit descriptions for several functions are superposed, but only one function is valid at any one time), identical data sources can be merged and different data sources are arranged in parallel, with the corresponding selection control conditions changed accordingly.
3. Reconfigurable design control rule (Fig. 8c): the control description for the serial superposition of OPs is the union of the control descriptions of the individual OPs before superposition; for the same operation, the new control condition is the OR of the old conditions.
4. Reconfigurable design timing rule (Fig. 8d): the critical path after the serial superposition of OPs is the maximum, over the OPs, of the critical path before superposition plus the delay of the switching circuit added for that OP.
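The resource, control and timing rules can be sketched as a merge of per-OP design descriptions. This is an illustration under stated assumptions only: the OpDesign record, its field names and the example resource and signal names are hypothetical; the timing rule is read as "maximum over the OPs of critical path plus added switch delay"; and the connection rule (merging identical data sources) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class OpDesign:
    """Hypothetical description of one OP before superposition."""
    resources: frozenset       # names of hardware units the OP needs
    controls: dict             # control signal -> set of enabling conditions
    critical_path: int         # critical path delay, arbitrary time units
    switch_delay: int = 0      # delay of the switching circuit added for it


def superpose(designs):
    """Serially superpose OP designs according to rules 1, 3 and 4."""
    # Rule 1: the required resources are the union over all OPs.
    resources = frozenset().union(*(d.resources for d in designs))
    # Rule 3: for the same control signal, the new condition is the OR of
    # the old conditions, modeled here as a set of alternatives.
    controls = {}
    for design in designs:
        for signal, conditions in design.controls.items():
            controls.setdefault(signal, set()).update(conditions)
    # Rule 4 (as interpreted above): worst case of path plus switch delay.
    critical = max(d.critical_path + d.switch_delay for d in designs)
    return OpDesign(resources, controls, critical)


random_rw = OpDesign(frozenset({"R0", "MUX_A"}), {"L_R0": {"op_random"}}, 10, 3)
windowed = OpDesign(frozenset({"R0", "CWP"}),
                    {"L_R0": {"op_window"}, "L_CWP": {"op_save"}}, 11, 1)
merged = superpose([random_rw, windowed])
```

In this example the shared register R0 appears once in the merged resource set, and its latch signal L_R0 is enabled by either of the two original conditions.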
The design method of a configurable component is introduced below, taking the reconfigurable register file component as an example.
A reconfigurable register file is defined as a register file that, when accessed by different types of instruction within a specified range of use, adapts to the change of instruction by changing its own structure, so that it exhibits hardware characteristics of the same type as the accessing instruction.
The reconfigurable design of the register file is realized through the register file design optimization method, on the basis of the theoretical model of register reconfigurable design. Design optimization of the register file is defined as the realization of the register operation set (OPSET_R) on an optimal circuit (E_R, C_R, CTRL_R). The main subject of register file design optimization is the optimization rules for the superposition and stepwise refinement of OP_R; its result is the set of design optimization rules, RULE_DESIGN. The register file reconfigurable design method is a forward design method whose core is to determine the operator set by requirements analysis, which calls for a sufficient understanding of the hardware structure.
Fig. 9 is the external block diagram of the reconfigurable register file. This register file is built by hardware superposition on the basis of an ordinary random-access register file with 4 random read ports and 4 random write ports. It can perform the read and write operations of an ordinary random-access register file, the register window operations of the SPARC-V9 general-purpose registers, and the moving register window and register rotation operations of the Itanium general-purpose registers. Its inputs are the data sources of the 4 write ports and the control modes for the random read/write, register window, and moving register window and register rotation operations; its outputs are the data of the 4 read ports, the write conflict flag and the window overflow flags. The write conflict flag is raised when two write operations have the same destination address. The window overflow flags comprise overflow and underflow: executing a SAVE operation when the windows are full raises the overflow flag, and executing a RESTORE operation when the windows are empty raises the underflow flag. After a window overflow, a trap handler controls the exchange of data between the register file and memory until the requested operation is completed.
Fig. 9a shows the characteristics of the reconfigurable register file on the architecture interface when it is used as a random-access register file. In this case the reconfigurable register file appears on the architecture interface to have 4 read ports and 4 write ports, and each read/write port can access 128 registers. The read and write ports are controlled by eight operators: RANDOMPATH1, RANDOMPATH2, RANDOMPATH3, RANDOMPATH4, RANDOMRFW1, RANDOMRFW2, RANDOMRFW3 and RANDOMRFW4.
RANDOMPATHi (i=1-4) are four route operators that control the random read operations of the 4 read ports. Taking RANDOMPATH1 as an example:
1) Operator format: comprises the data source field RSRANDOM1<6:0>, by which the first random read port selects data to read from a register file of 128 registers;
2) Assembly syntax: RANDRD1 <data source>;
where the data source is R0-R127.
3) Operation description:
The operator selects the data source and places the selected result on the output bus of the register file. A complete operand selection proceeds as follows: the route operator selects data from the specified register, and the operand source control code of a function operator, data operator, etc. then takes the selected data from this data path for its operation. The data selected under the control of the route operator should be data that was stable in the previous cycle. The bus associated with a route operator keeps the state from the last time PATH was invoked, and the bus state can be saved and restored on an interrupt.
4) Usage constraint: used together with function operators, data operators and composition operators.
RANDOMRFWi (i=1-4) are four data operators that control the four random write ports of the register file respectively. Taking RANDOMRFW1 as an example:
1) Operator format
RDRANDO1<6:0> | RSRFRANDW1<1:0> |
where RDRANDO1<6:0> is the destination operand address and RSRFRANDW1<1:0> is the data source control signal, which selects among four data sources: the read port data PRD1, the functional component result bus AUDD, the immediate IMMD and the memory port MD0.
2) Assembly syntax: RANDWR1 <destination register> <data source>
where the destination register is R0-R127 and the data source is PRD1, AUDD, IMMD or MD0.
3) Operation description
A single-cycle operator: it selects the data source and writes the selected result to a register in the current register window; the register is determined by the destination register field of the operator.
4) Usage constraint
Because <data source> defines a "path" rather than a specific register visible to the user, a route operator should be used together with the RANDOMRFW1 operator to select the specific register.
5) Exception
When two random write operators act at the same time and their destination register fields are identical, a write conflict exception is raised.
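The write conflict condition can be sketched as a check performed before any of the writes issued in one cycle takes effect. The sketch assumes that a conflicting cycle leaves the register file unchanged; that behavior, the exception class WriteConflict and the function name are assumptions for illustration, since only the raising of the exception is described above.

```python
class WriteConflict(Exception):
    """Raised when two write operators in one cycle share a destination."""


def apply_random_writes(regfile, writes):
    """Apply up to four (destination, value) writes issued in the same cycle.

    Assumption: all destinations are checked first, so a conflicting cycle
    leaves the register file untouched.
    """
    if len(writes) > 4:
        raise ValueError("the register file has only 4 write ports")
    destinations = [dest for dest, _ in writes]
    for dest in destinations:
        if not 0 <= dest < len(regfile):
            raise IndexError(f"register R{dest} does not exist")
    if len(set(destinations)) != len(destinations):
        raise WriteConflict("two write operators target the same register")
    for dest, value in writes:
        regfile[dest] = value
    return regfile


registers = [0] * 128                      # R0-R127
apply_random_writes(registers, [(3, 7), (5, 9)])
```

A cycle such as [(3, 1), (3, 2)] raises WriteConflict instead of letting one port silently win.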
Fig. 9b shows the characteristics of the reconfigurable register file on the architecture interface when it is used as register windows. In this case the reconfigurable register file appears on the architecture interface as a circular stack of 8 windows whose operation conforms to the SPARC V9 specification.
SPARC-V9 is the 64-bit processor specification of the SPARC series, published in 1993 by the SPARC Architecture Committee of SPARC International. On a SPARC processor, 32 general-purpose registers are visible to the user at any time: R0-R7 are the global registers (globals), with R[0] read-only and always 0; R8-R15 are the output registers (outs); R16-R23 are the local registers (locals); and R24-R31 are the input registers (ins). The number of general-purpose registers in a SPARC processor is implementation-dependent and ranges from 64 to 528, corresponding to two groups of global registers plus 3 to 32 implementation-dependent register sets of 16 registers each. The register sets overlap to form register windows, and each register is 64 bits long. The input and output registers of each register window overlap with those of its two adjacent windows: the output registers of the window numbered CWP-1 (CWP being the current window pointer) overlap the input registers of the current window, and the output registers of the current window overlap the input registers of the window numbered CWP+1, with window numbers taken modulo NWINDOWS. The local registers are unique to each register window. The number of windows that software can actually use is one fewer than the number implemented in hardware, because the output registers of the newest window overlap the input registers of the oldest window and writing them would overwrite valid data. Procedure call instructions (CALL and JMPL) do not change CWP, so a procedure can be called without changing the window.
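The overlap between adjacent windows can be illustrated by mapping a visible register number to a physical location. The sketch below assumes 8 windows, a single group of 8 global registers, and a physical layout in which each window contributes 16 registers (its ins followed by its locals); the layout and the function name are assumptions chosen so that the overlap described above holds, not a description of the actual circuit.

```python
def physical_register(cwp, reg, nwindows=8):
    """Map visible register R0-R31 of window `cwp` to a physical location.

    Returns ("global", index) or ("windowed", index). The layout (ins at
    16*w, locals at 16*w + 8) is an illustrative assumption.
    """
    if not 0 <= reg < 32:
        raise ValueError("only R0-R31 are visible")
    if not 0 <= cwp < nwindows:
        raise ValueError("CWP out of range")
    size = 16 * nwindows
    if reg < 8:                                  # R0-R7: globals
        return ("global", reg)
    if reg < 16:                                 # R8-R15: outs = ins of CWP+1
        return ("windowed", (16 * (cwp + 1) + (reg - 8)) % size)
    if reg < 24:                                 # R16-R23: locals, private
        return ("windowed", 16 * cwp + 8 + (reg - 16))
    return ("windowed", 16 * cwp + (reg - 24))   # R24-R31: ins


# The outs of window 3 and the ins of window 4 are the same physical registers.
same_place = physical_register(3, 8) == physical_register(4, 24)
```

With this layout the outs of the last window wrap around onto the ins of window 0, which is the circular overlap that makes one window unavailable to software.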
The control interface of the register window is derived by analyzing the SPARC-V9 instructions. The analysis is as follows.
The SPARC-V9 instruction set comprises 135 RISC instructions, each 32 bits long. The instruction formats fall into 4 classes; within each class, the values of the control fields distinguish further formats, giving 31 different instruction formats in total. By function, the SPARC instructions can be divided into the following categories: memory access instructions, memory synchronization instructions, integer arithmetic instructions, control-transfer instructions, conditional move instructions, register window management instructions, state register access instructions, privileged register access instructions, floating-point operation instructions, implementation-dependent instructions, and reserved instructions.
The SPARC-V9 instructions that involve the general-purpose registers fall into two broad classes:
The first class, which includes memory access, integer arithmetic, control transfer, conditional move, and state register access instructions, uses only the ordinary read/write capability of the register window, that is, reads and writes of the currently active window (the 32 user-visible registers). Its operations on the register file are relatively simple;
The second class consists of the register window management instructions, which control the windows and their state. They are described below:
1. SAVE and RESTORE instructions
1) Assembler syntax:
save reg(rs1), reg_or_imm, reg(rd)
restore reg(rs1), reg_or_imm, reg(rd)
2) Instruction format
10 | Rd | Op3 | Rs1 | I=0 | --- | Rs2 |
31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-5 | 4-0 |
or,
10 | Rd | Op3 | Rs1 | I=1 | Simm13 |
31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-0 |
3) Execution
The SAVE instruction provides the executing routine with a new register window. The output registers (OUT) of the old window become the input registers (IN) of the new window, and the OUT and LOCAL registers of the new window contain either 0 or values belonging to the executing procedure; that is, the procedure sees a clean window. The RESTORE instruction restores the register window saved by the last SAVE instruction executed by the current procedure. The input registers of the old window become the output registers of the new window, and the input and local registers of the new window contain the values of the previous window.
When no SPILL/FILL trap is generated, SAVE and RESTORE behave like an ADD instruction, except that the source operands r(rs1) and/or r(rs2) are read from the old window (the window addressed by the original current window pointer CWP) and the sum is written to r(rd) of the new window (the window addressed by the new CWP). In addition, executing SAVE decrements CANSAVE, the register holding the number of windows that can be saved, and increments CANRESTORE, the number of windows that can be restored; executing RESTORE decrements CANRESTORE and increments CANSAVE.
4) Exceptions
A. If CANSAVE = 0, executing a SAVE instruction causes a WINDOW_SPILL exception;
B. If CANSAVE ≠ 0 but the number of clean windows is 0, that is, (CLEANWIN - CANRESTORE) = 0, executing a SAVE instruction causes a WINDOW_CLEAN exception;
C. If CANRESTORE = 0, executing a RESTORE instruction causes a WINDOW_FILL exception.
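The counter updates and exception conditions for SAVE and RESTORE can be summarized in a short behavioural sketch. The class and function names are hypothetical, the default values (8 windows, CLEANWIN = 7) are assumed for the example, and the add-like data path of the two instructions is omitted.

```python
from dataclasses import dataclass


class WindowTrap(Exception):
    """Stands in for a hardware trap; the name is illustrative."""


@dataclass
class WindowState:
    nwindows: int = 8      # assumed example value
    cwp: int = 0
    cansave: int = 6       # reset value NWINDOWS - 2
    canrestore: int = 0
    cleanwin: int = 7      # assumed starting value for the example


def save(s: WindowState) -> None:
    if s.cansave == 0:
        raise WindowTrap("WINDOW_SPILL")
    if s.cleanwin - s.canrestore == 0:
        raise WindowTrap("WINDOW_CLEAN")
    s.cwp = (s.cwp + 1) % s.nwindows   # SAVE moves to the next window
    s.cansave -= 1
    s.canrestore += 1


def restore(s: WindowState) -> None:
    if s.canrestore == 0:
        raise WindowTrap("WINDOW_FILL")
    s.cwp = (s.cwp - 1) % s.nwindows   # RESTORE moves back one window
    s.cansave += 1
    s.canrestore -= 1
```

The exception checks are made in the order A, B, C listed above, so a SAVE with CANSAVE = 0 reports a spill even if no clean window is available.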
2. SAVED and RESTORED instructions
1) Assembler syntax: SAVED, RESTORED
2) Instruction format
10 | Fcn | 110001 | ------------ |
31-30 | 29-25 | 24-19 | 18-0 |
3) Execution:
Executing the SAVED instruction increments CANSAVE; if OTHERWIN is 0, CANRESTORE is decremented, otherwise OTHERWIN is decremented. A SPILL trap handler can use the SAVED instruction to indicate that a window has been spilled successfully;
Executing the RESTORED instruction increments CANRESTORE; if OTHERWIN is 0, CANSAVE is decremented, otherwise OTHERWIN is decremented. In addition, if CLEANWIN is not equal to NWINDOWS, the RESTORED instruction increments CLEANWIN. A FILL trap handler can use the RESTORED instruction to indicate that a window has been filled successfully.
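As a compact restatement of the SAVED and RESTORED rules above, the following sketch updates a dictionary of state registers. The dictionary representation and the function names are assumptions for illustration; the update rules follow the text.

```python
def saved(st: dict) -> None:
    """Apply the SAVED rule to the state registers held in st."""
    st["CANSAVE"] += 1
    if st["OTHERWIN"] == 0:
        st["CANRESTORE"] -= 1
    else:
        st["OTHERWIN"] -= 1


def restored(st: dict, nwindows: int) -> None:
    """Apply the RESTORED rule to the state registers held in st."""
    st["CANRESTORE"] += 1
    if st["OTHERWIN"] == 0:
        st["CANSAVE"] -= 1
    else:
        st["OTHERWIN"] -= 1
    if st["CLEANWIN"] != nwindows:
        st["CLEANWIN"] += 1
```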
3. FLUSHW instruction
1) Assembler syntax: FLUSHW
2) Instruction format
10 | ------------- | Op3 | ----- | I=0 | ------------ |
31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-0 |
3) Execution: when any register window other than the current window contains valid data, the FLUSHW instruction causes repeated SPILL traps until all valid windows other than the current window have been spilled to memory. The number of register windows containing valid data is given by NWINDOWS - 2 - CANSAVE; if the result is 0, FLUSHW has no effect and is equivalent to a no-op.
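The repeated-spill behaviour of FLUSHW can be sketched as a loop. The function name and the callback `spill_one_window` are hypothetical; the sketch assumes that each SPILL trap handler ends with a SAVED instruction, which is what increments CANSAVE between iterations.

```python
def flushw(nwindows: int, cansave: int, spill_one_window) -> tuple:
    """Return (final CANSAVE, number of spills performed)."""
    spills = 0
    # NWINDOWS - 2 - CANSAVE is the number of windows holding valid data
    while nwindows - 2 - cansave > 0:
        spill_one_window()   # stands in for the SPILL trap handler
        cansave += 1         # effect of the handler's SAVED instruction
        spills += 1
    return cansave, spills
```

When the window count computed from the formula is already 0, the loop body never runs and the call behaves as a no-op.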
Based on the above instruction analysis, the operators for the reconfigurable register file when it is used as register windows are designed as follows. For simplicity, only the basic requirement of 3 register file ports (two read, one write) is considered.
1. Read operators: WINPATH1<4:0>; WINPATH2<4:0>
1) Operator format: the two operators contain the fields RSWIN1NO<4:0> and RSWIN2NO<4:0> respectively, which control the two read ports and correspond to the RS1 and RS2 fields of the instruction;
2) Assembler syntax:
WINRD1 <source register>
WINRD2 <source register>
The register operand encoding for <source register> is as follows:
Encoding | Operand | Encoding | Operand | Encoding | Operand | Encoding | Operand |
00000 | G0 | 01000 | O0 | 10000 | L0 | 11000 | I0 |
00001 | G1 | 01001 | O1 | 10001 | L1 | 11001 | I1 |
00010 | G2 | 01010 | O2 | 10010 | L2 | 11010 | I2 |
00011 | G3 | 01011 | O3 | 10011 | L3 | 11011 | I3 |
00100 | G4 | 01100 | O4 | 10100 | L4 | 11100 | I4 |
00101 | G5 | 01101 | O5 | 10101 | L5 | 11101 | I5 |
00110 | G6 | 01110 | O6 | 10110 | L6 | 11110 | I6 |
00111 | G7 | 01111 | O7 | 10111 | L7 | 11111 | I7 |
Here G0-G7 denote the global registers (Global), O0-O7 the output registers (Out), L0-L7 the local registers (Local), and I0-I7 the input registers (In).
3) Operation description
The operator selects the data source and places the selected result on the output bus of the register file. A specific operand selection proceeds as follows: the route operator selects data from a specified register, and the operand-source control field of a function operator, data operator, or similar then takes the selected data from that data path for its operation. The data selected under the control of a route operator should be data that was stable in the previous cycle. The bus controlled by a route operator keeps the state from the last time the PATH operator was invoked, and the bus state can be saved and restored on an interrupt.
4) Usage constraints
A route operator should be used together with another function operator, data operator, or composition operator.
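The 5-bit register operand encoding tabulated above for the read operators is regular: the upper two bits select the register group and the lower three bits select the register within the group. A minimal sketch, writing the output registers as O0-O7 and using hypothetical helper names:

```python
GROUPS = "GOLI"  # 00 = global, 01 = out, 10 = local, 11 = in


def decode_operand(code: int) -> str:
    """Turn a 5-bit operand code into a name such as 'L4'."""
    if not 0 <= code <= 0b11111:
        raise ValueError("operand code must fit in 5 bits")
    return f"{GROUPS[code >> 3]}{code & 0b111}"


def encode_operand(name: str) -> int:
    """Turn a name such as 'L4' into its 5-bit operand code."""
    group, index = name[0].upper(), int(name[1:])
    if group not in GROUPS or not 0 <= index <= 7:
        raise ValueError(f"unknown register operand: {name}")
    return (GROUPS.index(group) << 3) | index
```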
2. Write operator: WINRFW<6:0>, which controls write operations in window mode.
1) Operator format: RDWNO<4:0> is the destination register address and corresponds to the RD field of the instruction; RSRFW<1:0> is the data source control signal, which selects among 4 data sources: read port data PRD2, functional unit result bus AUDD, immediate IMMD, and memory port MD0.
2) Assembler syntax: WINWR <destination register> <data source>
Here the destination register uses the same operand names and address encoding as the source register of the read operators, and the data source is one of PRD2, AUDD, IMMD, and MD0.
3) Operation description
A single-cycle operator. It selects the data source and writes the selected result to a register in the current register window; the register is determined by the destination register field of the operator.
4) Usage constraints
Because <data source> defines a "path" rather than a specific user-visible register, the WINRFW operator should be used together with a route operator that selects the specific register.
5) Exceptions
G0 is always 0; when the destination operand is G0, the operation has no effect.
3. DAU<5:0> operator
1) Operator function
Performs addition and subtraction of 64-bit integers, addition and subtraction with carry, SAVE, and RESTORE. Only the use of this operator for SAVE and RESTORE operations is discussed here.
2) Assembler syntax
SAVE reg(rs1), reg_or_imm, reg(rd)
RESTORE reg(rs1), reg_or_imm, reg(rd)
3) DAU<5:0> operator format:
OPDAU<3:0> | RSDAUx<0> | RSDAUx<0> |
When OPDAU<3:0> = 1111, the SAVE operation is performed; when OPDAU<3:0> = 1101, the RESTORE operation is performed.
4) Execution description:
DAU is a single-cycle operator. It first selects the operands and then performs a 64-bit addition according to the opcode. When used for SAVE and RESTORE, the source operands r(rs1) and/or r(rs2) are read from the old window (the window addressed by the original current window pointer CWP), and the sum is written to r(rd) of the new window (the window addressed by the new CWP). At the same time the operator produces error flags or other status signals, and the encoding determines whether the CCR condition code register is modified.
The SAVE operation provides the executing routine with a new register window (as defined by the SPARC V9 architecture, CWP is incremented by 1 when SAVE takes effect); the RESTORE operation restores the register window saved by the last SAVE instruction executed by the current procedure (as defined by the SPARC V9 architecture, CWP is decremented by 1 when RESTORE takes effect). Both operations also modify the state registers CANSAVE and CANRESTORE. CANSAVE records the number of unused register windows after CWP, and CANRESTORE records the number of register windows before CWP that are in use by the current program. The SAVE operation decrements CANSAVE and increments CANRESTORE; the reset value of CANSAVE is the number of physical windows minus 2 (for the current window and the overlap window), and executing SAVE when CANSAVE is 0 causes a window overflow. The RESTORE operation increments CANSAVE and decrements CANRESTORE; the reset value of CANRESTORE is 0, and executing RESTORE when CANRESTORE is 0 causes a window underflow.
4. OPWIN<1:0> operator
1) Operator function: manages the window state by performing the SAVED, RESTORED, and FLUSHW operations.
2) Assembler syntax:
SAVED
RESTORED
FLUSHWIN
3) OPWIN<1:0> operator format:
OPWIN<1:0> is encoded as follows:
OPWIN<1:0> | Operation | OPWIN<1:0> | Operation |
00 | SAVED | 10 | FLUSHW |
4) Execution description:
Executing the SAVED operation increments CANSAVE; if the state register OTHERWIN is 0, CANRESTORE is decremented, otherwise OTHERWIN is decremented. OTHERWIN is the number of windows holding valid data that belong to address spaces other than the current address space.
Executing the RESTORED operation increments CANRESTORE; if OTHERWIN is 0, CANSAVE is decremented, otherwise OTHERWIN is decremented. In addition, if the state register CLEANWIN is not equal to NWINDOWS (the total number of physically implemented register windows), the RESTORED operation increments CLEANWIN, where CLEANWIN indicates the number of register windows that SAVE instructions can use without causing a CLEAN_WIN exception.
The FLUSHW operation has two cases. When NWINDOWS - 2 - CANSAVE is not 0, a SPILL trap is generated; under the control of the trap handler, the FLUSHW operation is re-executed after each window has been spilled, until all register windows other than the current active window have been spilled to memory. When NWINDOWS - 2 - CANSAVE is 0, FLUSHW is equivalent to a no-op (NOP).
Feature when Fig. 9 c is used as mobile register window and spin register for the restructural register file on the system interface.When as mobile register window and spin register, its operation meets the working specification of Itanium architecture.
The Itanium general-purpose register comprises 128 register GR0-GR127, to the program of all authorities all as seen, each register is 65, most significant digit is NAT (Not a thing) position, be used for the predicted anomaly sign, represent that when NAT is 1 register comprises a delay abnormality mark, whether effective and concrete execution is relevant for the interior data of register this moment, restructural register file of the present invention is not supported this function, so register width only is 64.
It is static general-purpose register territory that register in the Itanium general-purpose register is divided into two subclass: GRO-GR31, and GR0 is complete 0, and is read-only; GR32-GR127 is storehouse general-purpose register territory.Static register GR0-GR31 to all processes as seen, and corresponding to each process a corresponding mobile register window (shifting window) is arranged in the stack register territory, the size of window can be by software definition, between 0-96, change, automatic exchange parameter when overlapping CALL and RETuRN operation by register between the window, thus visit avoided to storer.When process is switched, static register must carry out SAVE and RESTORE operation according to the software convention, and the switching of moving window is finished automatically by hardware in the stack register, does not need explicit software intervention, and the rename application programs of register is sightless.The moving window size is decided by SOF and two parameters of SOL, and SOF and SOL are set by instruction, and SOF is the size of moving window, initial value is 96, SOL is the number (comprising input register and local register) of local register in the window, and initial value is 0, both poor of the number of output register.When carrying out the CALL instruction, the physical address of the GR32 of current active window becomes the address of GR32 in the last window and the SOL sum of a last window, and the big or small SOF of new window is the output register territory of a last window, and the SOL of new window and SOR are 0.Recover original SOF and SOL when carrying out the RETURN instruction.Actual physical stack register number is relevant with realization, but is necessary for 16 even-multiple, and minimum is 96.
Part of the stacked registers can be defined by software as rotating registers, which are used to accelerate loop processing. When the registers rotate, the physical register address actually accessed is computed as follows:
Physical register number = (register number<6:0> given in the instruction + RRB) mod (size of the rotating region).
Here RRB is the rotating register base, a 7-bit register with an initial value of 0 that is decremented by one after each loop iteration. The rotating region is defined to start at GR32 and has size 8*SOR<3:0>; SOR can be set by software, has an initial value of 0 and a maximum of 12, so the maximum size of the rotating region is 96. An instruction can change the size of the rotating region in the register stack only when RRB is 0. Normally software either ensures that the rotating region does not overlap the output region of the active window, or sets RRB to 0 before setting up the output parameter registers.
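The rotation formula can be tried out with a small sketch. It rests on two assumptions that the formula itself does not spell out: the addition is applied to the offset of the register relative to GR32 (where the rotating region starts), and RRB is kept modulo the size of the rotating region. Registers outside the rotating region are returned unchanged. The function names are hypothetical.

```python
def physical_register(reg: int, rrb: int, sor: int) -> int:
    """Rename register number reg under rotation base rrb."""
    if not 0 <= sor <= 12:
        raise ValueError("SOR must be in 0..12")
    size = 8 * sor                       # rotating region covers GR32..GR(32+size-1)
    if size == 0 or not 32 <= reg < 32 + size:
        return reg                       # not a rotating register
    return 32 + (reg - 32 + rrb) % size  # offset-based form of the formula


def rotate(rrb: int, sor: int) -> int:
    """Return RRB after one loop iteration (decrement by one, with wrap)."""
    size = 8 * sor
    return (rrb - 1) % size if size else 0
```

With SOR = 1, a value written through GR32 before a rotation is reached through GR33 afterwards, which is the behaviour software pipelining relies on.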
The instructions related to the Itanium general-purpose registers are analyzed as follows.
The Itanium processor uses the IA-64 architecture. Instructions are executed in units of instruction groups; an instruction group may consist of one or any number of instruction bundles. Each 128-bit bundle contains three 41-bit instruction slots and a 5-bit template field. Instructions are 41 bits long and fall into 6 types: integer ALU, non-ALU integer, memory, floating-point, branch, and extended instructions, with more than 110 instruction formats. Execution starts at a given bundle address and instruction slot and proceeds through all instruction slots and bundles in increasing order up to the first stop or taken branch. The IA-64 architecture allows multiple independent instructions from different bundles to be issued, and multiple bundles may be issued in one clock cycle. The instructions of the IA-64 instruction set that relate to the general-purpose registers can likewise be divided into two broad classes:
The first class comprises read and write operations on the static registers and the current mobile register window (frame). Taking the ADD instruction as an example, the format is as follows:
8 | | x2a | Ve | x4 | x2b | r3 | r2 | r1 | qp |
40-37 | 36 | 35-34 | 33 | 32-29 | 28-27 | 26-20 | 19-13 | 12-6 | 5-0 |
The second class of instructions controls the register stack and the rotating registers. They are described below:
1. Alloc Stack Frame
1) Assembler syntax:
(qp) alloc r1 = ar.pfs, i, l, o, r
2) Instruction format:
1 | | x3 | | sor | Sol | Sof | r1 | qp |
40-37 | 36 | 35-33 | 32-31 | 30-27 | 26-20 | 19-13 | 12-6 | 5-0 |
3) Execution: a new mobile register window is allocated on the general register stack (GRS), and the Previous Function State (PFS) register is copied to general-purpose register GR r1. The change of window size takes effect immediately; the write to GR r1 and all subsequent operations are performed on the new mobile register window. i, l, o, and r denote the sizes of the input register, local register, output register, and rotating register regions respectively. For the new mobile register window, SOF (size of local frame) is the sum of i, l, and o, and SOL (size of local region) is the sum of i and l; input registers and local registers are not physically distinguished. The rotating region does not exceed SOF, and its size is a multiple of 8.
4) Exceptions: if an alloc instruction attempts to modify the SOR (size of local rotating) field while the RRB register is not 0, a Reserved Register/Field exception is raised; if SOF is greater than 96, or SOR is greater than SOF, an illegal operation exception is raised; if there are not enough registers to allocate the mobile register window, the processor waits for STORE operations to complete and may raise the related exceptions.
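The size calculation and checks performed by alloc can be sketched as follows. The function and exception names are hypothetical; the sketch assumes that the r argument is given in registers and encoded as SOR = r / 8, and that the "SOR greater than SOF" check compares the rotating region size in registers with SOF. Stalling for insufficient registers is not modelled.

```python
class AllocFault(Exception):
    """Stands in for the exceptions listed above; the name is illustrative."""


def alloc(i: int, l: int, o: int, r: int, rrb: int = 0, current_sor: int = 0) -> dict:
    """Compute the new frame parameters for 'alloc r1 = ar.pfs, i, l, o, r'."""
    if r % 8 != 0:
        raise ValueError("rotating region size must be a multiple of 8")
    sof = i + l + o          # size of frame
    sol = i + l              # inputs and locals together
    sor = r >> 3             # rotating region size in units of 8 registers
    if sor != current_sor and rrb != 0:
        raise AllocFault("Reserved Register/Field")
    if sof > 96 or r > sof:
        raise AllocFault("Illegal Operation")
    return {"sof": sof, "sol": sol, "sor": sor}
```

For example, `alloc(8, 8, 8, 16)` yields a 24-register frame with 16 locals and a rotating region of two 8-register units.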
2. Branch
1) Assembler syntax:
(qp) br.btype.bwh.ph.dh target25
(qp) br.btype.bwh.ph.dh b1 = target25
br.btype.bwh.ph.dh target25
br.ph.dh target25
(qp) br.btype.bwh.ph.dh b2
(qp) br.btype.bwh.ph.dh b1 = b2
(qp) br.ph.dh b2
2) Instruction format:
A. IP-relative branch:
4 | S | d | Wh | Imm20b | P | | btype | qp |
40-37 | 36 | 35 | 34-33 | 32-13 | 12 | 11-9 | 8-6 | 5-0 |
B. Indirect branch:
0 | S | d | wh | x6 | | B2 | p | | btype | qp |
40-37 | 36 | 35 | 34-33 | 32-27 | | 15-13 | 12 | 11-9 | 8-6 | 5-0 |
3) Execution: the branch condition is evaluated, and either the branch is taken or execution continues with the following operation. For an IP-relative branch, target25 in the assembly is the branch target label, and the encoded value is imm21 = (target25 - IP) >> 4; for an indirect branch, the target address is BR b2. The branch types and their functions are shown in the following table:
Branch type | Function | Branch condition | Target address |
Cond or none | Conditional branch | Qualifying predicate | IP_rel or indirect |
Call | Conditional procedure call | Qualifying predicate | IP_rel or indirect |
Ret | Conditional procedure return | Qualifying predicate | indirect |
Ia | Invoke the IA-32 instruction set | Unconditional | indirect |
Cloop | Counted loop branch | Loop counter | IP_rel |
Ctop, cexit | Modulo-scheduled counted loop | Loop counter and epilog counter | IP_rel |
Wtop, wexit | Modulo-scheduled while loop | Qualifying predicate and epilog counter | IP_rel |
The branch types are defined as follows:
(1) cond: if the qualifying predicate (qp) is 1, the branch is taken; otherwise it is not.
(2) call: if qp is 1, the branch is taken and the following operations are performed at the same time: CFM (current frame marker), EC (epilog counter), and the current privilege level are saved to the PFS (previous function state) register; the mobile register window of the caller is saved, and the callee is automatically given a new mobile register window whose size is the output register region of the caller; the RRB registers in CFM are cleared to 0; the return link value is written to BR b1.
(3) return: if qp is 1, the branch is taken and the following operations are performed at the same time: CFM, EC, and the current privilege level are restored from the PFS register; the mobile register window of the caller is restored.
(4) ctop and cexit: the execution flow is shown in Fig. 9d. When a ctop or cexit operation is valid:
If the loop counter LC is not 0: LC is decremented, the epilog counter EC is unchanged, RRB is decremented, and the registers rotate;
If LC is 0 and EC is greater than 1: LC is unchanged, EC is decremented, RRB is decremented, and the registers rotate;
If LC is 0 and EC is 1: LC is unchanged, EC is decremented, RRB is decremented, and the registers rotate;
If LC is 0 and EC is 0: LC, EC, and RRB are unchanged, and the loop exits.
(5) wtop and wexit: the execution flow is shown in Fig. 9e. The operation is as follows:
If the predicate register PR[qp] is not 0: the epilog counter EC is unchanged, RRB is decremented, and the registers rotate;
If PR[qp] is 0 and EC is greater than 1: EC is decremented, RRB is decremented, and the registers rotate;
If PR[qp] is 0 and EC is 1: EC is decremented, RRB is decremented, and the registers rotate;
If PR[qp] is 0 and EC is 0: EC and RRB are unchanged, and the loop exits.
(6) cloop: if LC is not 0, LC is decremented and the branch is taken.
(7) ia: unconditional transfer.
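The counter updates for the modulo-scheduled loop branches can be written as two small state-update functions. This is only a sketch of the LC, EC, and RRB updates listed above: the function names are hypothetical, the wrap-around of RRB within the rotating region is left out, and whether the branch itself is taken is not modelled.

```python
def ctop_update(lc: int, ec: int, rrb: int) -> tuple:
    """ctop/cexit: return (lc, ec, rrb, rotated)."""
    if lc != 0:
        return lc - 1, ec, rrb - 1, True   # kernel iteration
    if ec >= 1:
        return lc, ec - 1, rrb - 1, True   # epilog stage (EC > 1 or EC == 1)
    return lc, ec, rrb, False              # EC == 0: nothing changes, loop exits


def wtop_update(pr_qp: int, ec: int, rrb: int) -> tuple:
    """wtop/wexit: return (ec, rrb, rotated)."""
    if pr_qp != 0:
        return ec, rrb - 1, True           # loop condition still true
    if ec >= 1:
        return ec - 1, rrb - 1, True       # epilog stage
    return ec, rrb, False                  # EC == 0: loop exits
```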
3. Clear RRB
1) Assembler syntax:
clrrrb
clrrrb.pr
2) Instruction format:
0 | | x6 | | Qp |
40-37 | 36-33 | 32-27 | 26-6 | 5-0 |
3) Execution: the ALL form clears all RRBs (those of the general-purpose register file, the floating-point register file, and the predicate register file) to 0; the PRED form clears only the RRB of the predicate register file to 0.
4) Exceptions: this instruction must be the last instruction in an instruction group; otherwise an illegal operation fault is raised.
4. Cover Stack Frame
1) Assembler syntax: cover
2) Instruction format:
0 | | x6 | | Qp |
40-37 | 36-33 | 32-27 | 26-6 | 5-0 |
3) Execution: a new mobile register window of size 0 is allocated on the register stack; it contains none of the registers of the previous mobile register window, and RRB is reset.
4) Exceptions: this instruction must be the last instruction in an instruction group; otherwise an illegal operation fault is raised.
5. Flush register stack
1) Assembler syntax: flushrs
2) Instruction format:
0 | | x3 | x2 | x6 | | Qp |
40-37 | 36 | 35-33 | 32-31 | 30-27 | 26-6 | 5-0 |
3) Execution: all stacked registers in the DIRTY partition (that is, registers in the mobile register windows of previous procedures that have not yet been saved to the backing store) are written to the backing store in memory.
4) Exceptions: this instruction must be the first instruction in an instruction group and must be placed in SLOT0 or SLOT1 of an instruction bundle; otherwise the result is undefined.
6. Load register stack
1) Assembler syntax: loadrs
2) Instruction format:
0 | | x3 | x2 | x6 | | Qp |
40-37 | 36 | 35-33 | 32-31 | 30-27 | 26-6 | 5-0 |
3) Execution: this instruction ensures that a specified number of values located in memory below the current BSP pointer are loaded into the DIRTY partition of the stacked registers; all other registers are marked invalid without being written to the backing store. The amount of data loaded is determined by the RSC.loadrs field; when this field is 0, the effect is to mark all registers outside the current mobile register window as invalid.
4) Exceptions: this instruction must be the first instruction in an instruction group and must be placed in SLOT0 or SLOT1 of an instruction bundle; otherwise the result is undefined.
Based on the above instruction analysis, the operators are designed as follows. For ease of discussion, only the basic requirement of 3 register ports (two read, one write) is considered; the operator design method for register files with more ports is the same.
1. Read operators: ROTPATH1<6:0>; ROTPATH2<6:0>.
1) Operator format:
The two operators contain the fields RSROT1NO<6:0> and RSROT2NO<6:0> respectively, which control the two read ports.
2) Assembler syntax
ROTRD1 <source register>
ROTRD2 <source register>
Here the source register is any of the registers R0-R127.
3) Operation description
The operator selects the data source and places the selected result on the output bus of the register file. A specific operand selection proceeds as follows: the route operator selects data from a specified register, and the operand-source control field of a function operator, data operator, or similar then takes the selected data from that data path for its operation. The data selected under the control of a route operator should be data that was stable in the previous cycle. The bus controlled by a route operator keeps the state from the last time the PATH operator was invoked, and the bus state can be saved and restored on an interrupt.
4) Usage constraints
A route operator should be used together with another function operator, data operator, or composition operator.
2. Write operator: ROTRFW<8:0>, which controls write operations in window mode.
1) Operator format: RDRTO<6:0> is the destination operand address; RSRFW<1:0> is the data source control signal, which selects among 4 data sources: read port data PRD3, functional unit result bus AUDD, immediate IMMD, and memory port MD0.
2) Assembler syntax
ROTWR <destination register> <data source>
Here the destination register is any of the registers R0-R127, and the data source is one of PRD3, AUDD, IMMD, and MD0.
3) Operation description
A single-cycle operator. It selects the data source and writes the selected result to a register in the current register window; the register is determined by the destination register field of the operator.
4) Usage constraints
Because <data source> defines a "path" rather than a specific user-visible register, the ROTRFW operator should be used together with a route operator that selects the specific register.
5) Exceptions
R0 is always 0; when the destination operand is R0, the operation has no effect.
Note: the registers R0-R127 used above as source and destination operands are encoded sequentially with 7-bit addresses.
3. Mobile register window allocation operator: Allocframe<17:0>
1) Function: sets the length of the mobile register window and the sizes of the local register region and the rotating region. The rotating region register is a 4-bit register; the actual size of the rotating region is the value of this register shifted left by 3 bits.
2) Operator format:
Sor<3:0> | sol<6:0> | Sof<6:0> |
3) Assembler syntax:
ALLOC #i, #l, #o, #r
// Note: sol = i + l, sof = i + l + o, rotating region size = sor << 3.
4) Operation description
A single-cycle operator. It sets the mobile window size and the register rotating region; executing this operator modifies the SOR, SOL, and SOF registers.
4. Branch control operator: BRANCH<3:0>
1) Function: controls the branch operations of counted loops and while loops.
2) Operator format:
BTYPE<2:0> is the type control field, encoded as follows:
BTYPE<2:0> | Operation | BTYPE<2:0> | Operation |
000 | CTOP | 100 | CLOOP |
001 | CEXIT | 101 | CALL |
010 | WTOP | 110 | RETURN |
011 | WEXIT | 111 | Reserved |
RSTYPE<0> is the address source control field; the address source is IMMD (RSTYPE<0> = 0) or SMDI (RSTYPE<0> = 1).
3) Operation description
For the CALL operation, the BOF register is modified (BOF is the physical address of the first register of the current window), and the new SOF region is automatically placed in the original SOF-SOL region, so that the window moves.
For the RETURN operation, the previous BOF, SOL, and SOF are restored, and the window moves in the opposite direction.
4) Assembler syntax:
CTOP IMMD;
CEXIT IMMD;
WTOP IMMD;
WEXIT IMMD;
CLOOP IMMD;
CALL IMMD;
CALL REG;
RETURN REG;
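The window movement performed by the CALL and RETURN operations of the BRANCH operator can be sketched as follows. The `Frame` and `RegisterStack` names and the Python list used to hold saved frames are assumptions for illustration (the Itanium architecture keeps the previous frame state in the PFS register); wrap-around of BOF within the physical register file is omitted.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    bof: int = 0    # physical address of the first register of the window
    sof: int = 96   # size of frame (initial value 96)
    sol: int = 0    # size of locals (initial value 0)
    sor: int = 0    # rotating region register


class RegisterStack:
    def __init__(self) -> None:
        self.cur = Frame()
        self.saved = []                  # hypothetical store for caller frames

    def call(self) -> None:
        old = self.cur
        self.saved.append(old)
        # The callee's window starts at the caller's outputs (SOF - SOL registers)
        self.cur = Frame(bof=old.bof + old.sol, sof=old.sof - old.sol, sol=0, sor=0)

    def ret(self) -> None:
        self.cur = self.saved.pop()      # restore the previous BOF, SOL, and SOF
```

A caller frame with SOF = 10 and SOL = 6 therefore hands its 4 output registers to the callee as a new frame starting 6 registers further on.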
5. OPSTK<1:0> operator
1) Function: manages the stack state by performing the clrrrb, cover, flushrs, and loadrs operations.
2) operator coding:
OPSTK<1:0> | Operation | OPSTK<1:0> | Operation |
00 | CLRRRB | 10 | FLUSHRS |
01 | COVER | 11 | LOADRS |
3) assembler syntax:
CLRRRB;
COVER;
FLUSHRS;
LOADRS;
4) Execution description:
When a CLRRRB operation is valid, the RRB register is set to 0;
When a COVER operation is valid, the SOF, SOL, SOR, and RRB registers are all cleared to 0;
When a FLUSHRS operation is valid, a STORE operation is performed if BSP is not equal to BSPSTORE;
When a LOADRS operation is valid, the values of all registers between BSP and BSP - Number_of_Bytes are loaded into the register stack and marked as DIRTY.
Fig. 10 is a structural diagram of a register window in which the global registers are separated from the window registers. This register window unit comprises a window register file address translation unit 101, a window register file 102, a global register file 103, and a data output selection unit 104.
Let the number of system-visible registers in a register window be $2^a$ and the number of global registers be $2^b$ ($b < a$). Then the global register file contains $2^b$ registers, and the window register file contains $2^m$ physical registers. For fixed windows, $m = b + 1 + k$, where $k$ is a non-negative integer; for mobile windows, $2^m \ge 2^a - 2^b$.
On a write, the register address in the instruction (the register source operand of a route operator or the register destination operand of a data operator, with bit width a) is converted by the window register file address translation unit 101 into a physical register address (bit width m) that respects the window overlap, and this address is used to access the window register file 102. At the same time, the low-order address bits [b-1:0] of the instruction address, corresponding to the number of global registers, directly access the global register file 103. At any given moment a write can act on only one of the global register file and the window register file; whether a write to a register file takes effect is determined by its write-address enable signal. This signal is formed by logically combining the write-operator enable signal with the high-order bits [a-1:b] of the register address in the instruction: when the write operator is enabled and the high-order address bits are all 0, the write to the global register file is effective; when the write operator is enabled and the high-order address bits are not all 0, the write to the window register file is effective. The global register file and the window register file share the same write data source.
On a read, bits [b-1:0] of the register address in the instruction directly access the global register file 103, while the address is also translated by the window register file address translation unit 101 and used to access the window register file; the address translation principle is the same as for a write. The output data selection unit 104 selects the correct data from the outputs of the two separate register files. The selection condition is whether the high-order bits [a-1:b] of the register address in the instruction are all 0: if they are all 0, the read of the global register file is effective and the output of the global register file is selected; otherwise the read of the window register file is effective and the output of the window register file is selected.
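The read and write selection between the two register files can be modelled in a few lines. The class name is hypothetical, and the address translation of unit 101 is passed in as a function because its details depend on the window design; the read-only behaviour of register 0 is not modelled.

```python
class SplitRegisterFile:
    """Global register file plus window register file (units 101-104)."""

    def __init__(self, a: int, b: int, m: int, translate) -> None:
        self.b = b
        self.addr_mask = (1 << a) - 1
        self.globals = [0] * (1 << b)    # global register file 103
        self.window = [0] * (1 << m)     # window register file 102
        self.translate = translate       # stands in for address translation unit 101

    def _split(self, addr: int) -> tuple:
        addr &= self.addr_mask
        return addr >> self.b, addr & ((1 << self.b) - 1)

    def write(self, addr: int, value: int, write_enable: bool = True) -> None:
        if not write_enable:
            return
        high, low = self._split(addr)
        if high == 0:                    # high bits all 0: global file is written
            self.globals[low] = value
        else:                            # otherwise: window file is written
            self.window[self.translate(addr)] = value

    def read(self, addr: int) -> int:
        high, low = self._split(addr)
        from_global = self.globals[low]                  # both files are accessed
        from_window = self.window[self.translate(addr)]
        return from_global if high == 0 else from_window  # selection unit 104
```

For a SPARC-like configuration one might use a = 5, b = 3, and a translation function derived from CWP; any function that returns an index below $2^m$ will work with the sketch.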
The above structure applies not only to fixed register window designs but also to moving register window designs. The difference between the two is as follows: in a fixed register window design, the number of registers in each register window and the sizes of the register input, local and output fields are fixed; in a moving register window design, the number of registers in each register window and the sizes of the register local and output fields are programmable. What the two have in common is that both contain a global register field (also called the static register field) and a window register field (also called the stack area). The global registers are visible to every process, whereas each window in the window register field is visible only to one particular process. The windows overlap one another to form a circular stack, and the input register field of a new window contains the valid data of the output register field of the old window. Switching windows in this way reduces memory accesses and so improves program execution efficiency.
Both kinds of register window can be implemented with the component shown in Figure 10; they differ mainly in how the window register file read/write address conversion component generates addresses. Figures 10a and 10b depict the address conversion component of the window register file in the fixed-window design, and Figures 10c and 10d depict the address conversion component of the window register file in the moving-window design.
The component shown in Figure 10a corresponds to the window register file address conversion component 101 of Figure 10 and performs the window register file address conversion for fixed register windows. Because the size of a fixed window and its register input, local and output fields are all predetermined, the physical address of a register can be determined uniquely from the current window pointer CWP. The window register file address conversion component in the fixed-window design comprises:
The CWP (current window pointer) generation component 1011. Its inputs are the reset value 0 used for resetting and the signals that control changes to CWP: the reset signal RST, the SAVE operation enable signal and the RESTORE operation enable signal. The SAVE and RESTORE signals are produced by logically combining the enable signal EDAU (active low) of the operator DAU with the function field of the operator DAU. Its outputs are CWP1 and CWP2, the values of CWP at two different timings, which are used to generate the read address and the write address respectively. Two values are needed because the read pointer and the write pointer of the SAVE and RESTORE operations have different timing: the source operands come from the old window (the window before the SAVE or RESTORE operation completes), whereas the destination operand belongs to the new window (the window after the SAVE or RESTORE operation completes). According to the SPARC architecture definitions, CWP is modified in one of two ways: in SPARC V8, CWP is decremented by 1 when SAVE is valid and incremented by 1 when RESTORE is valid; in SPARC V9, CWP is incremented by 1 when SAVE is valid and decremented by 1 when RESTORE is valid. The specific implementation of 1011 is further described with reference to Figure 10b.
The address preprocessing component 1012, which subtracts the constant $2^{b}$ from each register address in the instruction.
The CWP address extension component 1013, which logically shifts CWP left and extends it to m bits. To implement SPARC V9 window address conversion, a two's complement operation must also be added, so that CWP is incremented by one when the SAVE operation is valid while the input registers of window N+1 overlap with the output registers of window N.
The adder component 1014. The value of CWP1 output by 1011, after address extension by 1013, is added by the adder component 1014 to the read address preprocessed by 1012, giving the final physical address for a read of the window register file; the value of CWP2 output by 1011, after address extension by 1013, is added by the adder component 1014 to the write address preprocessed by 1012, giving the final physical address for a write of the window register file.
Figure 10b further illustrates Figure 10a, taking as an example the window register file address conversion component of a fixed register window design made up of 8 windows, where a=5, b=3, k=3 and m=b+1+k=7. Each fixed register window contains 32 ($2^{a}$) registers, consisting of 8 ($2^{b}$) global registers, 8 input registers, 8 local registers and 8 output registers. The number of windows is 8 ($2^{k}$), and the total number of physical registers is 136 ($2^{b}+2^{m}$). The register window operations conform to the SPARC V9 specification. The implementation of the window register file address conversion component is described below.
For simplicity, only the case of two read operations and one write operation executing in parallel is discussed; three addresses, WINRP1, WINRP2 and WINWP (two reads and one write), must then be generated at the same time. The read and write addresses in the instruction come from the register source operand fields of the route operator and the register destination operand field of the data operator respectively. The address width is 5 bits, corresponding to the 32 registers in a window. During a read or write operation, the low three bits of the address directly access the global register group (8 registers), while the 5-bit register address in the instruction is converted by the window register file address conversion component 101 of Figure 10 into a 7-bit physical register address that conforms to the window overlap specification and accesses the window register file 102 of Figure 10 (128 registers).
The CWP generation component 1011' contains three 3-bit registers (3 bits, corresponding to 8 windows) that hold the values CWP, CWP-1 and CWP+1 respectively. Following the SPARC V9 architecture definition, the incremented value of CWP is selected when the SAVE operation is valid, the decremented value of CWP is selected when the RESTORE operation is valid, and the current CWP value is selected when neither SAVE nor RESTORE is executed. The IMMD input channel (0 in the preferred embodiment) is used for reset: when the reset signal RST is valid, IMMD is selected, so the CWP reset value is 0. Because the source operands of the SAVE and RESTORE operations come from the old window while the destination operand belongs to the new window, the CWP generation component outputs two values at once, namely the values after and before the CWP register latch: the value after the latch, CWP1, is used for read operations, and the value before the latch, CWP2, is used for write operations. The two outputs CWP1 and CWP2 are turned into 7-bit physical addresses by the CWP address extension component 1013' (four 0 bits are appended after CWP). Because SPARC V9 defines the input registers of window n to overlap with the output registers of window n-1, a two's complement operation (invert and add one) is added before the address extension. The 5-bit register addresses RS1, RS2 and RD in the instruction each have the constant 8 subtracted by the preprocessing component 1012' and are then added to the output of component 1013' (in component 1014'), giving the final physical addresses of the window register file: two read physical addresses, WINRP1 and WINRP2, and one write physical address, WINWP.
In this way, the 5-bit register addresses supplied by the instruction are mapped onto the global register file GR0-GR7 and the window register file RF00-RF7f, and window overlap is achieved. The registers actually accessed by each window are as follows:
For window 0, the registers actually accessed are GR0-GR7 and RF00-RF17;
For window 1, the registers actually accessed are GR0-GR7 and RF70-RF07;
For window 2, the registers actually accessed are GR0-GR7 and RF60-RF77;
For window 3, the registers actually accessed are GR0-GR7 and RF50-RF67;
For window 4, the registers actually accessed are GR0-GR7 and RF40-RF57;
For window 5, the registers actually accessed are GR0-GR7 and RF30-RF47;
For window 6, the registers actually accessed are GR0-GR7 and RF20-RF37;
For window 7, the registers actually accessed are GR0-GR7 and RF10-RF27.
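The fixed-window address conversion of Figure 10b can be checked with a short software model. The sketch below is an illustration under stated assumptions rather than the circuit itself: the function names are hypothetical, and the assignment of instruction addresses 8-15 to output registers and 24-31 to input registers follows the usual SPARC convention, which the text above does not spell out.

```python
A, B, K = 5, 3, 3          # a, b, k from the Figure 10b example
M = B + 1 + K              # m = 7, width of the window register file address


def window_physical_address(reg, cwp):
    """Map a 5-bit instruction register address (8..31) to a 7-bit physical address."""
    if not (1 << B) <= reg < (1 << A):
        raise ValueError("address does not belong to the window registers")
    pre = reg - (1 << B)                       # component 1012': subtract the constant 8
    ext = ((-cwp) % (1 << K)) << (B + 1)       # component 1013': two's complement, append four 0 bits
    return (pre + ext) % (1 << M)              # component 1014': 7-bit adder, wraps around


def window_range(cwp):
    """First and last physical register of the window selected by cwp."""
    return window_physical_address(8, cwp), window_physical_address(31, cwp)


for cwp in range(1 << K):
    first, last = window_range(cwp)
    print(f"window {cwp}: RF{first:02x}-RF{last:02x}")
```

Running the loop reproduces the list above (for example RF70-RF07 for window 1 and RF10-RF27 for window 7). Under the assumed SPARC register layout, address 24 of window n+1 and address 8 of window n resolve to the same physical register, which is the overlap the two's complement step is meant to provide.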
Figure 10c shows another window register file read/write address conversion component based on the design method of Figure 10, in which global registers and window registers are separated. Because the number of registers in each register window of this window register file can be set by software, it is also called a moving register window. The registers of a moving register window likewise fall into two parts: one part is the global registers (also called static registers), which are visible to every process; the other part is the window registers, where the register window of each particular process is different. The starting physical register address and the size of each window are determined by three parameters, BOF, SOF and SOL: BOF is the physical address of the first register of the current window, SOF is the size of the moving window, and SOL is the size of the local register field within the moving window. SOF and SOL can be set by software, with SOL ≤ SOF, whereas BOF is modified by hardware, generally as $BOF_{n} = BOF_{n-1} + SOL_{n-1}$, where $BOF_{n}$ and $SOL_{n-1}$ denote the BOF of the n-th moving window and the SOL of the (n-1)-th moving window respectively. The moving register windows may differ in size, but they overlap one another to form a circular stack. The overlap region of moving register windows is the output register field OUTS of the current window; because a moving register window does not distinguish the input register field from the output register field in hardware, the actual size of the output register field is given by SOF-SOL. Moving register windows overlap as follows: when a CALL operation is executed, the output register field of the current window automatically becomes the SOF of the new window, and the sum of the parent window's SOL and BOF becomes the BOF of the new window; when an ALLOC (window size allocation) operation is executed, the values of SOF and SOL can be enlarged or reduced as the instruction requires; when a RETURN operation is executed, the BOF, SOF and SOL from before the CALL operation are restored.
The component shown in Figure 10c also corresponds to the window register file address conversion component 101 of Figure 10 and performs the window register file address conversion for moving register windows. This structure comprises:
The preprocessing component 1015, which subtracts the number of global registers from each register address (read and write) in the instruction;
The BOF (physical address of the first register of the current moving window) generation component 1016. Its inputs are the local register field SOL of the current window and the control signals, namely the CALL and RETURN operation enable signals (the CALL and RETURN signals are formed by logically combining the function field of the BRANCH operator with its enable signal EBRANCH). Its output is the BOF value of the new window (the physical address of the first register of the current window). Its function is to modify the value of BOF as the instruction defines: the reset value of BOF is 0; when the CALL operation is valid, BOF is automatically added to the local register field SOL of the current window to form the BOF value of the new window; when the RETURN operation is valid, BOF is restored to the BOF value of the previous window.
The adder component 1017, which adds each output value of component 1015 to the output value of 1016 to obtain the final physical addresses (read and write) of the window register file, used to access the window register file of the moving register window.
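The three components of Figure 10c can be modelled in software as follows. This is a minimal sketch under assumptions: the class and method names are hypothetical, the saved BOF values are kept in a plain Python list rather than a register stack, and the wrap-around of the adder at the size of the window register file is assumed from the circular-stack description.

```python
class MovingWindowTranslator:
    """Software model of components 1015 (subtract), 1016 (BOF) and 1017 (add)."""

    def __init__(self, num_global, num_window_physical):
        self.num_global = num_global                  # number of global (static) registers
        self.num_physical = num_window_physical       # size of the window register file
        self.bof = 0                                  # reset value of BOF is 0
        self._saved_bof = []                          # BOF values of earlier windows

    def call(self, sol):
        """CALL: BOF_n = BOF_(n-1) + SOL_(n-1); the old BOF is kept for RETURN."""
        self._saved_bof.append(self.bof)
        self.bof = (self.bof + sol) % self.num_physical

    def ret(self):
        """RETURN: restore the BOF value of the previous window."""
        self.bof = self._saved_bof.pop()

    def physical_address(self, reg):
        """Translate an instruction register address of a window register."""
        if reg < self.num_global:
            raise ValueError("global registers bypass the window address conversion")
        offset = reg - self.num_global                     # component 1015
        return (offset + self.bof) % self.num_physical     # component 1017


# Usage: a caller with SOL = 5 hands its first output register to the callee.
t = MovingWindowTranslator(num_global=32, num_window_physical=128)
caller_first_output = t.physical_address(32 + 5)
t.call(sol=5)
callee_first_register = t.physical_address(32)
```

In this usage example `caller_first_output` and `callee_first_register` are the same physical register, which is how parameters pass between windows without a memory access.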
Figure 10d further illustrates Figure 10c by presenting a window register file address conversion component that implements the Itanium moving register window according to this method.
According to the definition of the Itanium architecture, the Itanium general register file comprises 128 architecturally visible registers, divided into two subsets: GR0-GR31 form the static general register field and GR32-GR127 form the stacked general register field. The Itanium general register file has two main kinds of address conversion: moving register window address conversion and register rotation address conversion. Register rotation is performed on top of the moving register window, and rotation is confined to the local register field SOL of the moving register window of the current process (register rotation under the moving register window is further described with reference to Figure 11c). When used as a moving register window, the static registers GR0-GR31 of the Itanium general register file are visible to all processes, and each process has its own moving register window in the stacked register field. The window size can be defined by software and varies between 0 and 96; parameters are exchanged automatically on CALL and RETURN operations through the register overlap between windows, which avoids memory accesses. The size of a moving window is determined by two parameters, SOF and SOL: SOF is the size of the moving window, with an initial value of 96, and SOL is the number of local registers in the window (including input registers); the difference between the two is the number of output registers. When a CALL operation is executed, the physical address $BOF_{n}$ of GR32 of the currently active window becomes the sum of the address $BOF_{n-1}$ of GR32 in the previous window and the local register field $SOL_{n-1}$ of the previous window, and the output register field of the previous window ($SOF_{n-1} - SOL_{n-1}$) automatically becomes the $SOF_{n}$ of the new window. When an ALLOC operation is executed, the three values SOF, SOL and SOR are set simultaneously under instruction control. When a RETURN instruction is executed, the SOF and SOL from before the most recent CALL operation are restored.
The Itanium general register file described above can be implemented with the component shown in Figure 10, in which the moving register window address conversion module 101 can be implemented by the structure shown in Figure 10c, with a=7, b=5 and m=7. Because the static registers are visible to every process, the number of registers a moving window comprises has a minimum of 32 ($2^{b}$) and a maximum of 128 ($2^{a}$), and the total number of physical registers is 160 ($2^{b}+2^{m}$). The specific circuit is shown in Figure 10d. For simplicity, only the case of two read operations and one write operation executing in parallel is discussed (three physical addresses, two reads and one write, must then be generated at the same time).
The read and write addresses in the instruction (RS1<6:0>, RS2<6:0>, RD<6:0>) come from the register source operand fields of the route operator and the register destination operand field of the data operator respectively. When a register read or write is executed, the low 5 bits of the register addresses RS1, RS2 and RD directly access the static register file, while each 7-bit read/write address is converted by the window register file address conversion component of Figure 10d into a 7-bit physical register address that conforms to window overlap and accesses the window register file (128 registers). The write address enable signal determines which one of the two writes, to the global register file or to the window register file, is valid. The write address enable signal is formed by combining the high-order bits RD<6:5> of the write address with the write operator enable signal: when the write operator is enabled and RD<6:5> = 00, the write to the global register file is valid; when the write operator is enabled and RD<6:5> ≠ 00, the write to the window register file is valid. The read valid signal controls the data output selection component to output either the data read from the global register file or the data read from the window register file. The read valid signal is formed from the high-order bits RS1<6:5> and RS2<6:5> of the read addresses: when the high two bits of RSi (i=1,2) are 0, the data read from the global register file is valid; when the high two bits of RSi (i=1,2) are not 0, the data read from the window register file is valid.
The window register file address conversion component shown in Figure 10d comprises the preprocessing component 1015', which subtracts 32; the adder component 1017', made up of three 7-bit adders; and the BOF generation component 1016', made up of a 7-bit adder and a circular stack of N registers. The function of the BOF generation component 1016' is to reset BOF and to modify the value of BOF as the instruction requires: when the CALL operation is valid, $BOF_{n} = BOF_{n-1} + SOL_{n-1}$; when the RETURN operation is valid, BOF is restored to the BOF value of the previous window. Component 1016' is a preferred structure for this function and can be built from a circular register stack whose number of registers is chosen as required. In the figure, MUX is a multiplexer, LAT is a latch and ADDER is an adder. When a CALL operation is executed, signal 10161 controls the multiplexer to select the sum of BOF and the local register field SOL (the value of SOL is determined by the ALLOCFRAME operator extracted from the Itanium instruction, and the result of the ALLOC operation is held in the corresponding register), which refreshes the top-of-stack register BOF, while signals 10161 and 10162 together perform the push operation. When the RETURN operation is valid, signals 10161 and 10162 together perform the pop operation and restore the previous value of BOF. Signals 10161 and 10162 are logical combinations of the CALL and RETURN signals and work as follows: when a CALL operation is executed, the multiplexer MUX selects the left data source and all latches open simultaneously, that is, one push operation is performed; when a RETURN operation is executed, the multiplexer MUX selects the right data source and all latches open simultaneously, that is, one pop operation is performed.
Because the stack area of the general register file defined by Itanium begins at GR32, the 7-bit register addresses supplied by the instruction are mapped onto the global (static) register file SR00-SR31 and the window register file WINRF (RF00-RF7f), and window overlap is achieved.
Figure 11 is a block diagram of rotating register address generation based on a lookup table. Register rotation is a register file control technique that arose to keep pace with optimizing compiler technology; it supports modulo scheduling for software pipelining and eliminates name dependences between data when loops are scheduled. A rotating register file provides a register renaming mechanism, so that within a new loop built by software pipelining, successive writes by one instruction to a given register actually go to different registers, which guarantees correct semantics. A rotating register file has a corresponding rotating register base (RRB). The register number specified in an instruction is added to the RRB value, and the sum modulo the size of the register rotation region is used as the actual register address. In modulo scheduling, a special branch operation decrements RRB at the start of each new iteration, so that the same operation in different iterations is allocated different registers for storing its result.
Suppose the register file has $2^{n}$ physical registers and the rotation region multiple is SOR (Size Of Rotating); the actual size of the rotation region is then $SOR \times 2^{m}$. SOR can be set by software and takes values from 1 to s, where s is a natural number with $s \times 2^{m} < 2^{n}$. m can be defined by the architecture; when m is 0, SOR itself is the rotation region size.
The component shown in Figure 11 generates addresses for register rotation and comprises three parts:
The rotating register base (RRB) generation component 111, whose function is to reset RRB and to clear RRB to 0 or decrement it by 1 as the instruction requires. Its inputs are the control signal CLR (which controls the clear operation) and the rotation valid signal ROTATING (which controls the decrement of RRB); its output is the rotation base RRB for the current iteration. When the CLR signal is valid, RRB is set to 0. The CLR signal may be generated by the reset RESET or by a clear operation in an instruction (for example the CLRRRB operation controlled by the OPSTK operator extracted from an ITANIUM instruction). The ROTATING signal is formed by logically combining the function field and the enable signal of the operator that controls the special branch operations (for example the BRANCH operator, extracted from an ITANIUM instruction, that controls the CTOP, CEXIT, WTOP and WEXIT operations).
The adder component 112, which adds each register address in the instruction to RRB to rotate the register address;
The lookup component 113, which uses the full set of values of the high-order bits [n-1:m] of the output address of 112 and all possible values of SOR as the row index and column index respectively, with each table entry being the row index modulo the column index; it is used for register rotation within rotation regions of different sizes. The lookup result is combined with the low-order bits [m-1:0] of the output address of the adder component 112 to form the actual physical address of the rotating register. Which of the two serves as row index and which as column index may vary with the physical implementation.
Figure 11a is an example of the above rotating register structure, with n=7, m=3 and s=12. It is a structural diagram of the rotating register pointer generation component for a register file of 128 ($2^{n}$) registers whose rotation region can be 1 to 12 times 8 ($2^{m}$) registers. For convenience, the register file is assumed to have 3 ports (two reads and one write). The structure contains an RRB generation component 111' made up of a 7-bit counter: when an RRB clear operation is executed, the CLR signal clears RRB to 0, and when a rotating branch operation is executed, RRB is decremented by 1. In the preferred structure of component 111', MUX21_7 is a 7-bit multiplexer, DEC_7 is a 7-bit decrementer and DFF_7 is a 7-bit flip-flop. The latch signal 1111 of RRB is the logical OR of CLR and ROTATING. On reset or when an RRB clear operation is executed, the CLR signal selects the constant 1, the flip-flop DFF stores the decrementer output 0, and RRB is set to 0; when register rotation occurs, the multiplexer selects the RRB feedback value by default, the flip-flop DFF stores the decremented value of RRB, and RRB is decremented.
The structure also contains an adder component 112' and a lookup component 113'. The adder component 112' adds each of the three register addresses RS1, RS2 and RD of the instruction (two reads and one write) to RRB. The high 4 bits [6:3] of each sum, together with the value of SOR (4 bits), form the input of the lookup circuit 113', and the table entry output serves as the high 4 bits [6:3] of the rotating register physical address (two reads and one write). These are concatenated with the low three bits [2:0] of the corresponding output address of 112' to form new 7-bit addresses (two reads and one write), which are the register rotation physical addresses corresponding to the register addresses in the instruction.
Figure 11b shows the header and entries of the lookup circuit 113' of Figure 11c, where the column index 1131 is the binary representation of SOR (1-12), the row index 1132 is the full set of values of the high four bits of the output address of component 112', and the table entry 1133 is the row index modulo the column index. The roles of row index and column index are interchangeable, and the table can also be implemented as a ROM or as registers. A preferred hardware implementation stores the modulo values in twelve 64-bit registers. On reset, the data in the registers are reset to the data of the columns of Figure 11b. When an ALLOC operation determines the SOR value, SOR[3:0] selects the column of data corresponding to the current SOR (held in one 64-bit register). When the register rotation operation triggered by a branch instruction is valid, only the high 4 bits of the lookup address are used as the index to select among the sixteen 4-bit modulo values of that column and obtain the corresponding table entry. Concatenating this table entry with the low 3 bits of the lookup address gives the physical address of the register after rotation. With this method, the modulo operation for the rotation address is completed at the delay cost of a single 16-to-1 multiplexer.
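The reason a table of "high bits modulo SOR" suffices is that, for an address split as 8h + l, the remainder modulo 8·SOR equals 8·(h mod SOR) + l. The Python sketch below models the n=7, m=3, s=12 example under that reading; the table layout as a dictionary of lists and the function names are assumptions made for illustration, not the register-based implementation described above.

```python
N, M, S = 7, 3, 12      # n, m, s of the Figure 11a example

# One column per SOR value (1..12); each column holds sixteen 4-bit entries
# "high bits modulo SOR", mirroring the twelve 64-bit registers in the text.
MOD_TABLE = {sor: [high % sor for high in range(1 << (N - M))]
             for sor in range(1, S + 1)}


def rotate_address(reg, rrb, sor):
    """Return the rotated 7-bit address for a register address and rotation base."""
    total = (reg + rrb) % (1 << N)            # adder component 112' (7 bits wide)
    high = total >> M                         # bits [6:3], index into the column
    low = total & ((1 << M) - 1)              # bits [2:0], passed through unchanged
    return (MOD_TABLE[sor][high] << M) | low  # lookup component 113' plus concatenation


def decrement_rrb(rrb):
    """RRB update on a rotating branch: subtract 1 in a 7-bit counter."""
    return (rrb - 1) % (1 << N)
```

For any inputs in range, `rotate_address(reg, rrb, sor)` gives the same result as computing `((reg + rrb) % 128) % (8 * sor)` directly, but uses only a table read and a bit concatenation.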
Figure 11c is a circuit diagram of the rotating register address conversion component within a moving window, obtained by combining Figure 11b with Figure 10d. The structure is designed to implement the corresponding operations of the general register file of the Itanium processor. As described above, at the architectural interface the Itanium general register file comprises a static register field of size 32 (GR0-GR31) and a stacked field of size 96 (GR32-GR127). The registers of the stacked field are organized as moving register windows of programmable size, and adjacent register windows overlap one another to form a circular stack.
The Itanium general register file supports two classes of operation at once, moving register windows and register rotation, which serve different purposes. The former exchanges data through overlapping register windows when processes switch, so as to reduce memory accesses; the latter supports modulo scheduling for software pipelining in hardware.
The effect of these operations is as follows. When a CALL operation is executed, the physical address $BOF_{n}$ of GR32 of the currently active window becomes the sum of the address $BOF_{n-1}$ of GR32 in the previous window and the local register field $SOL_{n-1}$ of the previous window, and the output register field of the previous window ($SOF_{n-1} - SOL_{n-1}$) automatically becomes the $SOF_{n}$ of the new window. When an ALLOC operation is executed, the three values SOF, SOL and SOR are set simultaneously under instruction control. When a RETURN instruction is executed, the SOF and SOL from before the most recent CALL operation are restored. When a software-pipelined loop branch operation (CTOP, CEXIT, WTOP or WEXIT) is executed, the registers rotate, but rotation takes place within the range of the moving register window and the size of the rotation region is SOR*8. Register rotation can be performed only in the stack area (the window registers); the rotation region is 1 to 12 times 8 registers and is defined to begin at GR32.
The Itanium general register file can be implemented with the component shown in Figure 10. The circuit structure of the moving register window part has already been described with reference to Figure 10d. Because register rotation is performed only in the window register file (the register stack field), adding the register rotation function to Figure 10d requires modifying only the address conversion component of the window register file; the access to the global register file and the control of the data output selection component remain unchanged. Figure 11c shows the circuit of the window register file address conversion component obtained by adding the register rotation function to Figure 10d. For ease of discussion, only the three-address structure with two reads and one write is presented; the conversion principle is the same for more read/write addresses.
In Figure 11c, the physical address of the window register file is produced along one of two paths. In the first path, the read/write addresses in the instruction (RS1<6:0>, RS2<6:0>, RD<6:0>) each have the constant 32 subtracted by the preprocessing component 1015' and go directly to the rotation address selection control component 114; under the control of signal 1141 this value is selected and added to the BOF value output by the BOF generation component 1016' to give the final physical address. This physical address reflects only the moving register window operation, and the path is used when the registers are not rotating. In the second path, the read/write addresses in the instruction (RS1<6:0>, RS2<6:0>, RD<6:0>) each have the constant 32 subtracted by the preprocessing component 1015' and are then added to the RRB value output by the RRB generation component 111'. The high 4 bits of the output of the adder component 112', together with SOR, index the lookup circuit 113'; the table entry obtained is concatenated with the low 3 bits of the output of component 112' to form the rotation address, which is fed to the rotation address selection component 114. Under the control of signal 1141 this value is selected and added to the BOF value output by the BOF generation component 1016' to give the final physical address, which is the register physical address when register rotation is performed within the moving register window.
Component 114 is the rotation address selection control component. Its control condition is the register rotation valid signal ROTATING, which is formed by combining the function field of the branch operator BRANCH, extracted from the Itanium instruction, with the BRANCH operator enable signal. When register rotation is not valid, the first path above is selected to produce the physical address; when register rotation is valid, the second path above is selected. Because the moving register window and register rotation operations of the Itanium general register file are implemented with the structure of Figure 10, in which static registers and window registers are separated, registers GR0-GR31 are mapped onto the global register file GPR (GPR00-GPR31) and GR32-GR127 are mapped onto the window register file WINRF (RF00-RF7f). The reset value of BOF is 0, corresponding to the first register of the window register file WINRF.
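The two address paths of Figure 11c and the selection made by component 114 can be summarized in one function. This is a hedged sketch, not the circuit: the function name and its parameters are hypothetical, the table lookup is replaced by an equivalent modulo expression, and 7-bit wrap-around of the adders is assumed.

```python
def itanium_window_address(reg, bof, rrb, sor, rotating):
    """Physical window register file address for an instruction address GR32..GR127.

    rotating plays the role of the ROTATING signal that steers component 114.
    """
    if not 32 <= reg <= 127:
        raise ValueError("GR0-GR31 are static registers and bypass this conversion")
    offset = reg - 32                                   # preprocessing component 1015'
    if rotating:
        total = (offset + rrb) % 128                    # adder component 112'
        high = (total >> 3) % sor                       # lookup circuit 113' (modulo table)
        offset = (high << 3) | (total & 0b111)          # concatenate with the low 3 bits
    # Component 114 has selected one of the two offsets; add BOF (component 1017').
    return (offset + bof) % 128


# First path: window translation only.
plain = itanium_window_address(40, bof=10, rrb=0, sor=1, rotating=False)
# Second path: rotation within a 16-register region (SOR = 2) after one decrement of RRB.
rotated = itanium_window_address(32, bof=10, rrb=127, sor=2, rotating=True)
```

In the second call, RRB = 127 acts as -1 in the 7-bit adder, so GR32 wraps to the last register of the 16-register rotation region before BOF is added.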
Figure 12 is a structural schematic diagram of a reconfigurable register file. A reconfigurable register file comprises at least three parts:
The address conversion and address selection component 121, which converts the register addresses in the instruction into register file physical addresses that meet the architectural design requirements, and at the same time selects, from the register physical addresses produced by the different functions, the address of the currently valid operation with which to access the register file. According to the reconfigurable design resource rule, the numbers of read addresses and write addresses output by component 121 are the union of the numbers of read addresses and write addresses needed by each reconfigurable element; this means that the number of read/write ports the register file must provide is the union of the numbers of read/write ports needed to implement each function.
The data input selection component 122, which selects the currently valid write data from the register write data produced by the different functions and, together with the register write address, carries out the write operation of the register file.
The register file component 123. According to the reconfigurable design resource rule, the number of registers required is the union of the numbers of registers needed to implement each function, and the number of register file read/write ports is likewise the union of the numbers of register file read/write ports needed to implement each function.
Control signal 124 is the operating mode selection signal. Under a given operating mode, it controls the selectors so that they pass the input data and the read/write addresses belonging to that mode. This signal can be formed by combining the enable signals of the register operation operators of the different function classes, or it can be implemented by adding an operating mode control field to the instruction. When the ordering of the inputs is consistent, the signal that controls address selection is identical to the signal that controls data selection.
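As a non-limiting sketch of the resource rule used by components 121 and 123, the "union" of per-function requirements can be read as an element-wise maximum when the resources are shared between modes. That reading, the function name `merge_resources`, and the example figures below are assumptions made for illustration.

```python
def merge_resources(requirements):
    """Element-wise maximum over the resource needs of every mode.

    `requirements` maps a mode name to a dict of resource counts.
    Interpreting the "union" of the resource rule as a maximum is an
    assumption of this sketch.
    """
    merged = {}
    for need in requirements.values():
        for resource, count in need.items():
            merged[resource] = max(merged.get(resource, 0), count)
    return merged


# Hypothetical example: two modes that each need two reads and one write.
modes = {
    "mode_a": {"read_ports": 2, "write_ports": 1},
    "mode_b": {"read_ports": 2, "write_ports": 1},
}
shared = merge_resources(modes)

# Adding a hypothetical third mode with four reads and four writes.
modes["mode_c"] = {"read_ports": 4, "write_ports": 4}
extended = merge_resources(modes)
```

Under this reading, two modes with identical port needs share one set of ports, and a more demanding mode raises the total only to its own requirement.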
Figure 12a is a block diagram of a reconfigurable register file with two operating modes: fixed register window, and moving register window with register rotation. For simplicity, each operating mode considers only the basic requirement of two reads and one write. The fixed window comprises 32 registers at the architectural interface, with 8 each of global, input, local and output registers. The moving window comprises 128 registers at the architectural interface, of which 32 are static (global) registers, and supports register rotation on top of the moving register window; register rotation can only be performed in the non-static register area. This register file adopts the design shown in Figure 10, in which global registers are separated from window registers, and comprises the following components:
Register address conversion and selection component 121', comprising: fixed window address conversion component 1211, which converts register address 125 in the instruction into a physical address of the window register file in the fixed window operating mode, the conversion circuit being described in detail in Figure 10b; moving window and register rotation address conversion component 1212, which converts register address 126 in the instruction into a physical address of the window register file in the moving window and register rotation operating mode, the conversion circuit being shown in Figure 11c (note that, because the physical registers in the window register file correspond to register addresses R32-R127 in the instruction, the reset value of BOF, the starting physical address of R32, is 0, corresponding to the first physical register of the window register file); and address selection component 1213, which selects the valid address between the two operating modes (fixed register window, and moving register window with register rotation) to access the window register file WINRF (RF00-RF127), the internal structure of address selection component 1213 being further described in Figure 12c;
Data input selection component 122', which selects, between the two operating modes (fixed register window, and moving register window with register rotation), the valid data to be written to the global register file GPR (GR00-GR31) or the window register file WINRF (RF00-RF127). Because each mode has only one write port, in this example the component reduces to a selection between two input data. The selection signal 124' is preferably designed as a combination of operator enable signals (the OR of the enable signals of all read/write operators of the same class of operation, the operator enable signals being active low) and indicates whether the current operation is a fixed register window operation or a moving register window operation;
Register file 123'. Because the structure shown in Figure 10, in which the global register file and the window register file are separated, is adopted, register file 123' comprises global register file 1231, window register file 1232 and data output selection component 1233.
The global register file 1231 comprises 32 64-bit registers, 2 write ports and 4 read ports. The fixed window mode needs 8 registers and the moving window and register rotation mode needs 32, so according to the reconfigurable design resource rule the total number of registers is 32. The number of read/write ports is 6 (4 reads, 2 writes) because the read/write address widths of the global register file differ between the modes. When used for the fixed register window operation compatible with SPARC V9, the read/write addresses of the global register file are the low-order bits of each read/write address in address signal 125: WRS1[2:0], WRS2[2:0] and WRD[2:0]. When used for the moving register window operation compatible with Itanium, the read/write addresses of the global register file are the low-order bits of each read/write address in address signal 126: SWRS1[4:0], SWRS2[4:0] and SWRD[4:0]. The internal structure of the global register file is further described in Figure 12b.
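The derivation of the global register file port addresses from the instruction register addresses can be sketched as follows. The function name `gpr_port_address` and the mode labels are hypothetical; only the bit ranges [2:0] and [4:0] come from the description.

```python
def gpr_port_address(reg_addr, mode):
    """Return the global register file port address for `reg_addr`.

    mode "fixed"  : keep bits [2:0], as for WRS1[2:0], WRS2[2:0], WRD[2:0]
    mode "moving" : keep bits [4:0], as for SWRS1[4:0], SWRS2[4:0], SWRD[4:0]
    """
    if mode == "fixed":
        return reg_addr & 0b111      # 3-bit address selects GR0-GR7
    if mode == "moving":
        return reg_addr & 0b11111    # 5-bit address selects GR0-GR31
    raise ValueError("mode must be 'fixed' or 'moving'")
```

The differing widths (3 bits against 5 bits) are the reason given above for keeping separate ports for the two modes on the global register file.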
Window register file 1232 comprises 128 64-bit registers and corresponds to the physical implementation of the window registers of SPARC and the GRS area of Itanium. Because both modes need 3 read/write ports (two reads, one write), according to the reconfigurable design resource rule the number of read/write ports of the register file is the union of the two, that is, also 3 read/write ports (two reads, one write);
Data output selection component 1233. With the global registers and window registers separated, its function is to select the correct read data between the global register file and the window register file. The internal structure of the data output selection component is further described in Figure 12c.
Figure 12b is a structural diagram of the global register file 1231 of Figure 12a. When it is used as the fixed window register file, the three global register addresses (3 bits each) for two reads and one write act on registers GR0-GR7. When it is used for the moving register window and register rotation, the three static register addresses (5 bits each) for two reads and one write act on registers GR0-GR31. GR0-GR7 are the overlapping resources, and their preferred structure is shown at 12311'. The input data of these registers comes from both modes, and according to the reconfigurable design control rule the latch control condition is the OR of the latch control conditions of the two modes. The latch signal that controls registers GR8-GR31 depends only on the moving register window operating mode, and its basic structure is shown at 12312'.
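The latch control rule for the global registers may be illustrated with the following sketch. The function name `gpr_latch_enable` and the use of active-high Boolean inputs are assumptions made for readability; the actual signal polarity is not specified here.

```python
def gpr_latch_enable(index, fixed_write, moving_write):
    """Return True when global register GR[index] should latch new data.

    fixed_write  : the fixed window mode writes GR[index] this cycle
    moving_write : the moving window mode writes GR[index] this cycle
    Both inputs are hypothetical active-high signals.
    """
    if not 0 <= index <= 31:
        raise ValueError("global register index must be in 0..31")
    if index <= 7:
        # GR0-GR7 are shared: OR of the latch conditions of both modes.
        return fixed_write or moving_write
    # GR8-GR31 exist only for the moving register window mode.
    return moving_write
```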
Figure 12c is a structural diagram of the address selection component 1213 and the data output selection component 1233 of Figure 12a. The address selection component performs the selection of the two read addresses and of the one write address. In the figure, WRPi (i=1, 2) and SWRPi (i=1, 2) denote the read addresses of the window register file for the fixed window and for the moving window with register rotation, respectively; WWP and SWWP denote the write addresses of the window register file for the fixed window and for the moving window with register rotation, respectively; the selection condition is the corresponding write operator enable signal of each. The data output selection component selects, for a given operating mode of the register file, between the two read data of the global register file and the two read data of the window register file. The selection condition is the enable signal of the operator together with whether the high-order address bits are 0 (signals 12331 and 12332 are the selection signals in the fixed window mode and the moving window mode, respectively). In the figure, GPRRPORTi (i=1-4) are the data at the output ports of the global register file, where GPRRPORT1 and GPRRPORT2 are the output data when the global registers are GR0-GR7 and GPRRPORT3 and GPRRPORT4 are the output data when the global registers are GR0-GR31; RFRPORTi (i=1-2) are the data at the output ports of the window register file; WRPORTi (i=1, 2) denote the final output data of the fixed window; and SWPORTi (i=1, 2) denote the final output data of the moving window and register rotation.
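A minimal sketch of the data output selection follows. It assumes a 5-bit register address in the fixed window mode and a 7-bit register address in the moving window mode, so that the "high-order bits" are bits [4:3] and [6:5] respectively; these widths, the function name `select_read_data`, and the omission of the operator enable signal are assumptions of the sketch.

```python
# Assumed position of the first high-order bit for each mode.
HIGH_BIT_SHIFT = {"fixed": 3, "moving": 5}


def select_read_data(reg_addr, mode, gpr_data, winrf_data):
    """Choose between global and window register file read data.

    When the high-order address bits are all 0, the register is a
    global (static) register and the global file data is returned;
    otherwise the window register file data is returned.
    """
    if (reg_addr >> HIGH_BIT_SHIFT[mode]) == 0:
        return gpr_data
    return winrf_data
```

For example, address 5 in the fixed window mode selects the global file data, while address 8 selects the window register file data.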
Figure 12d is a block diagram of a reconfigurable register file with three operating modes: fixed register window; moving register window with register rotation; and random read/write register file. The random read/write mode comprises the parallel execution of 4 read operations and 4 write operations.
The basic structure of this register file is similar to that of Figure 12a. The main difference is the addition of the random read/write addresses 127 (4 read addresses and 4 write addresses, 7 bits each). According to the reconfigurable design resource rule, the number of read/write ports actually needed by window register file 1232' is now 8 (four reads, four writes), and the structures of address selection component 1213', data input selection component 122", data output component 1233' and the address/data selection control signal 124" must be modified accordingly. The structure of address selection component 1213' is shown in Figure 12e. The 14 input addresses (2 reads and 1 write for the fixed register window, 2 reads and 1 write for the moving register window, and 4 reads and 4 writes for the random read/write register file) are merged by address selection into 8 read/write addresses (4 reads and 4 writes, the union of the read/write address numbers needed by the modes above), which control the four read ports and the four write ports of the window register file respectively. The selection condition 124" is preferably designed as the logical OR of the enable signals of the random read/write operators (the operator enable signals are active low; the condition indicates that a random read/write operation is currently valid). When the random read/write operation is invalid, the left-hand data path in the figure is selected by default. In the figure, RANDRPi (i=1~4) denote the random read addresses and RANDWPi (i=1~4) denote the random write addresses; RPi and WPi (i=1~4) denote the read and write addresses that finally control the window register file. The structures of data input selection component 122" and data output component 1233' are shown in Figure 12f. In the figure, RANDDi (i=1~4) denote the input data sources of the random write operations; WIND denotes the input data source of the fixed window; SWIND denotes the input data source of the moving register window and register rotation; WPiD (i=1~4) denote the data sources corresponding to the write ports of the window register file; and RFRPORTi (i=1~4) are the output data of the read ports of the window register file.
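By way of illustration only, the merging of 14 input addresses into 4 read and 4 write port addresses might be sketched as follows. Figure 12e is not reproduced here, so the assignment of the window-mode addresses to the first two read ports and the first write port, the representation of idle ports as `None`, and the function name `select_window_addresses` are all assumptions.

```python
def select_window_addresses(random_valid, moving_valid, fixed, moving, rand):
    """Merge 14 input addresses into (RP1..RP4, WP1..WP4).

    fixed, moving : ((read1, read2), write) for the two window modes
    rand          : ((r1, r2, r3, r4), (w1, w2, w3, w4)) for random access
    random_valid  : True when a random read/write operation is valid
    moving_valid  : True when the moving window mode is active
    """
    if random_valid:
        reads, writes = rand
        return list(reads), list(writes)
    # Default path: one of the two window modes drives the ports.
    reads, write = moving if moving_valid else fixed
    # Assumption: the remaining two read ports and three write ports idle.
    return list(reads) + [None, None], [write, None, None, None]
```

In this sketch the random read/write condition takes priority, and the window-mode path is taken whenever that condition is invalid, mirroring the default selection described above.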
As Figures 12d to 12f show, when a new function (here, the random read/write operation) is added, the reconfigurable design of the register file keeps the added hardware resources very small, and according to the reconfigurable design timing rule the added delay on the critical path is only the time needed for address and data selection. The result is a register file that implements the read/write operations of the fixed register window of SPARC and of the moving register window and register rotation of Itanium, and to which user-defined register operations can be added (random read/write is used here as a simple example).
Although preferred embodiments of the present invention have been illustrated and described in detail, it should be understood that various changes and modifications may be made to the invention without departing from the scope of the claims.