CN1382280B - Automated processor generation system for designing a configurable processor and method for the same
- Publication number
- CN1382280B (application number CN00812731.XA)
- Authority
- CN
- China
- Prior art keywords
- instruction
- user
- processor
- module
- program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Abstract
A configurable RISC processor implements a user-defined instruction set with high-performance fixed-length and variable-length encoding. The process of defining new instructions is supported by tools that allow the user to add new instructions and evaluate them rapidly, to maintain multiple instruction sets, and to switch between them. A standardized language is used to develop the configurable definitions of the target instruction set, the HDL description of the hardware needed to implement that instruction set, and the development tools for verification and application development, thereby achieving a high degree of automation in the design process.
Description
Background of the Invention
1. Field of the Invention
The present invention relates to microprocessor systems. More particularly, it relates to the design of an application solution containing one or more processors, where each processor in the system is configured and enhanced at design time to improve its suitability for a particular application. The invention is also directed to a system in which application developers can rapidly develop instruction extensions, such as new instructions, on top of an existing instruction set architecture, including new instructions that manipulate user-defined processor state, and can immediately measure the impact of such an extension on application run time and on processor cycle time.
2. Description of Related Art
Processors have traditionally been difficult to design and to modify. For this reason, most systems that contain processors use ones that were designed and verified once for general-purpose use and then reused by many applications over time. As a result, their suitability for a particular application is not always ideal. It would often be appropriate to modify a processor so that it executes a particular application's code better (for example, runs faster, consumes less power, or costs less). However, the difficulty of modifying even an existing processor design, and therefore the time, cost, and risk involved, is so high that this is not typically done.
To better understand the difficulty of making a prior art processor configurable, consider how such a processor is developed. First, its instruction set architecture (ISA) is developed. This step is essentially done once, and the result is then used for decades by many systems. For example, the instruction set of the Intel Pentium™ processor can trace its legacy back to the 8008 and 8080 microprocessors introduced in the mid-1970s. In this process, the ISA instructions, syntax, and so on are developed according to predetermined ISA design criteria, and software development tools for the ISA, such as assemblers, debuggers, and compilers, are developed as well. Next, simulators for that particular ISA are developed, various benchmarks are run to evaluate the effectiveness of the ISA, and the ISA is revised according to the results of the evaluation. At some point the ISA is considered satisfactory, and the ISA process ends with a fully developed ISA specification, an ISA simulator, an ISA verification suite, and a development suite including, for example, an assembler, a debugger, and a compiler.

Processor design then begins. Because a processor can have a useful life of many years, this process is also carried out fairly infrequently: typically, a processor is designed once and then used for many years by many systems. Given the ISA, its verification suite and simulator, and the various processor development goals, the microarchitecture of the processor is designed, simulated, and revised. Once the microarchitecture is finalized, it is implemented in a hardware description language (HDL), and a microarchitecture verification suite is developed and used to verify the HDL implementation (more on this later). Then, in contrast to the manual processes described up to this point, automated design tools can synthesize a circuit from the HDL description and place and route its components. The layout can then be revised to optimize chip area usage and timing. Alternatively, additional manual processes can be used to create a floorplan based on the HDL description, convert the HDL to circuitry, and then verify and lay out the circuits both manually and automatically. Finally, an automated tool verifies the layout to confirm that it matches the circuits, and the circuits are verified against the layout parameters.
After processor development is complete, the overall system is designed. Unlike the design of the ISA and the processor, system design (which may include the design of chips that now contain the processor) is quite common, and systems are typically designed continuously. Each system is used by a particular application for a relatively short period (one or two years). Based on predetermined system goals such as cost, performance, power, and functionality, on the specifications of pre-existing processors, and on the specifications of chip foundries (which are usually closely tied to the processor vendors), the architecture of the overall system is designed, a processor is chosen to match the design goals, and a foundry for the chosen processor is selected (a choice closely tied to the processor selection).
Then, given the chosen processor, ISA, and foundry, together with the previously developed simulation, verification, and development tools (as well as a standard cell library for the chosen foundry), an HDL implementation of the system is designed, a verification suite is developed for that HDL implementation, and the implementation is verified. Next, the system circuitry is synthesized, placed and routed on circuit boards, and the layout and timing are re-optimized. Finally, the boards are designed and laid out, the chips are fabricated, and the boards are assembled.
Another difficulty with prior art processor design is that it is not appropriate simply to design traditional processors with more features in order to cover all applications. Any given application requires only a particular combination of features, and a processor with features the application does not need is overly costly, consumes more power, and is harder to fabricate. In addition, it is not possible to know all of the application targets when a processor is first designed. If the process of modifying a processor could be automated and made reliable, the ability of system designers to create application solutions would be significantly enhanced.
As an example, consider a device designed to transmit and receive data over a channel using a complex protocol. Because the protocol is complex, the processing cannot reasonably be accomplished entirely in hard-wired logic (for example, combinatorial logic). Instead, a programmable processor is introduced into the system for protocol processing. Programmability also allows bugs to be fixed, and later protocol upgrades to be made, by loading new software into memory. However, a traditional processor was probably not designed for this particular application (the application may not even have existed when the processor was designed), and the application may need operations that take many instructions to perform but that could be done in one or a few instructions with additional processor logic.
Because the processor cannot easily be enhanced, many system designers do not attempt to do so and instead choose to run an inefficient pure-software solution on an available processor. This inefficiency leads to a solution that may be slower, require more power, or cost more (for example, it may require a larger, more powerful processor to execute the program at sufficient speed). Other designers choose to provide some of the processing in special-purpose hardware that they design for the application, such as a coprocessor, and then have the programmer code accesses to that hardware at various points in the program. However, the time needed to transfer data between the processor and the special-purpose hardware limits the usefulness of this approach to system optimization: only fairly large units of work can be sped up enough that the time saved by using the special-purpose hardware exceeds the additional time required to transfer data to and from it.
In the communication channel example, the protocol might require encryption, error correction, or compression/decompression. Such processing often operates on individual bits rather than on the processor's larger words. The circuitry for such a computation may be quite modest, but having the processor extract each bit, process it sequentially, and then repack the bits adds considerable overhead.
As a more specific example, consider Huffman decoding using the rules shown in Table 1 (a similar encoding is used in the MPEG compression standard).
Table 1

| Pattern | Value | Length |
|---|---|---|
| 0 0 X X X X X X | 0 | 2 |
| 0 1 X X X X X X | 1 | 2 |
| 1 0 X X X X X X | 2 | 2 |
| 1 1 0 X X X X X | 3 | 3 |
| 1 1 1 0 X X X X | 4 | 4 |
| 1 1 1 1 0 X X X | 5 | 5 |
| 1 1 1 1 1 0 X X | 6 | 6 |
| 1 1 1 1 1 1 0 X | 7 | 7 |
| 1 1 1 1 1 1 1 0 | 8 | 8 |
| 1 1 1 1 1 1 1 1 | 9 | 8 |
Both the value and the length must be computed, so that the corresponding number of bits can be removed from the stream in order to find the start of the next element to be decoded.
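The decoding rule of Table 1 can be sketched in software as a sequence of bit tests. The following Python sketch is illustrative only: the function names `huff_decode` and `decode_stream`, the most-significant-bit-first packing of the stream, and the zero padding past the end of the data are assumptions made for the example and are not taken from the patent.

```python
def huff_decode(window):
    """Decode one symbol from an 8-bit window, testing bits in sequence.

    Returns (value, length) following the rules of Table 1.
    """
    window &= 0xFF
    top2 = window >> 6
    if top2 != 0b11:
        # Patterns 00, 01 and 10 are complete two-bit codes.
        return top2, 2
    # Codes starting with 11 end at the first 0 bit (lengths 3 to 8).
    for length in range(3, 9):
        if ((window >> (8 - length)) & 1) == 0:
            return length, length
    # All eight bits are 1: value 9 with length 8.
    return 9, 8


def decode_stream(data, count):
    """Decode `count` symbols from `data` (bytes, most significant bit first)."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    pos = 0
    values = []
    for _ in range(count):
        shift = total - pos - 8
        # Take the next 8 bits; pad with zeros past the end of the data.
        window = (bits >> shift) if shift >= 0 else (bits << -shift)
        value, length = huff_decode(window)
        values.append(value)
        pos += length  # discard the bits that were just consumed
    return values
```

For instance, the codes for the values 0, 3, 9 and 1 (`00`, `110`, `11111111`, `01`) pack into the two bytes `0x37 0xFA` with one padding bit, and `decode_stream(bytes([0x37, 0xFA]), 4)` recovers those four values. The loop in `huff_decode` shows why a software decoder needs a chain of tests and branches for what is a small combinatorial function.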
There are many ways to code this for a conventional instruction set, but all of them require many instructions because there are many tests to perform, and, in contrast to the single gate delay of combinatorial logic, each software implementation requires multiple processor cycles. For example, an efficient prior art implementation using the MIPS instruction set might require six logical operations, six conditional branches, one arithmetic operation, and the associated register loads. With an advantageously designed instruction set the coding is better, but it is still expensive in terms of time: one logical operation, six conditional branches, one arithmetic operation, and the associated register loads.
In terms of processor resources this is so expensive that a 256-entry lookup table is typically used instead of coding the process as a sequence of bit-by-bit comparisons. However, a 256-entry lookup table takes up significant space, and accessing it may also take many cycles. For longer Huffman encodings the table size becomes prohibitive, which leads to more complex and slower code.
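The lookup-table alternative can be sketched as follows. This is a minimal Python illustration, assuming the ten code patterns of Table 1; the names `RULES`, `build_table`, `TABLE` and `table_entries` are hypothetical and chosen only for this example.

```python
# (prefix bits, prefix length, decoded value) for each row of Table 1.
RULES = [
    (0b00, 2, 0), (0b01, 2, 1), (0b10, 2, 2),
    (0b110, 3, 3), (0b1110, 4, 4), (0b11110, 5, 5),
    (0b111110, 6, 6), (0b1111110, 7, 7),
    (0b11111110, 8, 8), (0b11111111, 8, 9),
]


def build_table():
    """Expand the prefix rules into a table indexed by an 8-bit window."""
    table = [None] * 256
    for prefix, length, value in RULES:
        free = 8 - length  # number of don't-care (X) bits after the prefix
        base = prefix << free
        for low in range(1 << free):
            table[base | low] = (value, length)
    return table


TABLE = build_table()


def table_entries(max_code_length):
    """Entries needed for a direct table indexed by the longest code."""
    return 1 << max_code_length
```

A decode step then becomes a single indexed read, `value, length = TABLE[window]`, at the cost of storing all 256 entries. Because a direct table needs one entry per possible window, its size doubles with every extra bit of maximum code length (`table_entries(16)` is 65536), which is the growth that makes the approach impractical for longer codes.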
One possible solution to the problem of accommodating the requirements of specific applications in a processor is to use a configurable processor, whose instruction set and architecture can easily be modified and extended to enhance the functionality of the processor and to customize that functionality. Configurability allows the designer to specify whether, and how much, additional functionality is needed in the product. The simplest kind of configurability is a binary choice: a feature is either present or absent. For example, a processor might be offered with or without floating-point hardware.
Flexibility can be improved by configuration choices with finer gradations. For example, a processor might allow the system designer to specify the number of registers in the register file, the memory width, the cache size, the cache associativity, and so on. However, these options still do not reach the level of customization that system designers want. In the Huffman decoding example above, for instance, the system designer might like to include a dedicated instruction to perform the decode, although this is not known in the prior art, for example,
Huff8 t1, t0
where the most significant eight bits of the result are the decoded value and the least significant eight bits are the length. In contrast to the software implementations described above, a direct hardware implementation of the Huffman decode is very simple. Excluding instruction decode and the like, the combinatorial logic for the decode function amounts to roughly thirty gates, or less than 0.1% of the gate count of a typical processor, and the result can be computed by a special-purpose processor instruction in a single cycle. This represents an improvement by a factor of 4 to 20 over using general-purpose instructions only.
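The result format of such an instruction can be modelled in a few lines. The Python sketch below is a behavioural model only, not the hardware or the instruction definition from the patent: it assumes that the eight bits to decode sit in the low byte of the source operand, which the text does not specify, and the function name `huff8` simply mirrors the mnemonic.

```python
def huff8(t0):
    """Behavioural model: decode the low byte of t0 per the Huffman rule.

    Returns a 16-bit result with the decoded value in the most
    significant eight bits and the code length in the least
    significant eight bits.
    """
    window = t0 & 0xFF  # assumption: the code occupies the low byte
    # Count leading 1 bits of the 8-bit window.
    ones = 0
    while ones < 8 and (window >> (7 - ones)) & 1:
        ones += 1
    if ones < 2:
        # Codes 00, 01 and 10: the value is the top two bits, length 2.
        value, length = window >> 6, 2
    else:
        # Codes 110 ... 11111111: value is ones + 1, length capped at 8.
        value, length = ones + 1, min(ones + 1, 8)
    return (value << 8) | length
```

A caller would split the result with `result >> 8` for the value and `result & 0xFF` for the length, then shift the bit stream by that length. In hardware the leading-ones count is a small priority encoder, which is consistent with the modest gate count quoted above.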
Prior art efforts at configurable processor generation generally fall into two categories: logic synthesis used with parameterized hardware descriptions, and automatic retargeting of compilers and assemblers from abstract machine descriptions. The first category includes synthesizable processor hardware designs such as the Synopsys DW8051 processor, the ARM/Synopsys ARM7-S, the Lexra LX-4080, and the ARC configurable RISC core, and, to some degree, the Synopsys synthesizable/configurable PCI bus interface.
Of the above, the Synopsys DW8051 includes a binary-compatible implementation of an existing processor architecture and a small number of synthesis parameters, for example 128 or 256 bytes of internal RAM, a ROM address range determined by the parameter rom_addr_size, an optional interval timer, a variable number (0-2) of serial ports, and an interrupt unit that supports either 6 or 13 sources. Although the architecture of the DW8051 can be varied somewhat, no changes to its instruction set architecture are possible.
The ARM/Synopsys ARM7-S processor includes a binary-compatible implementation of an existing architecture and microarchitecture. It has two configurable parameters: the choice of a high-performance or low-performance multiplier, and the inclusion of debug and in-circuit emulation logic. Although changes to the instruction set architecture of the ARM7-S are possible, they are subsets of existing non-configurable processor implementations, so no new software is required.
The LX-4080 processor is a configurable variant of the standard MIPS architecture and provides no software support for instruction set extensions. Its options include a custom engine interface, which allows the MIPS ALU opcodes to be extended with application-specific operations; an internal hardware interface, which includes a register source and a register or 16-bit-wide immediate source, as well as destination and stall signals; a simple memory management unit option; three MIPS coprocessor interfaces; a flexible local memory interface to cache, scratchpad RAM, or ROM; a bus controller, which connects peripheral functions and memories to the processor's own local bus; and a write buffer of configurable depth.
The ARC configurable RISC core has a user interface with on-the-fly gate count estimation based on the target technology and clock speed, instruction cache configuration, instruction set extensions, a timer option, a scratchpad memory option, and memory controller options. It has an instruction set with selectable options such as local scratchpad RAM with block moves to memory, special registers, up to 16 additional condition code choices, a 32 × 32 bit scoreboarded multiply block, a single-cycle 32-bit barrel shifter/rotate block, a normalize (find first bit) instruction, writing results directly to a command buffer (rather than to the register file), a 16-bit MUL/MAC block with a 36-bit accumulator, and sliding pointer access to local SRAM using linear arithmetic. It also supports user instructions defined by manually editing VHDL source code. The ARC design has no facility for implementing an instruction set description language, nor does it generate software tools specific to the configured processor.
The Synopsys configurable PCI interface includes a GUI or command line interface for installation, configuration, and synthesis activities; checks that the prerequisite user actions have been taken at each step; installation of selected design files based on the configuration (for example, Verilog versus VHDL); selective configuration, such as setting parameters and prompting the user for configuration values while checking that the combination is valid, and HDL generation in which user updates are applied to the HDL source code without the user editing HDL source files; and synthesis functions, such as a user interface that analyzes the technology library to select I/O pads, technology-independent constraints and synthesis scripts, pad insertion with prompts for technology-specific pads, and translation of technology-independent formulas into technology-dependent scripts. The configurable PCI bus interface is notable because it implements consistency checking of parameters, configuration-based installation, and automatic modification of HDL files.
In addition, prior art synthesis techniques choose different mappings based on the user's goal specifications, allowing the mapping to be optimized for speed, power, area, or target components. On this point, in the prior art it is not possible to obtain feedback on the effect of reconfiguring the processor in these ways without taking the design through the entire mapping process. Such feedback could be used to guide further reconfiguration of the processor until the system design goals are achieved.
The second category of prior art work on configurable processor generation (that is, automatic retargeting of compilers and assemblers) covers a large body of academic research. See, for example, Hanono et al., "Instruction Selection, Resource Allocation and Scheduling in the AVIV Retargetable Code Generator" (a representation of machine instructions used for the automatic creation of code generators); Fauth et al., "Describing Instruction Set Processors Using nML"; Ramsey et al., "Machine Descriptions to Build Tools for Embedded Systems"; Aho et al., "Code Generation Using Tree Matching and Dynamic Programming" (algorithms that match the transformations associated with each machine instruction, such as add, load, store, and branch, against a sequence of program operations represented in some machine-independent intermediate form, using methods such as pattern matching); and Cattell, "Formalization and Automatic Derivation of Code Generators" (abstract descriptions of machine architectures used for compiler research).
Once a processor has been designed, its operation must be verified. Processors generally execute instructions from a stored program using a pipeline, with each stage suited to one phase of instruction execution. Consequently, changing or adding an instruction, or changing the configuration, may require widespread changes to the processor's logic so that each of the pipeline stages can perform the appropriate action for each such instruction. Configuring a processor requires that it be re-verified, and that this verification adapt to the changes and additions. This is not a simple task. Processors are complex logic devices with extensive internal data and control state, and the combinatorics of control, data, and program make processor verification a demanding art. Adding to the difficulty of processor verification is the difficulty of developing suitable verification tools. Because verification is not automated in the prior art, its flexibility, speed, and reliability are less than optimal.
Furthermore, once a processor has been designed and verified, it is not particularly useful if it cannot be programmed easily. Processors are generally programmed with the aid of extensive software tools, including compilers, assemblers, linkers, debuggers, simulators, and profilers. When the processor changes, the software tools must change with it. There is no benefit in adding an instruction if that instruction cannot be compiled, assembled, simulated, or debugged. In the prior art, the cost of the software changes associated with processor modifications and enhancements has been a major obstacle to flexible processor design.
It can thus be seen that prior art processor design is difficult enough that processors are generally not designed or modified for a specific application. It can also be seen that considerable improvements in system efficiency would be possible if processors could be configured and extended for specific applications. Further, the efficiency and effectiveness of the design process could be improved if feedback on implementation characteristics (such as power consumption and speed) could be used to refine a processor design. Moreover, in the prior art, once a processor has been modified, a great deal of effort is required to verify the correct operation of the modified processor. Finally, although prior art techniques provide limited processor configurability, they do not provide for the generation of software development tools tailored to the configured processor.
A system meeting the above criteria would certainly be an improvement over the art, but further improvements could be made. For example, there is a need for a processor system with instructions that access or modify information stored in special registers, that is, processor state. Without such instructions, the range of instructions that can be obtained is significantly restricted, and the achievable performance improvement is limited accordingly.
Likewise, inventing new application-specific instructions involves complex trade-offs among reducing
cycle count, adding hardware resources and affecting CPU cycle time. Another challenge is to obtain
an efficient hardware implementation for the new instructions without involving the application
developer in the often intricate details of high-performance microprocessor implementation.
The system described above gives the user the flexibility to design a processor well suited to her
application. Interactive development of hardware and software, however, remains cumbersome. To
understand the problem better, consider the typical approach many software developers use to tune
the performance of their software applications. They typically think of a possible improvement,
modify their software to use it, recompile their source code to produce a runnable application
containing the improvement, and then evaluate the improvement. Depending on the result of the
evaluation, they keep or discard the improvement. Typically the whole process takes only a few
minutes. This lets the user experiment freely, quickly trying out ideas and deciding whether to keep
or discard them. In some cases, evaluating an idea properly is more complicated: the user may need
to test the idea in several situations. In that case the user generally keeps multiple versions of
the compiled application: the original version and another version containing the possible
improvement. In some cases possible improvements may interact, and the user may keep more than two
copies of the application, each using a different subset of the possible improvements. By keeping
multiple versions, the user can easily and repeatedly test the different versions under different
circumstances.
Users of configurable processors would like to develop hardware and software together interactively,
in much the same way that software developers develop software on traditional processors. Consider
the case in which a user adds custom instructions to a configurable processor. Users would like to
add candidate instructions to their processor interactively and to test and evaluate those
instructions on their particular application. In prior-art systems this is difficult for three
reasons.
First, after proposing a possible instruction, the user must wait an hour or more before obtaining
a compiler and simulator that can take advantage of the instruction.
Second, when the user wishes to experiment with many possible instructions, the user must create
and keep a software development system for each one. A software development system can be very
large, and keeping many versions can become unmanageable.
Finally, the software development system is configured for the entire processor. This makes it
difficult to divide the development process among different engineers. Consider an example in which
two developers work on a particular application at the same time. One developer may be responsible
for deciding the characteristics of the processor's cache, the other for adding custom instructions.
Although the work of the two developers is related, each piece is sufficiently separable that each
developer can carry out her task in isolation. The cache developer may initially propose a
particular configuration. The other developer starts from that configuration and tries several
instructions, building a software development system for each possible instruction. Now the cache
developer modifies the proposed cache configuration. Since each of the other developer's
configurations assumes the original cache configuration, she must now rebuild every one of them.
With many developers working on a project at once, organizing the different configurations can
quickly become unmanageable.
Summary of the Invention
The present invention overcomes these problems of the prior art. One object of the invention is to
provide a system that automatically configures a processor by generating, from the same
configuration specification, both a description of a hardware implementation of the processor and a
set of software development tools for programming the processor.
Another object of the invention is to provide such a system that can optimize the hardware
implementation and the software development tools for different performance criteria.
A further object of the invention is to provide such a system that permits different types of
processor configurability, including extensibility, binary selection and parameter modification.
Another object of the invention is to provide such a system that describes the instruction set
architecture of the processor in a language that can be readily implemented in hardware.
A further object of the invention is to provide a system and method for developing and implementing
instruction set extensions that modify processor state.
Another object of the invention is to provide a system and method for developing and implementing
instruction set extensions that modify the registers of the configurable processor.
A further object of the invention is to allow a user to customize a processor configuration by
adding new instructions, and to evaluate that feature within a few minutes.
The above objects are achieved by providing an automated processor generation system that uses a
description of customized processor instruction set options and extensions, written in a
standardized language, to develop a configured definition of a target instruction set, a hardware
description language description of the circuitry needed to implement the instruction set, and
development tools such as a compiler, assembler, debugger and simulator that can be used to generate
software for the processor and to verify the processor. The implementation of the processor
circuitry can be optimized for different criteria such as area, power consumption and speed. Once a
processor configuration has been developed, it can be tested and fed back into the system for
modification, so that the processor implementation is optimized iteratively.
To develop an automated processor generation system according to the invention, an instruction set
architecture (ISA) description language is defined, and development tools such as an assembler,
linker, compiler and debugger are developed. This is part of the development process because,
although most of the tools are standard, they must be modified so that they can be configured
automatically from the ISA description. This part of the design process is typically carried out by
the designer or producer of the automated processor design tool itself.
An automated processor generation system according to the invention operates as follows. A user,
for example a system designer, develops a configured instruction set architecture. That is, using
the ISA definition and the tools developed earlier, the user develops a configured instruction set
architecture that meets certain ISA design goals. The development tools and simulator are then
configured for this instruction set architecture. Using the configured simulator, benchmarks are
run to evaluate the effectiveness of the configured instruction set architecture, and the core is
revised according to the evaluation results. Once the configured instruction set architecture is in
a satisfactory state, a verification suite is developed for it.
While attending to the software side of the process, the system also attends to the hardware side
by developing a configurable processor. Then, using system goals such as cost, performance, power
and functionality, together with information about available processor fabrication facilities, the
overall system architecture is designed, taking into account the configurable ISA options,
extensions and processor features. Using the overall system architecture, the development software,
the simulator, the configurable instruction set architecture and the HDL implementation of the
processor, the system configures the processor ISA, HDL implementation, software and simulator, and
the system HDL is designed for a system-on-a-chip design. Also, based on the system architecture and
the specifications of chip foundries, a chip foundry is selected according to an evaluation of
foundry capabilities with respect to the system HDL (rather than the foundry dictating processor
selection, as in the prior art). Finally, using the foundry's standard cell library, the
configuration system synthesizes the circuitry, places and routes it, and provides the ability to
re-optimize layout and timing. Then, if the design is not of the single-chip type, circuit board
layouts are designed, the chips are fabricated, and the boards are assembled.
As can be seen above, several techniques are used to automate the processor design process
extensively. The first technique for addressing these issues is to design and implement specific
mechanisms that are less flexible than arbitrary modification or extension but still allow
significant functional improvement. By constraining the arbitrariness of the changes, the problems
associated with them are constrained as well.
The second technique is to provide a single description of each change and to generate the
modification or extension of every affected component automatically. Prior-art processor designs
have not done this, because doing something once by hand is generally cheaper than writing a tool
to do it automatically and then using the tool once. The advantage of automation appears when the
task is repeated many times.
The third technique is to build a database that supports estimation and automatic configuration for
subsequent user evaluation.
Finally, the fourth technique is to provide the hardware and software in a form that lends itself to
configuration. In one embodiment of the invention, some of the hardware and software is not written
directly in standard hardware and software languages, but in a language enhanced by the addition of
a preprocessor that allows the configuration database to be queried and that generates standard
hardware and software language code with substitution, conditionals, replication and other
modifications. The processor design is then completed using hooks to which the enhancements can be
linked.
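The fourth technique can be illustrated with a minimal Python sketch. It is not the preprocessor
used in this embodiment: the configuration keys (`num_timers`, `has_mac16`, `data_width`), the
function `expand_template` and the Verilog-like output are all hypothetical. The sketch shows only
the idea of querying a configuration database and producing standard hardware-language text through
substitution, conditional inclusion and replication.

```python
from string import Template

# Hypothetical configuration database; the keys are illustrative only.
CONFIG = {"num_timers": 2, "has_mac16": True, "data_width": 32}


def expand_template(config):
    """Produce Verilog-like text from a configuration (illustrative only)."""
    lines = []
    # Substitution: a configuration value is inserted into the text.
    header = Template("wire [$msb:0] data_bus;")
    lines.append(header.substitute(msb=config["data_width"] - 1))
    # Replication: one instance is emitted per configured timer.
    for i in range(config["num_timers"]):
        lines.append(f"timer timer{i} (.clk(clk), .irq(timer_irq[{i}]));")
    # Condition: the block appears only when the option is selected.
    if config["has_mac16"]:
        lines.append("mac16 mac (.clk(clk));")
    return "\n".join(lines)


if __name__ == "__main__":
    print(expand_template(CONFIG))
```

Keeping the configuration in one dictionary means that the same query could, under these
assumptions, drive both hardware text and software text from a single source.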
To illustrate these techniques, consider the addition of application-specific instructions. By
restricting the method to instructions that take register and constant operands and produce a
register result, the operation of an instruction can be specified with combinational (stateless,
feedback-free) logic alone. The input specifies the opcode assignment, the instruction name, the
assembler syntax and the combinational logic for the instruction, from which the tools generate:
the instruction decode logic of the processor, to recognize the new opcodes;
the addition of a functional unit to perform the combinational logic function on the register
operands;
the inputs to the instruction scheduling logic of the processor, to ensure that an instruction
issues only when its operands are valid;
assembler modifications to accept the new opcodes and their operands and generate the correct
machine code;
compiler modifications to add new intrinsic functions that access the new instructions;
disassembler/debugger modifications to interpret the machine code as the new instructions;
simulator modifications to accept the new opcodes and perform the specified logic function; and
a diagnostic generator that produces both directed and random code sequences that contain the
added instructions and check their results.
All of the above techniques are used in adding application-specific instructions. The input is
restricted to input and output operands and the logic that evaluates them. The change is described
in one place, and all hardware and software modifications are derived from that description. This
facility shows how a single input can be used to enhance multiple components.
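The idea of deriving every tool modification from one description can be sketched in Python. This
is an illustration under stated assumptions, not the description language of the embodiment: the
`InstructionSpec` record, the `derive_artifacts` function, the `AVG` mnemonic and the opcode value
0x2A are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InstructionSpec:
    """One description of a combinational instruction (hypothetical format)."""
    name: str                              # instruction mnemonic
    opcode: int                            # opcode assignment
    semantics: Callable[[int, int], int]   # stateless logic: two inputs, one result


def derive_artifacts(spec):
    """Derive several tool entries from the one description."""
    return {
        # Decoder and disassembler map the opcode back to the mnemonic.
        "decode": {spec.opcode: spec.name},
        # Assembler maps the mnemonic to the opcode.
        "assemble": {spec.name: spec.opcode},
        # Simulator executes the logic function for the opcode.
        "simulate": {spec.opcode: spec.semantics},
    }


# Hypothetical example: average of two operands, truncated to 32 bits.
avg = InstructionSpec("AVG", 0x2A, lambda a, b: ((a + b) >> 1) & 0xFFFFFFFF)
tools = derive_artifacts(avg)
```

Because the three tables are built from the same record, changing the opcode or the logic in the
description changes the assembler, decoder and simulator entries together.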
The result of this process is a system that meets application needs far better than the prior art,
because trade-offs between the processor and the rest of the system logic can be made much later in
the design process. It is superior to many of the prior-art approaches discussed above in that its
configuration can be applied to many more forms of representation. A single source can be used for
all ISA encodings, software tools and high-level simulation can be included in one configuration
package, and the flow can be designed for iteration to find the best combination of configuration
values. Further, whereas previous methods focused only on hardware configuration or only on software
configuration, without a single user interface for control or a measurement system for user-directed
redefinition, the present invention provides a complete flow for configuring both processor hardware
and software, including feedback from hardware design results and software performance to help
select the optimal configuration.
According to one aspect of the invention, these objects are achieved by providing an automated
processor design tool that uses a description of customized processor instruction set extensions,
written in a standardized language, to develop a configured definition of a target instruction set,
a hardware description language description of the circuitry needed to implement the instruction
set, and development tools such as a compiler, assembler, debugger and simulator that can be used to
develop applications for the processor and to verify it. The standardized language can handle
instruction set extensions that modify processor state or use configurable processors. By providing
a constrained domain of extensions and optimizations, the process can be automated to a high degree,
which promotes fast and reliable development.
According to another aspect of the invention, the above objects are further achieved by providing a
system in which the user can keep multiple sets of possible instructions or state (hereinafter, a
combination of possible configurable instructions or state is referred to collectively as a
"processor enhancement") and can switch between them easily when evaluating the application.
The user selects and builds a base processor using the methods described herein. The user creates a
new set of user-defined processor enhancements and places them in a file directory. The user then
invokes a tool that processes the user enhancements and converts them into a form usable by the base
software development tools. This conversion is fast because it involves only the user-defined
enhancements and does not build a complete software system. The user then invokes the base software
development tools, telling them to dynamically use the processor enhancements created in the new
directory. Preferably, the location of the directory is given to each tool via a command-line option
or an environment variable. To simplify the process further, the user can use standard software
makefiles. These allow the user to modify the processor instructions and then, with a single make
command, process the enhancements and use the base software development tools to rebuild and
evaluate the application in the context of the new processor enhancements.
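A minimal Python sketch of how a tool might locate the enhancement directory follows. The text
states only that the location is supplied via a command-line option or an environment variable; the
option name `--enhancement-dir`, the variable name `PROCESSOR_ENHANCEMENT_DIR`, the function
`resolve_enhancement_dir` and the rule that the command line takes precedence are assumptions made
for illustration.

```python
import argparse
import os


def resolve_enhancement_dir(argv, environ):
    """Return the enhancement directory, preferring the command line (assumed rule)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--enhancement-dir", default=None)  # hypothetical option name
    # Ignore any other arguments the tool itself would consume.
    args, _unknown = parser.parse_known_args(argv)
    if args.enhancement_dir is not None:
        return args.enhancement_dir
    # Fall back to the (hypothetical) environment variable; None if unset.
    return environ.get("PROCESSOR_ENHANCEMENT_DIR")


if __name__ == "__main__":
    import sys
    print(resolve_enhancement_dir(sys.argv[1:], os.environ))
```

Switching between sets of enhancements then amounts to pointing the same base tools at a different
directory, which is what keeps each experiment small.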
The invention overcomes the three limitations of the prior art. Given a new set of possible
enhancements, the user can evaluate them within a few minutes. The user can keep multiple versions
of possible enhancements by creating a new directory for each set. Because the directory contains
only a description of the new enhancements rather than a description of the whole software system,
the storage required is minimal. Finally, the new enhancements are decoupled from the rest of the
configuration. Once the user has created a directory containing a possible set of new enhancements,
she can use that directory with any base configuration.
Brief Description of the Drawings
The above and other objects of the invention will become more apparent when the following detailed
description is read in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a processor implementing an instruction set according to a preferred
embodiment of the invention;
Fig. 2 is a block diagram of a pipeline used in the processor according to the embodiment;
Fig. 3 shows a configuration manager in the graphical user interface (GUI) according to the
embodiment;
Fig. 4 shows a configuration editor in the GUI according to the embodiment;
Fig. 5 shows the different types of configurability according to the embodiment;
Fig. 6 is a block diagram of the flow of processor configuration in the embodiment;
Fig. 7 is a block diagram of an instruction set according to the embodiment;
Fig. 8 is a block diagram of an emulation board for a processor configured according to the
invention;
Fig. 9 is a block diagram of the logical architecture of a configurable processor according to the
embodiment;
Fig. 10 is a block diagram showing the addition of a multiplier to the architecture of Fig. 9;
Fig. 11 is a block diagram showing the addition of a multiply-accumulate unit to the architecture
of Fig. 9;
Figs. 12 and 13 are diagrams showing the configuration of memory in the embodiment;
Figs. 14 and 15 are diagrams showing the addition of user-defined functional units in the
architecture of Fig. 8;
Fig. 16 is a block diagram showing the flow of information between system components in another
preferred embodiment;
Fig. 17 is a block diagram showing how custom code is generated for the software development tools
in the embodiment;
Fig. 18 is a block diagram showing the generation of the various software modules used in another
preferred embodiment of the invention;
Fig. 19 is a block diagram of the pipeline structure in a configurable processor according to the
embodiment;
Fig. 20 shows a state register implementation according to the embodiment;
Fig. 21 is a diagram of the additional logic needed to implement the state register in the
embodiment;
Fig. 22 is a diagram showing the combination of the next-state outputs for a state from several
semantic blocks, and the selection of one of them as the input to a state register according to the
embodiment;
Fig. 23 shows logic corresponding to the semantic logic according to the embodiment; and
Fig. 24 shows the logic for one bit of state when it is mapped to a bit of a user register in the
embodiment.
Detailed Description of the Preferred Embodiments
In general, the automated processor generation process begins with a configurable processor
definition and user-specified modifications to it, together with a user-specified application for
which the processor is to be configured. This information is used to generate a configured processor
that takes the user modifications into account, and to generate software development tools for it,
for example a compiler, simulator, assembler and disassembler. The application is then recompiled
using the new software development tools. The recompiled application is simulated with the simulator
to produce a software profile describing the performance of the configured processor running the
application, and the configured processor is evaluated with respect to silicon area usage, power
consumption, speed and so on to produce a hardware profile characterizing the processor circuit
implementation. The software and hardware profiles are fed back to the user to enable further
iterative configuration, so that the processor can be optimized for the particular application.
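The feedback loop described above can be sketched in Python. In the embodiment the profiles are fed
back to the user, who decides how to iterate; the sketch below automates that decision only to make
the loop concrete. The function names, the configuration keys and the two placeholder profile
models with their made-up numbers are all hypothetical and stand in for real simulation and
hardware estimation.

```python
def choose_configuration(candidates, software_profile, hardware_profile, max_area):
    """Pick the candidate with the fewest cycles whose estimated area fits the budget."""
    best = None
    for config in candidates:
        cycles = software_profile(config)  # stand-in for simulating the application
        area = hardware_profile(config)    # stand-in for the hardware estimator
        if area > max_area:
            continue  # this configuration violates the area goal
        if best is None or cycles < best["cycles"]:
            best = {"config": config, "cycles": cycles, "area": area}
    return best


# Placeholder models with made-up numbers, for illustration only.
def toy_software_profile(config):
    return 1000 - 300 * config["has_mac16"] - 10 * config["cache_kb"]


def toy_hardware_profile(config):
    return 50 + 20 * config["has_mac16"] + 2 * config["cache_kb"]


CANDIDATES = [
    {"has_mac16": False, "cache_kb": 4},
    {"has_mac16": True, "cache_kb": 4},
    {"has_mac16": True, "cache_kb": 16},
]
```

Calling `choose_configuration(CANDIDATES, toy_software_profile, toy_hardware_profile, 80)` selects
among the placeholder candidates; a real flow would replace both toy models with the simulator and
estimator outputs.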
An automated processor generation system 10 according to a preferred embodiment of the invention has
four main components, as shown in Fig. 1: a user configuration interface 20, through which a user
wishing to design a processor enters configurability and extensibility options and other design
constraints; a suite of software development tools 30, which can be customized for a processor
designed to the criteria selected by the user; a parameterized, extensible description of a hardware
implementation of the processor 40; and a build system 50, which receives input data from the user
interface, generates a customized, synthesizable hardware description of the requested processor,
and modifies the software development tools to suit the chosen design. Preferably, the build system
50 additionally generates diagnostic tools to verify the hardware and software designs, and an
estimator to estimate hardware and software characteristics.
"Hardware implementation description", as used herein and in the appended claims, means one or more
descriptions that describe aspects of the physical implementation of a processor design and that,
alone or in conjunction with one or more other descriptions, enable the production of chips
according to the design. The components of a hardware implementation description may therefore be at
different levels of abstraction, from relatively high levels such as hardware description languages,
through netlists and microcode, to mask descriptions. In the present embodiment, the main components
of the hardware implementation description are written in HDL, netlists and scripts.
Further, "HDL", as used herein and in the appended claims, refers to the general class of hardware
description languages used to describe microarchitectures and the like; it is not intended to denote
any particular instance of such a language.
In the present embodiment, the basis for processor configuration is the architecture 60 shown in
Fig. 2. Many elements of the architecture are basic features that cannot be modified directly by the
user. These include the processor control section 62, the align and decode section 64 (although
parts of this section are based on the user-specified configuration), the ALU and address generation
section 66, the branch logic and instruction fetch section 68, and the processor interface 70. Other
units are part of the base processor but are user-configurable. These include the interrupt control
section 72, the data and instruction address watch sections 74 and 76, the window register file 78,
the data and instruction cache and tag sections 80, the write buffers 82 and the timers 84. The
remaining sections shown in Fig. 2 may optionally be included by the user.
The central component of the processor configuration system 10 is the user configuration interface
20. This is a module that preferably presents the user with a graphical user interface (GUI) through
which the user can select processor functionality, including reconfiguration of the compiler and
regeneration of the assembler, disassembler and instruction set simulator (ISS), and prepare the
input for full processor synthesis, placement and routing. It also lets the user benefit from quick
estimates of processor area, power consumption, cycle time, application performance and code size,
for further iteration and refinement of the processor configuration. Preferably, the GUI also
accesses a configuration database to obtain default values and to perform error checking on user
input.
To design a processor 60 using the automated processor generation system 10 of the embodiment, the
user enters design parameters into the user configuration interface 20. The automated processor
generation system 10 may be a stand-alone program running on a computer system under the user's
control; preferably, however, it runs mainly on a system under the control of the producer of the
automated processor generation system 10. User access can then be provided over a communication
network. For example, the GUI may be provided using a web browser with data entry screens written in
HTML and Java. This has several advantages, such as keeping any proprietary back-end software
confidential and simplifying maintenance and updating of the back-end software. In this case, to
access the GUI the user first logs in to the system 10 to establish his identity.
Once the user has been granted access, the system displays a configuration manager screen 86, as
shown in Fig. 3. The configuration manager screen 86 is a directory listing all the configurations
accessible to the user. The configuration manager screen 86 in Fig. 3 shows that the user has two
configurations, "just intr" and "high prio"; the former has already been built, i.e., finalized for
production, while the latter has yet to be built. From this screen 86 the user can build a selected
configuration, delete it, edit it, generate a report specifying which configuration and extension
options were chosen for it, or create a new configuration. For configurations that have been built,
such as "just intr", the suite of software development tools 30 customized for the configuration can
be downloaded.
Creating a new configuration or editing an existing one brings up the configuration editor 88 shown
in Fig. 4. The configuration editor 88 has an "Options" selection menu on the left showing the
general aspects of the processor 60 that can be configured and extended. When an option section is
selected, a screen with the configuration options for that section appears on the right, and the
options can be set with pull-down menus, memo boxes, check boxes, radio buttons and the like, as is
known in the art. Although the user can select options and enter data in any order, it is preferable
to enter data section by section, since there are logical dependencies between sections; for
example, to display the options in the "Interrupts" section properly, the number of interrupts must
already have been chosen in the "ISA Options" section.
In the present embodiment, the following configuration options are available in each section:
Goals
  Technology for estimation
    Target ASIC technology: .18, .25, .35 micron
    Target operating condition: typical, worst case
  Implementation goals
    Target speed: arbitrary
    Gate count: arbitrary
    Target power: arbitrary
    Goal prioritization: speed, area, power; speed, power, area
ISA options
  Numeric options
    MAC16 with 40-bit accumulator: yes, no
    16-bit multiplier: yes, no
  Exception options
    Number of interrupts: 0-32
    High-priority interrupt levels: 0-14
    Enable debugging: yes, no
    Number of timers: 0-3
  Other
    Byte ordering: little endian, big endian
    Number of registers available for call windows: 32, 64
Processor cache and memory
  Processor interface read width (bits): 32, 64, 128
  Write buffer entries (address/value pairs): 4, 8, 16, 32
  Processor cache
    Instruction/data cache size (kB): 1, 2, 4, 8, 16
    Instruction/data cache line size (kB): 16, 32, 64
Peripheral components
  Timers
    Timer interrupt numbers
    Timer interrupt levels
  Debugging support
    Number of instruction address breakpoint registers: 0-2
    Number of data address breakpoint registers: 0-2
    Debug interrupt level
    Trace port: yes, no
    On-chip debug module: yes, no
    Full scan: yes, no
Interrupts
  Source: external, software
  Priority level
System memory addressing
  Vector and address calculation method: XTOS, manual
  Configuration parameters
    RAM size, start address: arbitrary
    ROM size, start address: arbitrary
    XTOS: arbitrary
  Configuration-specific addresses
    User exception vector: arbitrary
    Kernel exception vector: arbitrary
    Register window overflow/underflow vector base address: arbitrary
    Reset vector: arbitrary
    XTOS start address: arbitrary
    Application start address: arbitrary
TIE instructions
  (define ISA extensions)
Target CAD environment
  Simulation
    Verilog(TM): yes, no
  Synthesis
    Design Compiler(TM): yes, no
  Place and route
    Apollo(TM): yes, no
In addition, the system 10 may provide options for adding other functional units, such as a 32-bit
integer multiply/divide unit or a floating-point arithmetic unit; a memory management unit; on-chip
RAM and ROM options; cache associativity; enhanced DSP and coprocessor instruction sets; a
write-back cache; multiprocessor synchronization; compiler-directed speculation; and support for
additional CAD packages. The configuration options available for a given configurable processor are
preferably listed in a definition file (such as the one shown in Appendix A), which the system 10
uses for syntax checking and the like once the user has selected the appropriate options.
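The kind of checking a definition file makes possible can be sketched in Python. The allowed values
below are taken from the option list above; the key names, the dictionary layout and the function
`check_configuration` are hypothetical and do not reproduce the format of the definition file in
Appendix A.

```python
# Allowed values from the option list; the key names are hypothetical.
OPTION_DEFINITIONS = {
    "num_interrupts": range(0, 33),        # 0-32
    "high_priority_levels": range(0, 15),  # 0-14
    "num_timers": range(0, 4),             # 0-3
    "call_window_registers": (32, 64),
    "interface_read_width_bits": (32, 64, 128),
    "write_buffer_entries": (4, 8, 16, 32),
    "cache_size_kb": (1, 2, 4, 8, 16),
    "mac16": (True, False),
}


def check_configuration(config):
    """Return a list of error messages; an empty list means the check passed."""
    errors = []
    for key, value in config.items():
        allowed = OPTION_DEFINITIONS.get(key)
        if allowed is None:
            errors.append(f"unknown option: {key}")
        elif value not in allowed:
            errors.append(f"{key}={value!r} is not an allowed value")
    return errors
```

Under these assumptions, `check_configuration({"num_timers": 4})` reports one error because the
list permits at most three timers, while a configuration that stays within the listed ranges
returns an empty list.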
From the above it can be seen that the automated processor configuration system 10 provides the
user with two broad types of configurability 300, as shown in Fig. 5: extensibility 302, which lets
the user define arbitrary functions and structures from scratch, and modifiability 304, which lets
the user select from a predetermined, constrained set of options. Within modifiability, the system
permits binary selection 306 of certain features, for example whether a MAC16 or a DSP should be
added to the processor 60, and parametric specification 308 of other processor features, such as
the number of interrupts and the cache size.
Many of the above configuration options will be familiar to those skilled in the art; others,
however, merit particular attention. For example, the RAM and ROM options allow the designer to
include scratch-pad memory or firmware in the processor itself. The processor 10 can fetch
instructions from, or read and write data in, these memories. The size and location of the memories
are configurable. In the present embodiment, each of these memories is accessed as an additional set
in a set-associative cache. A hit in the memory can be detected by comparison with a single tag
entry.
Because each high-priority interrupt level requires three special registers and is therefore relatively expensive, the system 10 provides separate configuration options for the interrupt option (which implements level-1 interrupts) and the high-priority interrupt option (which implements level 2 to 15 interrupts and non-maskable interrupts).
The MAC16 with 40-bit accumulator option (shown at 90 in Fig. 2) adds a 16-bit multiplier/adder function with a 40-bit accumulator, eight 16-bit operand registers, and a set of compound instructions that combine multiply, accumulate, operand load and address update operations. The operand registers can be loaded with pairs of 16-bit values from memory in parallel with multiply/accumulate operations. This unit can sustain algorithms that perform two loads and one multiply/accumulate per cycle.
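By way of illustration only, the multiply/accumulate behavior just described can be modeled in a few lines of Python. The function names and the choice to wrap the accumulator modulo 2^40 on overflow (rather than saturate) are assumptions of this sketch and are not stated in the text.

```python
ACC_BITS = 40  # accumulator width given in the text


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac16(acc, a, b):
    """Multiply two signed 16-bit operands and add the product to the accumulator.

    Wrap-around on 40-bit overflow is an assumption of this sketch.
    """
    product = to_signed(a, 16) * to_signed(b, 16)
    return to_signed(acc + product, ACC_BITS)
```

For instance, `mac16(10, 0xFFFF, 2)` treats `0xFFFF` as -1 and accumulates -2 into 10.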
The on-chip debug module (shown at 92 in Fig. 2) is used to access the internal, software-visible state of the processor 60 through the JTAG port 94. The module 92 provides support for generating exceptions that put the processor 60 into debug mode; for accessing all program-visible registers or memory locations; for executing any instruction that the processor 60 is configured to execute; for modifying the program counter (PC) to jump to a desired location in the code; and for a utility that allows a return to normal operating mode, triggered from outside the processor 60 via the JTAG port 94.
Once the processor 10 enters debug mode, it waits for an indication from the outside world that a valid instruction has been scanned in via the JTAG port 94. Once a hardware implementation of the processor 10 has been produced, the module 92 is used to debug the system. Execution of the processor 10 can be controlled by a debugger running on a remote host. The debugger interfaces with the processor via the JTAG port 94 and uses the capabilities of the on-chip debug module 92 to determine and control the state of the processor 10 and to control the execution of instructions.
Up to three 32-bit counter/timers 84 can be configured. This entails a 32-bit register that is incremented every clock cycle and (for each configured timer) a compare register, together with a comparator that compares the contents of the compare register with the current clock count register, for use in interrupts and similar functions. The counter/timers can be configured as edge-triggered and can generate normal or high-priority internal interrupts.
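The counter/compare arrangement can be illustrated with a small Python model. This is only a sketch: the function name, the list-based representation of the compare registers and the reporting of matches as tuples are all assumptions made for illustration.

```python
def timer_interrupts(ccompare, cycles, ccount=0, width=32):
    """Return (count, timer_index) pairs at which a compare register matches.

    ccompare holds one compare value per configured timer.
    """
    mask = (1 << width) - 1
    events = []
    for _ in range(cycles):
        ccount = (ccount + 1) & mask          # counter increments every clock cycle
        for index, compare_value in enumerate(ccompare):
            if compare_value == ccount:       # comparator match raises an interrupt
                events.append((ccount, index))
    return events
```

With two timers programmed to 3 and 5, six simulated cycles yield one match for each timer.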
The speculation option gives the compiler greater scheduling flexibility by allowing loads to be moved speculatively across control flow, to places where they would not always be executed. Because loads can cause exceptions, such load motion could introduce exceptions into a valid program that would not otherwise have occurred. Speculative loads prevent these exceptions from occurring when the load is executed, but raise an exception when the data is required. Instead of causing an exception on a load error, a speculative load clears the valid bit of the destination register (new processor state associated with this option).
Although the core processor 60 preferably has some basic pipeline synchronization capability, when multiple processors are used in one system, some form of communication or synchronization between the processors is required. In some cases, self-synchronizing communication techniques such as input and output queues are used. In other cases, a shared-memory model is used for communication, and because shared memory does not provide the required semantics, instruction-set support for synchronization must be provided. For example, load and store instructions with acquire and release semantics can be added. These are useful for controlling the order of memory references in multiprocessor systems in which different memory locations may be used for synchronization and for data, so that precise ordering must be maintained between synchronization references. Other instructions can be used to build semaphore systems known in the art.
In some cases, a shared-memory model is used for communication, and because shared memory does not provide the required semantics, instruction-set support for synchronization must be provided. This is done by the multiprocessor synchronization option.
Perhaps the most significant of the configuration options is the set of TIE instruction definitions, from which the designer-defined instruction execution unit 96 is built. TIE™ (Tensilica Instruction Set Extensions), developed by Tensilica, Inc. of Santa Clara, California, lets the user describe customized functions for an application in the form of extensions and new instructions that augment the base ISA. In addition, because of its flexibility, TIE can be used to describe the portions of the ISA that the user cannot change; in this way, the entire ISA can be used to generate the software development tools 30 and the hardware implementation description 40 uniformly. A TIE description uses a number of building blocks to describe the attributes of new instructions, as follows:
Instruction fields        Instruction classes
Instruction opcodes       Instruction semantics
Instruction operands      Constant tables
Instruction field statements (field) are used to improve the readability of TIE code. A field is a subset or concatenation of other fields that is grouped together and referred to by one name. The complete set of bits in an instruction is the highest-level superset field inst, and this field can be divided into smaller fields. For example,
    field x inst[11:8]
    field y inst[15:12]
    field xy {x, y}
defines two 4-bit fields, x and y, as sub-fields (bits 8-11 and 12-15, respectively) of the highest-level field inst, and an 8-bit field xy as the concatenation of the x and y fields.
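The effect of these three field statements can be sketched in Python as plain bit slicing. The helper names are hypothetical; the only convention assumed beyond the text is the usual Verilog one that the first member of a concatenation occupies the most significant bits.

```python
def bits(value, hi, lo):
    """Return value[hi:lo] in Verilog slice notation (both ends inclusive)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(inst):
    """Extract the fields x, y and xy from a 24-bit instruction word."""
    x = bits(inst, 11, 8)
    y = bits(inst, 15, 12)
    xy = (x << 4) | y        # {x, y}: x supplies the upper four bits
    return {"x": x, "y": y, "xy": xy}
```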
The opcode statement defines opcodes for encoding specific fields. Instruction fields that are intended to specify operands (for example, registers or immediate constants) to be used by the opcodes so defined must first be defined with field statements and then defined with operand statements.
For example,
    opcode acs op2=4'b0000 CUST0
    opcode adse1 op2=4'b0001 CUST0
define two new opcodes, acs and adse1, in terms of the previously defined opcode CUST0 (4'b0000 denotes a 4-bit binary constant 0000). The TIE description of the preferred core ISA contains the statements
    field op0 inst[3:0]
    field op1 inst[19:16]
    field op2 inst[23:20]
    opcode QRST op0=4'b0000
    opcode CUST0 op1=4'b0110 QRST
as part of its base definition. Thus, the definitions of acs and adse1 cause the TIE compiler to generate instruction decoding logic represented, respectively, by the following:
    inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
    inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000
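As an illustration, the two decode patterns above can be turned into mask/match pairs and tested against an instruction word. This Python sketch is not the TIE compiler's output; the function names and the dictionary layout are assumptions, and only the two bit patterns are taken from the text.

```python
def pattern_to_mask_match(pattern):
    """Convert a pattern such as '0000 0110 xxxx ...' into (mask, match) integers."""
    mask = match = 0
    for ch in pattern.replace(" ", ""):
        mask = (mask << 1) | (1 if ch != "x" else 0)    # 1 where the bit is fixed
        match = (match << 1) | (1 if ch == "1" else 0)  # required value of fixed bits
    return mask, match


PATTERNS = {
    "acs":   "0000 0110 xxxx xxxx xxxx 0000",
    "adse1": "0001 0110 xxxx xxxx xxxx 0000",
}


def decode(inst):
    """Return the name of the opcode whose pattern matches inst, or None."""
    for name, pattern in PATTERNS.items():
        mask, match = pattern_to_mask_match(pattern)
        if (inst & mask) == match:
            return name
    return None
```

The don't-care bits (inst[15:4]) are left free for the operand fields.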
The instruction operand statement (operand) identifies registers and immediate constants. Before a field is defined as an operand, however, it must already have been defined as a field, as above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table, as described below. For example, to encode an immediate operand, the TIE code
    field offset inst[23:6]
    operand offsets4 offset {
        assign offsets4 = {{14{offset[17]}}, offset} << 2;
    } {
        wire [31:0] t;
        assign t = offsets4 >> 2;
        assign offset = t[17:0];
    }
defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. The last part of the operand statement actually describes the circuitry used to perform the computations, in a subset of the Verilog™ HDL used for describing combinational circuits, as will be appreciated by those skilled in the art.
Here, the wire statement defines a set of logical wires named t that is 32 bits wide. The first assign statement after the wire statement specifies that the logic signals driving the wires are the constant offsets4 shifted right by two bits, and the second assign statement specifies that the lower 18 bits of t are placed in the offset field. The very first assign statement directly specifies the value of the operand offsets4 as the concatenation of offset and 14 copies of its sign bit (bit 17), followed by a left shift of two bits.
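The two directions of the offsets4 operand (field to value, and value back to field) can be sketched in Python as follows. The function names are hypothetical, and the sketch returns the decoded value as a signed Python integer rather than as a 32-bit bit vector.

```python
def decode_offsets4(offset):
    """Compute {{14{offset[17]}}, offset} << 2 as a signed integer."""
    offset &= 0x3FFFF              # keep the 18-bit field
    if offset & 0x20000:           # bit 17 is the sign bit
        offset -= 0x40000          # sign-extend
    return offset * 4              # shift left by two bits


def encode_offsets4(offsets4):
    """Compute (offsets4 >> 2)[17:0], the value stored in the offset field."""
    return (offsets4 >> 2) & 0x3FFFF
```

Encoding followed by decoding returns the original value for any multiple of four that fits in the 20-bit signed range.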
For a constant table operand, the TIE code
    table prime 16 {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
    }
    operand prime_s s {
        assign prime_s = prime[s];
    } {
        assign s = prime_s == prime[0]  ? 4'b0000 :
                   prime_s == prime[1]  ? 4'b0001 :
                   prime_s == prime[2]  ? 4'b0010 :
                   prime_s == prime[3]  ? 4'b0011 :
                   prime_s == prime[4]  ? 4'b0100 :
                   prime_s == prime[5]  ? 4'b0101 :
                   prime_s == prime[6]  ? 4'b0110 :
                   prime_s == prime[7]  ? 4'b0111 :
                   prime_s == prime[8]  ? 4'b1000 :
                   prime_s == prime[9]  ? 4'b1001 :
                   prime_s == prime[10] ? 4'b1010 :
                   prime_s == prime[11] ? 4'b1011 :
                   prime_s == prime[12] ? 4'b1100 :
                   prime_s == prime[13] ? 4'b1101 :
                   prime_s == prime[14] ? 4'b1110 :
                                          4'b1111;
    }
uses the table statement to define an array prime of constants (the number following the table name is the number of elements in the table) and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
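A Python sketch of the same table operand is given below for illustration. It assumes that the table holds the first sixteen primes, as the table name and the declared size of 16 suggest; the function names are hypothetical.

```python
# Assumed contents: the first sixteen primes (the declared table size is 16).
PRIME = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]


def decode_prime_s(s):
    """prime_s = prime[s]: look up the constant selected by the 4-bit field s."""
    return PRIME[s & 0xF]


def encode_prime_s(value):
    """Mirror the chained ?: expression: first matching entry, else 4'b1111."""
    for index, prime in enumerate(PRIME[:15]):
        if value == prime:
            return index
    return 15
```

As in the TIE listing, a value that is not in the table falls through to the default encoding 15.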
The instruction class statement iclass associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the code used in the preceding example defining the opcodes acs and adse1, the additional statements
    operand art t {assign art = AR[t];} {}
    operand ars s {assign ars = AR[s];} {}
    operand arr r {assign AR[r] = arr;} {}
use the operand statement to define three register operands art, ars and arr (note again the use of Verilog™ statements in the definition). Then, the iclass statement
    iclass viterbi {adse1, acs} {out arr, in art, in ars}
specifies that the opcodes adse1 and acs belong to a common class of instructions, viterbi, which take the two register operands art and ars as input and write their output to the register operand arr.
The instruction semantic statement semantic describes the behavior of one or more instructions using the same subset of Verilog™ that is used for encoding operands. By defining multiple instructions in a single semantic statement, some common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes defined in the statement's opcode list, together with a single-bit variable for each opcode specified in the opcode list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.
For example, TIE code defining a new instruction ADD8_4, which adds the four 8-bit operands in one 32-bit word to the four corresponding 8-bit operands in another 32-bit word, and a new instruction MIN16_2, which selects the minimum of each of the two 16-bit operands in one 32-bit word and the respective 16-bit operands in another 32-bit word, might read:
    opcode ADD8_4 op2=4'b0000 CUST0
    opcode MIN16_2 op2=4'b0001 CUST0
    iclass add_min {ADD8_4, MIN16_2} {out arr, in ars, in art}
    semantic add_min {ADD8_4, MIN16_2} {
        wire [31:0] add, min;
        wire [7:0] add3, add2, add1, add0;
        wire [15:0] min1, min0;
        assign add3 = art[31:24] + ars[31:24];
        assign add2 = art[23:16] + ars[23:16];
        assign add1 = art[15:8] + ars[15:8];
        assign add0 = art[7:0] + ars[7:0];
        assign add = {add3, add2, add1, add0};
        assign min1 = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];
        assign min0 = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];
        assign min = {min1, min0};
        assign arr = (({32{{ADD8_4}}}) & (add)) |
                     (({32{{MIN16_2}}}) & (min));
    }
Here, op2, CUST0, arr, art and ars are predefined as noted above, and the opcode and iclass statements function as described above.
The semantic statement specifies the computations performed by the new instructions. As will be apparent to those skilled in the art, the second line within the semantic statement specifies the computation performed by the new ADD8_4 instruction, the third and fourth lines therein specify the computations performed by the new MIN16_2 instruction, and the final line of the section specifies the result written to the arr register.
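The behavior of the two instructions can be checked against a small Python model. This is a sketch written from the description above rather than generated from the TIE source; the function names are hypothetical, and the 16-bit comparison is taken to be unsigned, as a plain Verilog comparison of wires would be.

```python
def add8_4(art, ars):
    """Add four 8-bit lanes independently; the carry out of each lane is dropped."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = ((art >> shift) & 0xFF) + ((ars >> shift) & 0xFF)
        result |= (lane & 0xFF) << shift
    return result


def min16_2(art, ars):
    """Select the (unsigned) minimum in each of the two 16-bit lanes."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & 0xFFFF
        b = (ars >> shift) & 0xFFFF
        result |= min(a, b) << shift
    return result


def add_min(art, ars, ADD8_4=0, MIN16_2=0):
    """Model the final assign: each result is gated by its one-bit opcode signal."""
    def replicate(bit):                 # {32{bit}}
        return 0xFFFFFFFF if bit else 0
    return (replicate(ADD8_4) & add8_4(art, ars)) | (replicate(MIN16_2) & min16_2(art, ars))
```

Because at most one opcode signal is 1 for a given instruction, the OR of the two gated results acts as a multiplexer.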
Returning to the discussion of the user input interface 20, once the user has entered all of the configuration and extension options she desires, the build system 50 takes over. As shown in Fig. 5, the build system 50 receives a configuration specification made up of the parameters set by the user and the extensible features designed by the user, and combines them with additional parameters defining the core processor architecture (for example, features that the user cannot modify) to create a configuration specification 100 describing the entire processor. For example, in addition to the configuration settings 102 chosen by the user, the build system 50 might add parameters specifying the number of physical address bits for the processor's physical address space, the location of the first instruction to be executed by the processor 60, and so on.
The Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, by Tensilica, Inc., is incorporated herein by reference for the purpose of illustrating examples of instructions that can be implemented as core instructions within the configurable processor and of instructions that become available through the selection of configuration options.
This indicates that the processor will include the on-chip debug module 92, the interrupt facilities 72 and exception handling, but not the high-priority interrupt facilities.
the instruction decode logic of the processor 60;
the illegal instruction detection logic for the processor 60;
the ISA-specific portion of the assembler 110;
the ISA-specific support routines for the compiler 108;
the ISA-specific portion of the disassembler 110 (used by the debugger); and
the ISA-specific portion of the simulator 112.
It is valuable to generate these items automatically because an important configuration capability is specifying the inclusion of packages of instructions. For some things, this could be implemented with conditional code in each tool that handles the instruction if it has been configured, but this is awkward; more importantly, it does not allow the system designer to easily add instructions for his system.
In addition to taking a configuration specification 100 as input from the designer, it is also possible to accept goals and have the build system 50 determine the configuration automatically. The designer can specify goals for the processor 60. For example, clock rate, area, cost, typical power consumption and maximum power consumption might be goals. Because some of the goals conflict (for example, performance can often be increased only by increasing area or power consumption or both), the build system 50 then consults a search engine 106 to determine the set of available configuration options, and determines how to set each option using an algorithm that attempts to meet the input goals simultaneously.
Different algorithms may be used to find the configuration settings that come closest to meeting the input goals. For example, a simple knapsack-packing algorithm considers each option in sorted order of value divided by cost and accepts any option specification that increases value while keeping cost below a specified limit. So, for example, to maximize performance while keeping power below a specified value, the options would be sorted by performance divided by power, and each option that increases performance without exceeding the power limit would be accepted. More complex knapsack algorithms provide some degree of backtracking.
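A minimal sketch of the simple knapsack-style selection described above is given here in Python. The function name, the tuple layout and any option names or numbers used with it are hypothetical; the sketch also assumes that every option has a strictly positive power cost.

```python
def greedy_select(options, power_limit):
    """Pick options in order of performance gain per unit of power.

    options: list of (name, performance_gain, power_cost) tuples, power_cost > 0.
    Returns the chosen option names and the total power they consume.
    """
    ranked = sorted(options, key=lambda opt: opt[1] / opt[2], reverse=True)
    chosen, power = [], 0.0
    for name, gain, cost in ranked:
        # Accept an option only if it adds performance and stays within the limit.
        if gain > 0 and power + cost <= power_limit:
            chosen.append(name)
            power += cost
    return chosen, power
```

This greedy pass never revisits a decision; the backtracking variants mentioned above would reconsider earlier choices when a later option does not fit.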
A very different kind of algorithm for determining the configuration from the goals and the design database is based on simulated annealing. A random initial set of parameters is used as the starting point, and changes to individual parameters are then accepted or rejected by evaluating a global utility function. Improvements in the utility function are always accepted, while negative changes are accepted probabilistically according to a threshold that declines as the optimization proceeds. In this system the utility function is constructed from the input goals. For example, given the goals performance > 200, power < 100 and area < 4, with priority in the order power, area, performance, the following utility function could be used:
    max((1 - Power/100) * 0.5, 0)
    + (max((1 - Area/4) * 0.3, 0) * (if Power < 100 then 1 else (1 - Power/100)**2))
    + (max(Performance/200 * 0.2, 0) * (if Power < 100 then 1 else (1 - Power/100)**2) * (if Area < 4 then 1 else (1 - Area/4)**2))
This rewards decreases in power consumption until it is below 100 and is neutral thereafter, rewards decreases in area until it is below 4 and is neutral thereafter, and rewards increases in performance until it is above 200 and is neutral thereafter. There are also components that reduce the benefit of area when power is beyond its specified value, and reduce the benefit of performance when power or area is beyond its specified value.
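For illustration, the utility function can be written out in Python exactly as the expression reads, together with one possible acceptance rule for the annealing step. The acceptance rule shown (the Metropolis criterion with a temperature parameter) is an assumption of this sketch; the text only says that negative changes are accepted probabilistically against a declining threshold.

```python
import math
import random


def utility(power, area, performance):
    """Evaluate the example utility function for goals power<100, area<4, perf>200."""
    power_factor = 1.0 if power < 100 else (1 - power / 100) ** 2
    area_factor = 1.0 if area < 4 else (1 - area / 4) ** 2
    power_term = max((1 - power / 100) * 0.5, 0)
    area_term = max((1 - area / 4) * 0.3, 0) * power_factor
    perf_term = max(performance / 200 * 0.2, 0) * power_factor * area_factor
    return power_term + area_term + perf_term


def accept(delta, temperature, rng=random):
    """Always accept improvements; accept a loss with probability exp(delta / T)."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)
```

Lowering `temperature` as the search proceeds makes losses progressively less likely to be accepted.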
Both of these algorithms, and others, can be used to search for configurations that satisfy the specified goals. What is important is that the configurable processor design has been described in a design database that contains prerequisite and incompatibility option specifications, as well as the impact of each configuration option on the various metrics.
The examples given so far have used hardware goals that are general and do not depend on the particular algorithm being run on the processor 60. The algorithms described can also be used to select configurations well suited to specific user programs. For example, the user program can be run on a cache-accurate simulator to measure the number of cache misses for different types of caches with different characteristics, such as different sizes, different line sizes and different set associativities. The results of these simulations can be added to the database used by the search algorithm 106 described above to help select the hardware implementation description 40.
Similarly, the user algorithm can be profiled for the occurrence of certain instructions that can optionally be implemented in hardware. For example, if the user algorithm spends significant time performing multiplications, the search engine 106 might automatically suggest including a hardware multiplier. Such algorithms need not be limited to considering one user algorithm. The user can feed a set of algorithms into the system, and the search engine 106 can select a configuration that is useful, on average, to the set of user programs.
In addition to selecting preconfigured characteristics of the processor 60, the search algorithms can also be used to automatically select, or suggest to the user, possible TIE extensions. Given the input goals and given examples of user programs, written perhaps in the C programming language, these algorithms would suggest potential TIE extensions. For TIE extensions without state, compiler-like tools are equipped with pattern matchers. These pattern matchers walk the expression nodes in a bottom-up fashion, searching for multiple-instruction patterns that could be replaced with a single instruction. For example, suppose the user's C program contains the following statements:
    x = (y + z) << 2;
    x2 = (y2 + z2) << 2;
The pattern matcher would discover that the user adds two numbers and shifts the result two bits to the left in two different places. The system would add to a database the possibility of generating a TIE instruction that adds two numbers and shifts the result two bits to the left.
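The kind of pattern matching described can be illustrated with Python's standard `ast` module standing in for a C front end; this substitution is purely an assumption made so that the sketch is self-contained (the two example statements happen to parse as Python as well). The function name is hypothetical, and `ast.walk` visits nodes in no particular bottom-up order, which does not matter for simple counting.

```python
import ast
from collections import Counter


def count_add_shift(source):
    """Count occurrences of (a + b) << k, keyed by the constant shift amount k."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, ast.LShift)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Add)
                and isinstance(node.right, ast.Constant)):
            counts[node.right.value] += 1
    return counts
```

Applied to the two statements above, the matcher reports the add-then-shift-by-two pattern twice, which is the count that would be recorded in the database for a candidate instruction.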
The build system 50 keeps track of many possible TIE instructions, along with a count of how many times each appears. Using a profiling tool, the system 50 also keeps track of how often each instruction is executed during the total execution of the algorithm. Using a hardware estimator, the system 50 keeps track of how expensive in hardware it would be to implement each potential TIE instruction. These numbers are fed into the search heuristic algorithm to select a set of potential TIE instructions that best meets the input goals, such as performance, code size, hardware complexity and the like.
Similar but more powerful algorithms are used to discover potential TIE instructions with state. Several different algorithms are used to detect different types of opportunities. One algorithm uses a compiler-like tool to scan the user program and detect whether the user program requires more registers than are available in hardware. As is known to practitioners in the art, this can be detected by counting the number of register spills and restores in the compiled version of the user code. The compiler-like tool suggests to the search engine a coprocessor with additional hardware registers 98, but one that supports only the operations used in the portions of the user's code that have many spills and restores. The tool is responsible for informing the database used by the search engine 106 of an estimate of the hardware cost of the coprocessor, as well as an estimate of how the performance of the user's algorithm would be improved. As described above, the search engine 106 makes a global decision as to whether or not the suggested coprocessor 98 leads to a better configuration.
Alternatively, or in conjunction therewith, a compiler-like tool checks whether the user program uses bit-mask operations to ensure that certain variables are never larger than certain limits. In this situation, the tool suggests to the search engine 106 a coprocessor 98 that uses data types conforming to the user's limits (for example, 12-bit or 20-bit integers, or integers of any other size). In a third algorithm, used in another embodiment for user programs written in C++, a compiler-like tool discovers that much time is spent operating on user-defined abstract data types. If all of the operations on the data type are suitable for TIE, the algorithm proposes to the search engine that all of the operations on the data type be implemented in a TIE coprocessor.
To generate the instruction decode logic of the processor 60, one signal is generated for each opcode defined in the configuration specification. The code is generated by simply rewriting the declaration
    opcode NAME FIELD=VALUE
as the HDL statement
    assign NAME = (FIELD == VALUE);
and
    opcode NAME FIELD=VALUE PARENTNAME [FIELD2=VALUE2]
as
    assign NAME = PARENTNAME & (FIELD == VALUE);
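The rewriting step can be sketched as a small text transformation in Python. The function name and the regular expression are assumptions made for illustration, and the sketch handles only the two forms shown above; the optional trailing [FIELD2=VALUE2] clause is not supported.

```python
import re

# opcode NAME FIELD=VALUE [PARENTNAME]
OPCODE_RE = re.compile(r"\s*opcode\s+(\w+)\s+(\w+)\s*=\s*(\S+)(?:\s+(\w+))?\s*")


def opcode_to_assign(statement):
    """Rewrite one TIE opcode statement as a Verilog assign statement."""
    m = OPCODE_RE.fullmatch(statement)
    if m is None:
        raise ValueError(f"not a supported opcode statement: {statement!r}")
    name, field, value, parent = m.groups()
    test = f"({field} == {value})"
    if parent:
        # A child opcode is decoded only when its parent opcode is also decoded.
        return f"assign {name} = {parent} & {test};"
    return f"assign {name} = {test};"
```

Feeding each opcode statement of the configuration specification through such a function yields one decode signal per opcode, with child opcodes chained to their parents.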
The generation of register interlock and pipeline stall signals has also been automated. This logic, too, is generated from the information in the configuration specification. Based on the register usage information contained in the iclass statement and on the latency of the instruction, the generated logic inserts a stall (or bubble) when a source operand of the current instruction depends on a destination operand of a previous instruction that has not yet completed. The mechanism that implements this stall function is part of the core hardware.
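The interlock condition can be expressed compactly in Python. This sketch captures only the dependence check; the function name, the register names and the representation of in-flight instructions as (destination, remaining cycles) pairs are assumptions made for illustration.

```python
def needs_stall(current_sources, in_flight):
    """Return True if the current instruction must wait for an earlier result.

    current_sources: registers read by the current instruction (its 'in' operands).
    in_flight: list of (dest_register, cycles_until_result_ready) for earlier
               instructions, derived from their 'out' operands and latencies.
    """
    pending = {dest for dest, remaining in in_flight if remaining > 0}
    return any(src in pending for src in current_sources)
```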
The illegal instruction detection logic is generated by NOR-ing together the individually generated instruction signals, each AND-ed with its field restrictions:
    assign illegalinst = !(INST1 | INST2 | ... | INSTn);
The instruction decode signals and the illegal instruction signal are available as outputs of the decode module and as inputs to the hand-written processor logic.
To generate other processor features, the present embodiment uses a Verilog™ description of the configurable processor 60 enhanced with a Perl-based preprocessor language. Perl is a full-featured language including complex control structures, subroutines and I/O facilities. The preprocessor, which in one embodiment of the present invention is called TPP (as shown in the source listing in Appendix B, TPP is itself a Perl program), scans its input, identifies certain lines as preprocessor code (those prefixed by a semicolon, for TPP) written in the preprocessor language (Perl, for TPP), and constructs a program consisting of the extracted lines together with statements that generate the text of the other lines. The non-preprocessor lines may contain embedded expressions, in whose place the values produced as a result of TPP processing are substituted. The resulting program is then executed to produce the source code, that is, the Verilog™ code describing the detailed processor logic 40 (as will be seen below, TPP is also used to configure the software development tools 30).
Used in this context, TPP is a powerful preprocessing language because it permits constructs such as configuration specification queries, conditional expressions and iterative structures to be included in the Verilog™ code and, as noted above, allows expressions that depend on the configuration specification 100 to be embedded in the Verilog™ code. For example, a TPP assignment based on a database query might look like
    ; $endian = config_get_value("IsaMemoryOrder")
where config_get_value is the TPP function used to query the configuration specification 100, IsaMemoryOrder is a flag set in the configuration specification 100, and $endian is a TPP variable to be used later in generating the Verilog™ code.
A TPP conditional expression might be
    ; if (config_get_value("IsaMemoryOrder") eq "LittleEndian")
    ; {do Verilog™ code for little-endian ordering}
    ; else
    ; {do Verilog™ code for big-endian ordering}
Iterative loops can be implemented with TPP constructs such as
    ; for ($i = 0; $i < $ninterrupts; $i++)
    ; {do Verilog™ code for each of 1...N interrupts}
where $i is a TPP loop index variable and $ninterrupts is the number of interrupts specified for the processor 60 (obtained from the configuration specification 100 using config_get_value).
Finally, TPP code can be embedded in Verilog™ expressions, such as
    wire [`$ninterrupts-1`:0] srInterruptEn;
    xtscenflop #(`$ninterrupts`) srintrenreg (srInterruptEn, srDataIn_W[`$ninterrupts-1`:0], srIntrEnWEn, !CReset, CLK);
where:
$ninterrupts defines the number of interrupts and determines the width (in bits) of the xtscenflop module (a flip-flop primitive module);
srInterruptEn is the output of the flip-flop, defined as a wire of the appropriate number of bits;
srDataIn_W is the input to the flip-flop, of which only the relevant bits are input, according to the number of interrupts;
srIntrEnWEn is the write enable of the flip-flop;
CReset is the clear input to the flip-flop; and
CLK is the input clock to the flip-flop.
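The substitution of embedded expressions can be imitated in a few lines of Python. This is only a stand-in for the Perl-based TPP, not its implementation: the function name is hypothetical, expressions are evaluated with Python's `eval` over a supplied variable table, and Verilog's own single-backtick macros (such as `SRCCOUNT) are not handled.

```python
import re


def expand(line, variables):
    """Replace each `expr` in line with its value; $name refers to variables[name]."""
    def substitute(match):
        # Drop the Perl-style '$' sigil so the expression is valid Python.
        expr = re.sub(r"\$(\w+)", r"\1", match.group(1))
        return str(eval(expr, {"__builtins__": {}}, dict(variables)))
    return re.sub(r"`([^`]*)`", substitute, line)
```

For example, with the number of interrupts set to 4, the wire declaration above expands to `wire [3:0] srInterruptEn;`.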
For example, given the following input to TPP:
    ; # Timer Interrupt
    ; if ($IsaUseTimer) {
    wire [`$width-1`:0] srCCount;
    wire ccountWEn;
    //--------------------------------------------------------------
    // CCOUNT Register
    //--------------------------------------------------------------
    assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
    xtflop #(`$width`) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
    ; for ($i=0; $i<$TimerNumber; $i++) {
    //--------------------------------------------------------------
    // CCOMPARE Register
    //--------------------------------------------------------------
    wire [`$width-1`:0] srCCompare`$i`;
    wire ccompWEn`$i`;
    assign ccompWEn`$i` = srWEn_W && (srWrAdr_W == `SRCCOMPARE`$i`);
    xtenflop #(`$width`) srccmp`$i`reg
        (srCCompare`$i`, srDataIn_W, ccompWEn`$i`, CLK);
    assign setCCompIntr`$i` = (srCCompare`$i` == srCCount);
    assign clrCCompIntr`$i` = ccompWEn`$i`;
    ; }
    ; } ## IsaUseTimer
and the declarations
    $IsaUseTimer = 1
    $TimerNumber = 2
    $width = 32
TPP generates
    wire [31:0] srCCount;
    wire ccountWEn;
    //--------------------------------------------------------------
    // CCOUNT Register
    //--------------------------------------------------------------
    assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);
    xtflop #(32) srccntreg (srCCount, (ccountWEn ? srDataIn_W : srCCount+1), CLK);
    //--------------------------------------------------------------
    // CCOMPARE Register
    //--------------------------------------------------------------
    wire [31:0] srCCompare0;
    wire ccompWEn0;
    assign ccompWEn0 = srWEn_W && (srWrAdr_W == `SRCCOMPARE0);
    xtenflop #(32) srccmp0reg
        (srCCompare0, srDataIn_W, ccompWEn0, CLK);
    assign setCCompIntr0 = (srCCompare0 == srCCount);
    assign clrCCompIntr0 = ccompWEn0;
    //--------------------------------------------------------------
    // CCOMPARE Register
    //--------------------------------------------------------------
    wire [31:0] srCCompare1;
    wire ccompWEn1;
    assign ccompWEn1 = srWEn_W && (srWrAdr_W == `SRCCOMPARE1);
    xtenflop #(32) srccmp1reg
        (srCCompare1, srDataIn_W, ccompWEn1, CLK);
    assign setCCompIntr1 = (srCCompare1 == srCCount);
    assign clrCCompIntr1 = ccompWEn1;
The HDL description 114 thus generated is used to synthesize hardware implementing the processor, for example using Design Compiler™ from Synopsys in block 122. The result is then placed and routed in block 128 using, for example, Silicon Ensemble™ from Cadence or Apollo™ from Avant!. Once the components have been routed, the result can be used for wire back-annotation and timing verification in block 132 using, for example, PrimeTime™ from Synopsys. The product of this process is a hardware profile 134, which the user can use to provide further input to the configuration capture routine 20 for further configuration iterations.
As noted above in connection with logic synthesis block 122, one result of configuring the
processor 60 is a set of customized HDL files from which a specific gate-level implementation can
be obtained using any of several commercial synthesis tools. Design Compiler™ from Synopsys is one
such tool. To ensure a correct, high-performance gate-level implementation, this embodiment
provides the scripts needed to automate the synthesis process in the customer environment. The
challenge in providing such scripts is to support a variety of synthesis methodologies and
different user implementation goals. To meet the first challenge, this embodiment breaks the
scripts into smaller, functionally complete scripts. One such example is to provide a read script
that reads all HDL files relevant to the particular processor configuration 60, a timing-constraint
script that sets the unique timing requirements of the processor 60, and a script that writes out
the synthesis results in a form usable for place and route of the gate-level netlist. To meet the
second challenge, this embodiment provides a script for each implementation goal. One such example
is to provide a script for achieving the fastest cycle time, a script for achieving minimum silicon
area, and a script for achieving minimum power consumption.
Scripts are also used in other stages of processor configuration. For example, once the HDL model
of the processor 60 has been written, a simulator can be used to verify the correct operation of
the processor 60, as described above in connection with block 132. This is typically done by
running many test programs, or diagnostics, on the simulated processor 60. Running a test program
on the simulated processor 60 can require many steps, such as generating an executable image of the
test program, generating a representation of that executable image that the simulator 112 can
read, creating a temporary location in which to collect the simulation results for later analysis,
analyzing the simulation results, and so on. In the prior art, this was done with a number of
throw-away scripts. These scripts have built-in knowledge of the simulation environment, such as
which HDL files should be included, where those files can be found in the directory structure,
which files are required for the test bench, and so on. In the current design, the preferred
mechanism is to write a script template that is configured by parameter substitution. The
configuration mechanism also uses TPP to generate the list of files required for simulation.
Furthermore, the verification process of block 132 often requires writing other scripts that allow
designers to run a series of test programs. These are typically used to run regression suites,
which give a designer confidence that a given change to the HDL model does not introduce new bugs.
Because regression scripts contain many built-in assumptions about file names, locations, and so
on, they too were often thrown away. As described above for the script that runs a single test
program, the regression script is written as a template. The template is configured at
configuration time by substituting actual values for the parameters.
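Template configuration by parameter substitution can be sketched as follows. This is an
illustration only: the embodiment uses TPP, whereas the sketch uses Python's string.Template, and
the command name run_simulator, its flags, and the parameter names are all hypothetical.

```python
from string import Template

# Hypothetical template for a script that runs one test program.
RUN_TEST_TEMPLATE = Template(
    "#!/bin/sh\n"
    "# generated for configuration $config_name\n"
    "run_simulator -f $file_list -image $image -out $results_dir\n"
)


def configure_script(template: Template, params: dict) -> str:
    """Replace every $parameter; substitute() raises KeyError if one is missing."""
    return template.substitute(params)


if __name__ == "__main__":
    print(configure_script(RUN_TEST_TEMPLATE, {
        "config_name": "example_config",       # hypothetical values
        "file_list": "hdl_files.f",
        "image": "diag01.img",
        "results_dir": "results/diag01",
    }))
```

Because a missing parameter raises an error instead of leaving a stale path in place, a template of
this kind fails loudly when the environment changes, which is the property the throw-away scripts
lacked.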
The final step in converting an RTL description into a hardware implementation is to use
place-and-route (P&R) software to convert the abstract netlist into a geometric representation.
The P&R software analyzes the connectivity of the netlist and decides on the placement of the
cells. It then tries to draw the connections between all the cells. The clock net usually deserves
special attention and is routed as a last step. This process can be helped by giving the tools some
information, such as which cells are expected to be close together (known as soft clustering), the
relative placement of cells, which nets are expected to have small propagation delays, and so on.
To make this process easier and to ensure that the desired performance goals (cycle time, area,
power dissipation) are met, the configuration mechanism produces a set of scripts or input files
for the P&R software. These scripts also contain information such as how many supply and ground
connections are needed, how these should be distributed along the boundary, and so on. The scripts
are generated by querying a database that contains information on how many soft clusters to
create, which cells should be included in them, which nets are timing critical, and so on. These
parameters change depending on which options have been selected. The scripts must be configurable
for the particular tools to be used for place and route.
Optionally, the configuration mechanism can request more information from the user and pass it to
the P&R scripts. For example, the interface can ask the user for the desired aspect ratio of the
final layout, how many levels of buffering should be inserted in the clock tree, which side the
input and output pins should be located on, the relative or absolute placement of these pins, the
width and location of the power and ground straps, and so on. These parameters are then passed on
to the P&R scripts to generate the desired layout.
Even more sophisticated scripts can be used that support, for example, a more sophisticated clock
tree. One common optimization for reducing power dissipation is to gate the clock signal. However,
this makes clock tree synthesis a much harder problem, because it is more difficult to balance the
delay of all the branches. The configuration interface can ask the user for the correct cells to
use for the clock tree and perform part or all of the clock tree synthesis. It does this by knowing
where the gated clocks are located in the design and estimating the delay from the qualifying gate
to the clock input of each flip-flop. It then gives the clock tree synthesis tool a constraint to
match the delay of the clock buffers with the delay of the gating cells. In the current embodiment
this is done by a general-purpose Perl script. The script reads gated-clock information that the
configuration agent produces according to which options are selected. The Perl script is run once
the design has been placed and routed but before final clock tree synthesis is done.
The profiling process described above can be improved further. In particular, we describe a process
by which the user can obtain similar hardware profile information almost instantaneously, without
spending hours running those CAD tools. This process has several steps.
The first step in this process is to partition the set of all configuration options into groups of
orthogonal options, such that the effect of an option in one group on the hardware profile is
independent of the options in any other group. For example, the effect of the MAC16 unit on the
hardware profile is independent of any other option, so an option group containing only the MAC16
option is formed. A more complicated example is an option group containing the interrupt options,
the high-level interrupt options and the timer options, because the effect on the hardware profile
depends on the particular combination of these options.
The second step is to characterize the effect of each option group on the hardware profile. The
characterization is done by obtaining the effect on the hardware profile of various combinations
of the options in the group. For each combination, the profile is obtained using the previously
described process, in which an actual implementation is derived and its hardware profile is
measured. This information is stored in an estimation database.
The last step is to derive specific formulae that use curve-fitting and interpolation techniques
to compute the effect on the hardware profile of a particular combination of options in an option
group. Different formulae are used depending on the nature of the options. For example, since each
additional interrupt vector adds about the same logic to the hardware, a linear function is used to
model its hardware impact. In another example, a timer unit requires a high-level interrupt option,
so the formula for the hardware impact of the timer option is a conditional formula involving
several options.
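The two kinds of formula can be sketched as follows. This is a minimal illustration under stated
assumptions: the measured points, the gate counts, and the function names are invented for the
example and do not come from the estimation database described here.

```python
def fit_line(points):
    """Least-squares fit y = slope * x + intercept over (x, y) measurements."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical characterization data: (interrupt vectors, measured gate count).
INTERRUPT_MEASUREMENTS = [(1, 1200), (2, 1500), (4, 2100)]
SLOPE, INTERCEPT = fit_line(INTERRUPT_MEASUREMENTS)


def interrupt_area(vectors: int) -> float:
    """Linear model: each extra vector adds about the same logic."""
    return SLOPE * vectors + INTERCEPT


def timer_area(timers: int, has_high_level_interrupt: bool,
               per_timer: int = 400, interrupt_logic: int = 250) -> int:
    """Conditional model: a timer pulls in interrupt logic if it is not already present."""
    if timers == 0:
        return 0
    area = per_timer * timers
    if not has_high_level_interrupt:
        area += interrupt_logic
    return area
```

The linear model interpolates between characterized points, so an uncharacterized combination such
as three interrupt vectors can be estimated without running the CAD tools.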
It is useful to provide quick feedback on how architectural choices may affect the runtime
performance and code size of applications. Several sets of benchmark programs from multiple
application domains are chosen. For each domain, a database is prebuilt that estimates how
different architectural design decisions affect the runtime performance and code size of the
applications in that domain. As the user varies the architectural design, the database is queried
for the application domain of interest to the user, or for multiple domains. The results of the
evaluation are presented to the user so that she can get an estimate of the tradeoff between
software benefit and hardware cost.
The quick evaluation system can easily be extended to give the user suggestions on how to modify a
configuration to further optimize the processor. One such example is to associate each
configuration option with a set of numbers representing the incremental impact of the option on
various cost metrics such as area, delay and power. Computing the incremental cost impact of a
given option is made easy by the quick evaluation system. It simply involves two calls to the
evaluation system, one with the option and one without. The difference in cost between the two
evaluations represents the incremental impact of the option. For example, the incremental area
impact of the MAC16 option is computed by evaluating the area cost of two configurations, with and
without the MAC16 option. The difference is then displayed with the MAC16 option in the
interactive configuration system. Such a system can guide the user toward an optimized solution
through a series of single-step improvements.
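The two-call computation can be sketched as follows. The estimator here is a stand-in with
made-up area figures; only the with/without differencing reflects the text, and all names are
hypothetical.

```python
# Hypothetical area figures standing in for the quick evaluation system.
BASE_AREA = 25000
OPTION_AREA = {"MAC16": 6000, "timer": 400, "interrupts": 1200}


def estimate_area(options) -> int:
    """Stand-in estimator: base core plus the area of each selected option."""
    return BASE_AREA + sum(OPTION_AREA[name] for name in options)


def incremental_cost(estimate, config, option) -> int:
    """Call the estimator twice, with and without the option, and take the difference."""
    with_option = frozenset(config) | {option}
    without_option = frozenset(config) - {option}
    return estimate(with_option) - estimate(without_option)


if __name__ == "__main__":
    current = {"timer", "interrupts"}
    print("MAC16 adds", incremental_cost(estimate_area, current, "MAC16"), "area units")
```

Because the difference is taken against the user's current configuration, the same function works
whether the option is currently selected or not.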
Moving on to the software side of the automated processor configuration process, this embodiment of
the invention configures the software development tools 30 so that they are specific to the
processor. The configuration process begins with software tools 30 that can be ported to a variety
of different systems and instruction set architectures. Such retargetable tools have been widely
studied and are well known in the art. This embodiment uses the GNU family of tools, which is free
software, including for example the GNU C compiler, GNU assembler, GNU debugger, GNU linker, GNU
profiler, and various utility programs. These tools 30 are then automatically configured by
generating portions of the software directly from the ISA description and by using TPP to modify
portions of the software that are written by hand.
The GNU C compiler is configured in several different ways. Given the core ISA description, much of
the machine-dependent logic in the compiler can be written by hand. This portion of the compiler is
common to all configurations of the configurable processor instruction set, and retargeting by
hand allows fine-tuning for best results. However, even for this hand-coded portion of the
compiler, some code is generated automatically from the ISA description. Specifically, the ISA
description defines the sets of constant values that can be used in the immediate fields of
various instructions. For each immediate field, a predicate function is generated to test whether
a particular constant value can be encoded in the field. The compiler uses these predicate
functions when generating code for the processor 60. Automating this aspect of the compiler
configuration eliminates an opportunity for inconsistency between the ISA description and the
compiler, and it enables changing the constants in the ISA with minimal effort.
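Generating a predicate per immediate field can be sketched as follows. The field names and value
sets are hypothetical examples, not taken from the ISA description, and the embodiment generates
compiler source code where this sketch builds Python closures.

```python
def make_immediate_predicate(allowed_values):
    """Build a predicate that tests whether a constant fits a given immediate field."""
    allowed = frozenset(allowed_values)

    def can_encode(value: int) -> bool:
        return value in allowed

    return can_encode


# Hypothetical immediate fields and the constant sets an ISA description might list.
IMMEDIATE_FIELDS = {
    "simm8": range(-128, 128),                 # signed 8-bit immediate
    "shift_amount": range(0, 32),              # shift count
    "small_const": [-1] + list(range(1, 16)),  # table-encoded constants
}

PREDICATES = {name: make_immediate_predicate(values)
              for name, values in IMMEDIATE_FIELDS.items()}
```

A code generator would consult PREDICATES["simm8"](value) before choosing an instruction form that
carries the constant inline; changing the constant set in the description changes the predicate
with no hand edits.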
Several portions of the compiler are configured via preprocessing with TPP. For configuration
options controlled by parameter selection, the corresponding parameters in the compiler are set
via TPP. For example, the compiler has a flag variable to indicate whether the target processor 60
uses big-endian or little-endian byte ordering, and this variable is set automatically using a TPP
command that reads the endianness parameter from the configuration specification 100. TPP is also
used to conditionally enable or disable hand-coded portions of the compiler that generate code for
optional ISA packages, based on whether the corresponding packages are enabled in the
configuration specification 100. For example, the code to generate multiply/accumulate
instructions is included in the compiler only if the configuration specification includes the
MAC16 option 90.
The compiler is also configured to support designer-defined instructions specified via the TIE
language. There are two levels of this support. At the lowest level, the designer-defined
instructions are available as macros, intrinsic functions, or inline (extrinsic) functions in the
code being compiled. This embodiment of the invention generates a C header file defining inline
functions as "inline assembly" code (a standard feature of the GNU C compiler). Given the TIE
specification of the designer-defined opcodes and their corresponding operands, generating this
header file is a straightforward process of translating to the GNU C compiler's inline assembly
syntax. An alternative implementation creates a header file containing C preprocessor macros that
specify the inline assembly instructions. Yet another alternative uses TPP to add intrinsic
functions directly into the compiler.
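The header generation can be sketched as a small text generator. This is an assumption-laden
illustration: the opcode name, the int operand types, and the generic "r" register constraint are
placeholders, since the actual TIE specification format and target constraints are not shown here.
Only the overall shape follows GNU C extended asm syntax.

```python
def inline_function(opcode: str, num_inputs: int) -> str:
    """Emit a C inline function that wraps one designer-defined instruction."""
    params = ", ".join(f"int in{i}" for i in range(num_inputs))
    operands = ", ".join(f"%{i}" for i in range(num_inputs + 1))
    inputs = ", ".join(f'"r"(in{i})' for i in range(num_inputs))
    return (
        f"static inline int {opcode}({params})\n"
        "{\n"
        "    int out;\n"
        f'    __asm__ ("{opcode.lower()} {operands}" : "=r"(out) : {inputs});\n'
        "    return out;\n"
        "}\n"
    )


def header_file(instructions: dict) -> str:
    """Emit one wrapper per instruction; instructions maps opcode name to input count."""
    return "\n".join(inline_function(name, count) for name, count in instructions.items())


if __name__ == "__main__":
    # Hypothetical designer-defined instruction with two register inputs.
    print(header_file({"ADDSAT": 2}))
```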
The second level of support for designer-defined instructions is provided by having the compiler
automatically recognize opportunities for using the instructions. These TIE instructions may be
directly defined by the user or created automatically during the configuration process. Prior to
compiling the user application, the TIE code is automatically examined and converted into
equivalent C functions. This is the same step used to allow fast simulation of TIE instructions.
The equivalent C functions are partially compiled into the tree-based intermediate representation
used by the compiler. The representation for each TIE instruction is stored in a database. When
the user application is compiled, part of the compilation process is a pattern matcher. The user
application is compiled into the tree-based intermediate representation. The pattern matcher walks
every tree in the user program bottom-up. At each step of the walk, the pattern matcher checks
whether the intermediate representation rooted at the current point matches any of the TIE
instructions in the database. If there is a match, the match is noted. After the walk of each tree
is finished, the set of maximally sized matches is selected. Each maximal match in the tree is
replaced with the equivalent TIE instruction.
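The matching procedure can be sketched on a toy intermediate representation. Everything below is
an assumption made for illustration: trees are nested tuples of the form (operator, operand, ...),
pattern operands beginning with "?" are wildcards, and the two database entries are invented. The
compiler's real representation is not described at this level of detail.

```python
def match(pattern, tree, bindings):
    """Return True if pattern matches tree, recording wildcard bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings[pattern] == tree
        bindings[pattern] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return (len(pattern) == len(tree) and pattern[0] == tree[0]
                and all(match(p, t, bindings) for p, t in zip(pattern[1:], tree[1:])))
    return pattern == tree


def size(pattern):
    """Number of operator nodes a pattern covers."""
    if isinstance(pattern, tuple):
        return 1 + sum(size(p) for p in pattern[1:])
    return 0


def note_matches(tree, database, noted):
    """Bottom-up walk: visit operands first, then note every pattern matching here."""
    if not isinstance(tree, tuple):
        return
    for child in tree[1:]:
        note_matches(child, database, noted)
    for name, pattern in database.items():
        bindings = {}
        if match(pattern, tree, bindings):
            noted.setdefault(id(tree), []).append((size(pattern), name, bindings))


def replace_maximal(tree, noted):
    """Replace each node by its largest noted match, then continue into the operands."""
    if not isinstance(tree, tuple):
        return tree
    candidates = noted.get(id(tree))
    if candidates:
        _, name, bindings = max(candidates, key=lambda c: c[0])
        return (name, *(replace_maximal(bindings[k], noted) for k in sorted(bindings)))
    return (tree[0], *(replace_maximal(child, noted) for child in tree[1:]))


# Hypothetical database of TIE instruction patterns.
DATABASE = {
    "MAC": ("add", ("mul", "?a", "?b"), "?c"),
    "ABSDIFF": ("abs", ("sub", "?a", "?b")),
}

if __name__ == "__main__":
    program = ("add", ("mul", "x", ("abs", ("sub", "p", "q"))), "acc")
    noted = {}
    note_matches(program, DATABASE, noted)
    print(replace_maximal(program, noted))
```

Matches are noted against the original tree before any replacement, so rewriting an inner subtree
cannot hide a larger match higher up.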
The algorithm described above automatically recognizes opportunities to use stateless TIE
instructions. Additional approaches can also be used to automatically recognize opportunities to
use TIE instructions that have state. A previous section described algorithms for automatically
selecting potential TIE instructions with state. The same algorithms are used to automatically use
the TIE instructions in C or C++ applications. When a TIE coprocessor has been defined to have more
registers but a limited set of operations, regions of code are scanned to see whether they suffer
from register spilling and whether those regions use only the set of available operations. If such
regions are found, the code in those regions is automatically changed to use the coprocessor
instructions and registers 98. Conversion operations are generated at the boundaries of the region
to move the data in and out of the coprocessor 98. Similarly, if a TIE coprocessor has been defined
to work on different-size integers, regions of the code are examined to see whether all data in
the region is accessed as if it were the different size. For matching regions, the code is changed
and glue code is added at the boundaries. Similarly, if a TIE coprocessor has been defined to
implement a C++ abstract data type, all the operations on that data type are replaced with the TIE
coprocessor instructions.
Note that suggesting TIE instructions automatically and using TIE instructions automatically are
each useful independently. Suggested TIE instructions can also be used manually by the user via
the intrinsic mechanism, and the utilization algorithms can be applied to TIE instructions or
coprocessors 98 that were designed manually.
Regardless of how designer-defined instructions are generated, whether via inline functions or by
automatic recognition, the compiler needs to know the potential side effects of the
designer-defined instructions so that it can optimize and schedule them. To improve performance,
traditional compilers optimize user code so as to maximize desired characteristics such as runtime
performance, code size or power consumption. As is well known to those skilled in the art, such
optimizations include rearranging instructions or replacing certain instructions with other,
semantically equivalent instructions. To perform optimizations well, the compiler must know how
every instruction affects different portions of the machine. Two instructions that read and write
different portions of the machine state can be freely reordered. Two instructions that access the
same portion of the machine state cannot always be reordered. For traditional processors, the
state read and/or written by each instruction is hardwired, sometimes in a table, into the
compiler. In one embodiment of this invention, TIE instructions are conservatively assumed to read
and write all the state of the processor 60. This allows the compiler to generate correct code but
limits its ability to optimize code in the presence of TIE instructions. In another embodiment of
this invention, a tool automatically reads the TIE definition and, for each TIE instruction,
discovers which state is read or written by that instruction. The tool then modifies the tables
used by the compiler's optimizer to accurately model the effect of each TIE instruction.
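The reordering rule can be sketched with explicit read and write sets. The state names and the
Effects type are hypothetical; the check itself is the standard dependence test (no
read-after-write, write-after-read, or write-after-write conflict), offered as one plausible
reading of the rule stated above.

```python
from dataclasses import dataclass

# Hypothetical names for portions of machine state.
ALL_STATE = frozenset({"AR", "ACC", "SAR"})


@dataclass(frozen=True)
class Effects:
    reads: frozenset
    writes: frozenset


# Conservative table entry: the instruction is assumed to touch everything.
CONSERVATIVE = Effects(reads=ALL_STATE, writes=ALL_STATE)


def can_reorder(first: Effects, second: Effects) -> bool:
    """Two instructions may swap if neither writes state the other reads or writes."""
    conflict = (first.writes & (second.reads | second.writes)) | (second.writes & first.reads)
    return not conflict


if __name__ == "__main__":
    add = Effects(reads=frozenset({"AR"}), writes=frozenset({"AR"}))
    precise_tie = Effects(reads=frozenset({"ACC"}), writes=frozenset({"ACC"}))
    print(can_reorder(add, precise_tie))   # precise table entry
    print(can_reorder(add, CONSERVATIVE))  # conservative table entry
```

With the precise entry the scheduler is free to move the two instructions past each other; with
the conservative entry it is not, which is the optimization cost the second embodiment removes.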
Like the compiler, the machine-dependent portions of the assembler 110 include both automatically
generated parts and hand-coded parts configured with TPP. Some of the features common to all
configurations are supported with hand-written code. However, the primary task of the assembler
110 is to encode machine instructions, and instruction encoding and decoding software can be
generated automatically from the ISA description.
Because instruction encoding and decoding are useful in several different software tools, this
embodiment of the invention groups the software that performs those tasks into a separate software
library. This library is generated automatically using the information in the ISA description. The
library defines an enumeration of the opcodes, a function that efficiently maps strings for opcode
mnemonics onto members of the enumeration (stringToOpcode), and tables that for each opcode specify
the instruction length (instructionLength), the number of operands (numberOfOperands), the operand
fields, the operand types (i.e., register or immediate) (operandType), the binary encoding
(encodeOpcode) and the mnemonic string (opcodeName). For each operand field, the library provides
accessor functions to encode (fieldSetFunction) and decode (fieldGetFunction) the corresponding
bits in the instruction word. All of this information is readily available in the ISA description;
generating the library software is merely a matter of translating the information into executable
C code. For example, the instruction encodings are recorded in a C array variable in which each
entry is the encoding for a particular instruction, produced by setting each opcode field to the
value specified for that instruction in the ISA description; the encodeOpcode function simply
returns the array value for a given opcode.
The library also provides a function to decode the opcode in a binary instruction
(decodeInstruction). This function is generated as a sequence of nested switch statements, where
the outermost switch tests the subopcode field at the top of the opcode hierarchy, and the nested
switch statements test the subopcode fields progressively lower in the opcode hierarchy. The
generated code for this function thus has the same structure as the opcode hierarchy itself.
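The correspondence between the decoder and the opcode hierarchy can be sketched by walking the
hierarchy directly. The field positions, values and opcode names below are invented for the
example; the embodiment emits nested C switch statements, whereas this sketch interprets the same
tree at run time.

```python
# Hypothetical opcode hierarchy: each level names a subopcode field (shift, mask)
# and maps field values either to a nested level or to an opcode name.
HIERARCHY = {
    "field": (0, 0xF),                      # top-level subopcode field
    "cases": {
        0x0: {
            "field": (16, 0xF),             # next subopcode field down
            "cases": {0x0: "ADD", 0x1: "SUB"},
        },
        0x2: "LOAD",
    },
}


def decode_instruction(word: int, node=HIERARCHY) -> str:
    """Test one subopcode field per level, like one switch statement per level."""
    while isinstance(node, dict):
        shift, mask = node["field"]
        node = node["cases"].get((word >> shift) & mask, "UNDEFINED")
    return node


if __name__ == "__main__":
    for word in (0x00000, 0x10000, 0x00002, 0x00005):
        print(hex(word), decode_instruction(word))
```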
Given this library for encoding and decoding, the assembler 110 is easy to implement. For example,
the instruction encoding logic in the assembler is quite simple:
AssembleInstruction (String mnemonic, int arguments[])
begin
  opcode = stringToOpcode(mnemonic);
  if (opcode == UNDEFINED)
    Error("Unknown opcode");
  instruction = encodeOpcode(opcode);
  numArgs = numberOfOperands(opcode);
  for i = 0, numArgs-1 do
  begin
    setFun = fieldSetFunction(opcode, i);
    setFun(instruction, arguments[i]);
  end
  return instruction;
end
Implementing a disassembler 110, which converts binary instructions into a readable form closely
resembling assembly code, is equally straightforward:
DisassembleInstruction (BinaryInstruction instruction)
begin
  opcode = decodeInstruction(instruction);
  instructionAddress += instructionLength(opcode);
  print opcodeName(opcode);
  // Loop through the operands, disassembling each
  numArgs = numberOfOperands(opcode);
  for i = 0, numArgs-1 do
  begin
    type = operandType(opcode, i);
    getFun = fieldGetFunction(opcode, i);
    value = getFun(opcode, i, instruction);
    if (i != 0) print ", "; // Comma separate operands
    // Print based on the type of the operand
    switch (type)
      case register:
        print registerPrefix(type), value;
      case immediate:
        print value;
      case pc_relative_label:
        print instructionAddress + value;
      // etc. for more different operand types
  end
end
This disassembler algorithm is used in a standalone disassembler tool and also in the debugger 130
to support debugging of machine code.
The linker is less sensitive to the configuration than the compiler and the assembler 110. Much of
the linker is standard, and even the machine-dependent portions depend primarily on the core ISA
description and can be hand-coded for a particular core ISA. Parameters such as endianness are set
from the configuration specification 100 using TPP. The memory map of the target processor 60 is
the other aspect of the configuration needed by the linker. As before, the parameters that specify
the memory map are inserted into the linker using TPP. In this embodiment of the invention, the
GNU linker is driven by a set of linker scripts, and it is these linker scripts that contain the
memory map information. An advantage of this approach is that additional linker scripts can be
generated later, without reconfiguring the processor 60 and without rebuilding the linker, if the
memory map of the target system differs from the memory map specified when the processor 60 was
configured. Thus, this embodiment includes a tool for configuring new linker scripts with
different memory map parameters.
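A tool of this kind can be sketched as a generator for the MEMORY command of a GNU linker script.
The region names, attributes, addresses and sizes are hypothetical memory map parameters chosen
for the example; a complete linker script would also need a SECTIONS command, which is omitted.

```python
def linker_memory_block(regions) -> str:
    """Emit a GNU ld MEMORY command from (name, attributes, origin, length) tuples."""
    lines = ["MEMORY", "{"]
    for name, attrs, origin, length in regions:
        lines.append(f"  {name} ({attrs}) : ORIGIN = {origin:#x}, LENGTH = {length:#x}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical memory map for a target system.
    memory_map = [
        ("rom", "rx", 0x40000000, 0x20000),
        ("ram", "rwx", 0x60000000, 0x40000),
    ]
    print(linker_memory_block(memory_map))
```

Regenerating the script with different tuples retargets the link step to a new memory map without
touching the linker binary.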
The configuration specification is also used to configure a simulator, called the ISS 126, shown in
Figure 13. The ISS 126 is a software application that models the functional behavior of the
configurable processor instruction set. Unlike its counterpart processor hardware model
simulators, such as the Synopsys VCS and the Cadence Verilog XL and NC simulators, the ISS is an
abstraction of the CPU as it executes instructions. The ISS 126 can run much faster than a hardware
simulation because it does not need to model every signal transition for every gate and register
in the complete processor design.
The ISS 126 allows programs generated for the configured processor 60 to be executed on a host
computer. It accurately reproduces the processor's reset and interrupt behavior, which allows
low-level programs such as device drivers and initialization code to be developed. This is
particularly useful when porting native code to an embedded application. The ISS 126 can be used to
identify potential problems, such as architectural assumptions, memory ordering considerations and
the like, without needing to download the code to the actual embedded target.
In this embodiment, ISS semantics are expressed textually using a C-like language to build C
operator building blocks that turn instructions into functions. For example, the rudimentary
functionality of an interrupt, e.g., interrupt registers, bit setting, interrupt level, vectors,
etc., is modeled using this language.
The configurable ISS 126 is used for the following four purposes or goals as part of the system
design and verification process:
debugging software applications before hardware becomes available;
debugging system software (e.g., compilers and operating system components);
comparing with HDL simulation for hardware design verification. The ISS serves as a reference
implementation of the ISA: during processor design verification, the ISS and the processor HDL
both run diagnostics and applications, and traces from the two are compared; and
analyzing software application performance (this may be part of the configuration process, or it
may be used for further application tuning after a processor configuration has been selected).
All of these goals require that the ISS 126 be able to load and decode programs produced with the
configurable assembler 110 and linker. They also require that the ISS's execution of instructions
be semantically equivalent to the corresponding hardware execution and to the compiler's
expectations. For these reasons, the ISS 126 derives its decode and execution behavior from the
same ISA files used to define the hardware and system software.
For the first and last goals listed above, it is important that the ISS 126 be as fast as possible
for the required accuracy. The ISS 126 therefore permits dynamic control of the level of detail of
the simulation. For example, cache details are not modeled unless requested, and cache modeling
can be turned off and on dynamically. In addition, parts of the ISS 126 (e.g., the cache and
pipeline models) are configured before the ISS 126 is compiled, so that the ISS 126 makes very few
configuration-dependent choices of behavior at runtime. In this way, all ISS configurable behavior
is derived from well-defined sources related to other parts of the system.
For the first and third goals listed above, it is important that the ISS 126 provide operating
system services to applications when those services are not yet available from the OS for the
system under design (the target). It is equally important that these services be provided by the
target OS when that is a relevant part of the debugging process. The system therefore provides a
design for flexibly moving these services between the ISS host and the simulated target. The
current design relies on a combination of ISS dynamic control (trapping of SYSCALL instructions can
be turned on and off) and the use of a special SIMCALL instruction to request host OS services.
The last goal requires the ISS 126 to model some aspects of processor and system behavior that are
below the level specified by the ISA. In particular, the ISS cache models are constructed by
generating C code for the models from Perl scripts that extract parameters from the configuration
database 100. In addition, details of the pipeline behavior of instructions (e.g., interlocks based
on register use and functional-unit availability requirements) are also derived from the
configuration database 100. In the current embodiment, a special pipeline description file
specifies this information in a LISP-like syntax.
The third goal requires precise control of interrupt behavior. For this purpose, a special
non-architectural register in the ISS 126 is used to suppress interrupt enables.
The ISS 126 provides several interfaces to support the different goals for which it is used:
a batch or command-line mode (generally used in connection with the first and last goals);
a command loop mode, which provides non-symbolic debug capabilities, e.g., breakpoints,
watchpoints, step, etc., and is frequently used for all four goals;
a socket interface that allows the ISS 126 to be used by a software debugger as an execution back
end (this has to be configured to read and write the register state for the particular
configuration selected); and
a scriptable interface that allows very detailed debugging and performance analysis. In
particular, this interface can be used to compare application behavior on different
configurations. For example, at any breakpoint the state from a run on one configuration can be
compared with, or transferred to, the state from a run on another configuration.
ISS 126 can also optionally profile the execution of the program being simulated. This profiling uses a
program counter (PC) sampling technique known in the art. At regular intervals, simulator 126 samples the
program counter of the processor being simulated. It builds a histogram from the number of samples in each
region of code. Simulator 126 also counts the number of times each edge in the call graph is executed, by
incrementing a counter whenever a call instruction is simulated. When the simulation completes, simulator 126
writes an output file containing both the histogram and the call-graph edge counts, in a format that can be
read by a standard profile viewer. Because the program 118 being simulated does not need to be modified with
instrumentation code (as in standard profiling techniques), the profiling overhead does not affect the
simulation results, and the profiling is entirely non-invasive.
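The sampling and edge-counting scheme described above can be sketched in C as follows. This is a minimal illustration, not the code of ISS 126: the type and function names (`profiler`, `profiler_tick`, `profiler_call`), the fixed table sizes and the linear edge search are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; the text does not specify them. */
#define PROFILE_BUCKETS   64u   /* histogram bins over the code region */
#define PROFILE_MAX_EDGES 128u  /* distinct caller/callee pairs tracked */

typedef struct {
    uint32_t caller_pc;
    uint32_t callee_pc;
    uint64_t count;
} call_edge;

typedef struct {
    uint32_t  text_base;      /* lowest simulated code address */
    uint32_t  bucket_bytes;   /* bytes of code covered by one bin */
    uint32_t  sample_period;  /* simulated cycles between PC samples */
    uint64_t  cycles;
    uint64_t  histogram[PROFILE_BUCKETS];
    call_edge edges[PROFILE_MAX_EDGES];
    size_t    n_edges;
} profiler;

void profiler_init(profiler *p, uint32_t text_base,
                   uint32_t bucket_bytes, uint32_t sample_period)
{
    assert(bucket_bytes > 0 && sample_period > 0);
    memset(p, 0, sizeof *p);
    p->text_base = text_base;
    p->bucket_bytes = bucket_bytes;
    p->sample_period = sample_period;
}

/* Called once per simulated cycle with the current PC.
 * Only every sample_period-th call records a sample. */
void profiler_tick(profiler *p, uint32_t pc)
{
    p->cycles++;
    if (p->cycles % p->sample_period != 0 || pc < p->text_base)
        return;
    uint32_t bucket = (pc - p->text_base) / p->bucket_bytes;
    if (bucket < PROFILE_BUCKETS)
        p->histogram[bucket]++;
}

/* Called by the simulator whenever a call instruction is simulated. */
void profiler_call(profiler *p, uint32_t caller_pc, uint32_t callee_pc)
{
    for (size_t i = 0; i < p->n_edges; i++) {
        if (p->edges[i].caller_pc == caller_pc &&
            p->edges[i].callee_pc == callee_pc) {
            p->edges[i].count++;
            return;
        }
    }
    if (p->n_edges < PROFILE_MAX_EDGES)
        p->edges[p->n_edges++] = (call_edge){ caller_pc, callee_pc, 1 };
}

/* Number of times a given call-graph edge was executed. */
uint64_t profiler_edge_count(const profiler *p,
                             uint32_t caller_pc, uint32_t callee_pc)
{
    for (size_t i = 0; i < p->n_edges; i++)
        if (p->edges[i].caller_pc == caller_pc &&
            p->edges[i].callee_pc == callee_pc)
            return p->edges[i].count;
    return 0;
}
```

Because all bookkeeping lives in the simulator's own data structures, the simulated program's instruction stream and timing are untouched, which is the non-invasive property the text describes.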
Preferably, the system provides hardware processor emulation as well as software processor emulation. For
this purpose, the present embodiment provides an emulation board. As shown in Fig. 6, emulation board 200
uses a complex programmable logic device (CPLD) 202, such as the Altera Flex 10K200E, to emulate processor
configuration 60 in hardware. Once programmed with the processor netlist produced by the system, the CPLD
device is functionally equivalent to the final ASIC product. It offers the advantage that a physical
implementation of processor 60 is available that runs much faster than other simulation methods (such as the
ISS or HDL) and is cycle-accurate. However, it cannot reach the high frequency targets that the final ASIC
can reach.
The board enables the designer to evaluate various processor configuration options and to begin software
development and debugging early in the design cycle. It can also be used for functional verification of the
processor configuration.
The resources available on board 200 are also configurable to a certain degree. Because the mapping is done
by an easily changed programmable logic device (PLD) 217, the memory map of the various memory elements on
the board can easily be changed. Likewise, by using larger-capacity memory devices and appropriately sizing
the tag buses 222 and 224 (which connect to cache memories 218 and 228), the cache memories 218 and 228 used
by the processor core are made expandable.
Using the board to evaluate a particular processor configuration involves several steps. The first step is
to obtain a set of RTL files describing the particular configuration of the processor. The next step is to
synthesize a gate-level netlist from the RTL description using any of a number of commercially available
synthesis tools. One such example is FPGA Express from Synopsys. The gate-level netlist is then used to
obtain a CPLD implementation, using tools typically provided by the device vendor. One such tool is Maxplus2
from Altera. The final step is to download the implementation onto the CPLD chip on the emulation board,
using programmers that are again provided by the CPLD vendor.
Since one of the uses of the emulation board is to support rapid prototype implementation for debugging
purposes, it is important that the CPLD implementation process outlined in the preceding paragraph be
automatic. To achieve this, the files delivered to the user are customized by grouping all relevant files
into a single directory. A fully customized synthesis script is then provided that can synthesize the
particular processor configuration to the particular FPGA device selected by the customer. A fully customized
implementation script for use by the vendor tools is generated as well. These synthesis and implementation
scripts ensure a functionally correct implementation with optimal performance. Functional correctness is
achieved by including appropriate commands in the script to read in all RTL files relevant to the particular
processor configuration, by including appropriate commands to assign chip pin locations based on the I/O
signals in the processor configuration, and by including commands to obtain a specific logic implementation
for certain critical portions of the processor logic (such as gated clocks). The script also improves the
performance of the implementation by assigning detailed timing constraints to all processor I/O signals and
by giving special treatment to certain critical signals. One example of a timing constraint is assigning a
specific input delay to a signal by taking into account the delay of that signal on the board. One example
of critical-signal treatment is assigning the clock signal to a dedicated global wire in order to achieve low
clock skew on the CPLD chip.
Preferably, the system also configures a verification suite for the configured processor 60. Verification of
most complex designs, such as microprocessors, consists of the following flow:
Build a testbench to stimulate the design and compare the outputs; the comparison can be done within the
testbench or by using an external model such as ISS 126;
Write diagnostics to generate the stimulus;
Measure the coverage of the verification using schemes such as line coverage and finite-state-machine
coverage of the HDL, declining bug rate, the number of vectors run on the design, and so on; and
If the coverage is insufficient, write more diagnostics, and possibly use tools to generate diagnostics, in
order to exercise the design further.
The present invention uses a somewhat similar flow, but all components of the flow are modified to account
for the configurability of the design. The methodology comprises the following steps:
Build a testbench for a particular configuration. Configuration of the testbench uses an approach similar to
that described for the HDL, and supports all options and extensions supported there, i.e. cache sizes, bus
interface, clocking, interrupt generation and so on;
Run self-checking diagnostics on a particular configuration of the HDL. The diagnostics themselves are
configurable so that they can be tailored to a particular piece of hardware. The selection of which
diagnostics to run also depends on the configuration;
Run pseudo-randomly generated diagnostics and, after each instruction executes, compare the processor state
against ISS 126; and
Measure the coverage of the verification using coverage tools that measure functional coverage as well as
line coverage. Monitors and checkers are also run alongside the diagnostics to watch for illegal states and
conditions. All of these are configurable for a particular configuration specification.
All of the verification components are configurable. The configurability is implemented using TPP.
The testbench is a Verilog™ model of a system containing the configured processor 60. In the case of the
present invention, the testbench includes:
Caches, bus interface and external memory;
External interrupt and bus error generation; and
Clock generation.
Since almost all of the above characteristics are configurable, the testbench itself needs to support
configurability. So, for example, the cache size and width and the number of external interrupts are
adjusted automatically according to the configuration.
The testbench provides stimulus to the device under test, processor 60. It does this by providing
assembly-level instructions that are preloaded into memory. It also generates signals that control the
behavior of processor 60, for example interrupts. The frequency and timing of these signals are controllable
and are generated automatically by the testbench.
The diagnostics are configurable in two ways. First, the diagnostics use TPP to determine what to test. For
example, a diagnostic has been written to test software interrupts. This diagnostic needs to know how many
software interrupts there are in order to generate the correct assembly code.
Second, processor configuration system 10 must decide which diagnostics are suitable for this configuration.
For example, a diagnostic written to test the MAC unit is not applicable to a processor 60 that does not
include this unit. In the present embodiment, this is done by using a database containing information about
each diagnostic. The database may contain the following information for each diagnostic:
Whether to use the diagnostic only if a certain option has been selected;
Whether the diagnostic cannot be run with interrupts;
Whether the diagnostic requires special libraries or handlers in order to run; and
Whether the diagnostic cannot be run in cosimulation with ISS 126.
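One way to picture the diagnostic database and the selection step is the following C sketch. The record layout, the option bit names (`OPT_MAC` and so on) and the `run_context` structure are assumptions introduced for illustration; the source does not describe the actual database format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration option bits. */
enum {
    OPT_MAC             = 1u << 0,
    OPT_INTERRUPTS      = 1u << 1,
    OPT_DATA_BREAKPOINT = 1u << 2
};

/* One database record per diagnostic, mirroring the four items listed above. */
typedef struct {
    const char *name;
    uint32_t    required_options;   /* use only if all of these are selected */
    bool        no_interrupts;      /* cannot be run with interrupts */
    bool        needs_handlers;     /* needs special libraries or handlers */
    bool        no_cosim;           /* cannot be run in cosimulation with the ISS */
} diag_entry;

/* Properties of the configuration and of the planned regression run. */
typedef struct {
    uint32_t options;               /* options selected in this configuration */
    bool     interrupts_enabled;    /* interrupts will be injected during the run */
    bool     handlers_available;    /* special libraries/handlers are linked in */
    bool     cosim;                 /* run is cosimulated against the ISS */
} run_context;

bool diag_applies(const diag_entry *d, const run_context *rc)
{
    if ((d->required_options & rc->options) != d->required_options)
        return false;
    if (d->no_interrupts && rc->interrupts_enabled)
        return false;
    if (d->needs_handlers && !rc->handlers_available)
        return false;
    if (d->no_cosim && rc->cosim)
        return false;
    return true;
}

/* Copies pointers to the applicable diagnostics into out[] (capacity n)
 * and returns how many were selected. */
size_t select_diags(const diag_entry *db, size_t n,
                    const run_context *rc, const diag_entry **out)
{
    assert(db != NULL && rc != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (diag_applies(&db[i], rc))
            out[k++] = &db[i];
    return k;
}
```

With such a table, a MAC diagnostic is skipped automatically for a processor configured without the MAC unit, as in the example given in the text.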
Preferably, the processor hardware description is accompanied by three types of test tools (test generator
tools, monitors, and coverage tools or checkers) and a cosimulation mechanism. Test generator tools are
tools that create a series of processor instructions in an intelligent fashion. They are sequences of
pseudo-random test generators. The present embodiment uses two types internally: one developed specially,
called RTPG, and another, called VSG, based on the external tool VERA. Both have configurability built
around them. Based on the valid instructions for a configuration, they generate a series of instructions.
These tools can also handle newly defined instructions from TIE, so that the newly defined instructions are
generated randomly for testing. The present embodiment includes monitors and checkers that measure the
coverage of the design verification.
Monitors and coverage tools are run alongside a regression run. The coverage tools monitor what the
diagnostics are doing and which functions and logic of the HDL are being exercised. All of this information
is collected throughout the regression run and analyzed afterwards to obtain hints about which parts of the
logic need further testing. The present embodiment uses several configurable functional coverage tools. For
example, depending on the configuration, a particular finite state machine may not include all of its
states. For that configuration, therefore, the functional coverage tool must not try to check for those
states or transitions. This is accomplished by making the tool configurable through TPP.
Similarly, there are monitors that check for illegal conditions occurring during HDL simulation. These
illegal conditions may show up as bugs. For example, on a three-state bus, two drivers should not be driving
at the same time. These monitors are configurable: checks are added or removed depending on whether a
particular piece of logic is included in the configuration.
The cosimulation mechanism connects the HDL to ISS 126. It is used to check that, at the end of each
instruction, the state of the processor is identical in the HDL and in ISS 126. It is also configurable, to
the extent that it knows which features are included in each configuration and which state needs to be
compared. So, for example, the data breakpoint feature adds a special register. The mechanism needs to know
how to compare this new special register.
Instruction semantics specified via TIE can be translated into functionally equivalent C functions for use
in ISS 126 and for use by system designers in testing and verification. The semantics of an instruction in
configuration database 106 are translated into a C function by tools that build a parse tree using standard
parser tools and then walk the tree, checking the syntax and emitting the corresponding expressions in C.
The translation requires a pre-pass to assign bit widths to all expressions and to rewrite the parse tree so
that some translations are simplified. Compared with other translators (such as HDL-to-C translators or
C-to-assembly compilers), these translators are relatively simple, and can be written by one skilled in the
art starting from the TIE and C language specifications.
Using the compiler and the assembler/disassembler 100 configured by configuration file 100, benchmark
application source code 118 is compiled and assembled and, using sample data set 124, simulated to obtain
software profile 130, which is also sent to the user configuration capture routine for feedback to the user.
The ability to obtain hardware and software cost/benefit characterizations for any choice of configuration
parameters opens up opportunities for designers to optimize the system further. In particular, it enables
designers to select the optimal configuration parameters, that is, those that optimize the overall system
according to some figure of merit. One possible process is based on a greedy strategy of repeatedly
selecting or deselecting a configuration parameter. At each step, the parameter that has the best impact on
overall system performance and cost is chosen. This step is repeated until no single parameter can be found
that still improves the performance or cost of the system. Other extensions include looking at a group of
configuration parameters at a time, or using more sophisticated search algorithms.
In addition to optimal configuration parameter selection, this process can also be used to construct
optimal processor extensions. Because of the large number of possibilities among processor extensions, it is
important to limit the number of extension candidates. One technique is to analyze the application software
and look only at those instruction extensions that can improve system performance or cost.
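The greedy strategy can be sketched as follows. This is an illustrative sketch under stated assumptions: configuration parameters are modeled as on/off flags, the figure of merit is a caller-supplied function where lower is better, and `toy_cost` with its `toy_model` data is a made-up evaluation function, not the system's actual cost/benefit characterization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figure of merit for a parameter setting; lower is better (assumption). */
typedef double (*eval_fn)(const bool *params, size_t n, void *ctx);

/* Repeatedly toggles the single parameter that improves the figure of merit
 * the most, stopping when no single toggle improves it.
 * Returns the number of toggles applied. */
size_t greedy_configure(bool *params, size_t n, eval_fn eval, void *ctx)
{
    assert(params != NULL && eval != NULL);
    size_t steps = 0;
    double current = eval(params, n, ctx);
    for (;;) {
        size_t best_i = n;            /* n means "no improving toggle found" */
        double best_score = current;
        for (size_t i = 0; i < n; i++) {
            params[i] = !params[i];               /* try selecting/deselecting */
            double s = eval(params, n, ctx);
            params[i] = !params[i];               /* undo the trial */
            if (s < best_score) {
                best_score = s;
                best_i = i;
            }
        }
        if (best_i == n)
            break;
        params[best_i] = !params[best_i];
        current = best_score;
        steps++;
    }
    return steps;
}

/* Hypothetical evaluation data: each option costs area and saves cycles. */
typedef struct {
    const double *area;          /* gates added by each option */
    const double *cycles_saved;  /* cycles saved by each option */
    double        base_cycles;
    double        area_weight;   /* trade-off between area and cycles */
} toy_model;

double toy_cost(const bool *params, size_t n, void *ctx)
{
    const toy_model *m = ctx;
    double cycles = m->base_cycles;
    double area = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (params[i]) {
            cycles -= m->cycles_saved[i];
            area += m->area[i];
        }
    }
    return cycles + m->area_weight * area;
}
```

Each accepted toggle strictly lowers the figure of merit and there are finitely many settings, so the loop terminates; like any greedy search, it may stop at a local optimum, which is why the text mentions group moves and more sophisticated search algorithms as extensions.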
Having described the operation of the automated processor configuration system according to the present
embodiment, examples of processor macro-architecture configuration will now be given. The first example
shows the advantages of the present invention when applied to image compression.
Motion estimation is an important part of many image compression algorithms, including MPEG video and H.263
conferencing applications. Video image compression attempts to exploit the similarity from one frame to the
next in order to reduce the storage required for each frame. In the simplest case, each block of the image
to be compressed can be compared with the corresponding block (same X, Y location) of a reference image (an
image that closely precedes or follows the image being compressed). Compressing the image differences
between frames is generally more bit-efficient than compressing the individual images. In video sequences,
distinctive image features often move from frame to frame, so the closest correspondence between blocks in
different frames is often not at exactly the same X, Y location but at some offset. If significant parts of
the image move between frames, it is necessary to identify and compensate for this motion before computing
the differences. This implies that the densest representation is obtained by encoding the difference
between successive images, including, for distinctive features, the X, Y offset in the sub-image used to
compute the difference. The offset in the location used to compute the image difference is called the
motion vector.
The most computation-intensive task in this kind of image compression is determining the most appropriate
motion vector for each block. The common method of selecting the motion vector is to find the vector with
the lowest average pixel-by-pixel difference between each block of the image being compressed and a set of
candidate blocks of the previous image. The candidate blocks are the set of all blocks in a neighborhood
around the location of the block being compressed. The size of the image, the size of the block and the size
of the neighborhood all affect the running time of the motion estimation algorithm.
Simple block-based motion estimation compares each sub-image of the image to be compressed against a
reference image. The reference image may precede or follow the subject image in the video sequence. In every
case, the reference image is known to be available to the decompression system before the subject image is
decompressed. The comparison of one block of the image under compression with the candidate blocks of the
reference image is described below.
For each block of the subject image, a search is performed around the corresponding location in the
reference image. Normally, each color component of the image (for example YUV) is analyzed separately.
Sometimes only one component, such as luminance, is analyzed. The average pixel-by-pixel difference is
computed between the subject block and every possible block in the search region of the reference image. The
difference is the absolute value of the difference in magnitude of the pixel values. The average is
proportional to the sum over the N² pixel pairs of the two blocks (where N is the dimension of the block).
The block of the reference image that produces the smallest average pixel difference defines the motion
vector for that block of the subject image.
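The block-matching criterion just described can be written compactly as follows. The symbols are introduced here for illustration and are not from the source: S is the subject image, R the reference image, (x0, y0) the top-left corner of the subject block, W the set of candidate offsets in the search region, and N the block dimension.

```latex
% Mean absolute difference between the subject block and the reference block at offset (d_x, d_y)
D(d_x, d_y) = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
              \bigl|\, S(x_0 + i,\; y_0 + j) - R(x_0 + d_x + i,\; y_0 + d_y + j) \,\bigr|

% The motion vector is the candidate offset with the smallest mean difference
(v_x, v_y) = \operatorname*{arg\,min}_{(d_x, d_y) \in W} D(d_x, d_y)
```

Since the factor 1/N² is the same for every candidate, minimizing the plain sum of absolute differences selects the same motion vector, which is why the code in this example accumulates a sum rather than an average.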
The following example shows a simple form of the motion estimation algorithm and then optimizes the
algorithm using TIE to add a small application-specific functional unit. This optimization yields a speed-up
of more than a factor of ten, making processor-based compression practical for many video applications. It
illustrates the power of a configurable processor, which combines the ease of programming in a high-level
language with the efficiency of special-purpose hardware.
This example uses two matrices, OldB and NewB, to represent the old image and the new image respectively.
The size of the image is determined by NX and NY. The block size is determined by BLOCKX and BLOCKY. The
image therefore consists of NX/BLOCKX by NY/BLOCKY blocks. The search region around a block is determined by
SEARCHX and SEARCHY. The best motion vectors and values are stored in VectX, VectY and VectB. The best
motion vectors and values computed by the base (reference) implementation are stored in BaseX, BaseY and
BaseB. These values are used to check the vectors computed by the implementation that uses the instruction
extension. These basic definitions are captured in the following C code segment:
#define NX 64       /* image width */
#define NY 32       /* image height */
#define BLOCKX 16   /* block width */
#define BLOCKY 16   /* block height */
#define SEARCHX 4   /* search region width */
#define SEARCHY 4   /* search region height */
unsigned char OldB[NX][NY];                      /* old image */
unsigned char NewB[NX][NY];                      /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];      /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];      /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY];      /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY];      /* base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY];      /* base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY];      /* base absolute difference */
#define ABS(x)     (((x) < 0) ? (-(x)) : (x))
#define MIN(x, y)  (((x) < (y)) ? (x) : (y))
#define MAX(x, y)  (((x) > (y)) ? (x) : (y))
#define ABSD(x, y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))
The motion estimation algorithm consists of three nested loops:
1. For each source block in the old image.
2. For each destination block of the new image in the region surrounding the source block.
3. Compute the absolute difference between each pair of pixels.
The complete code for the algorithm is listed below.
/* Reference software implementation */
#include <limits.h>   /* UINT_MAX */

void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    /* for every block of the image */
    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            /* clamp the search region to the image boundaries */
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            /* for every candidate block in the search region */
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    /* sum of absolute pixel differences for this candidate */
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}
Although the base implementation is simple, it fails to exploit much of the intrinsic parallelism of this
block-to-block comparison. The configurable processor architecture provides two key tools that allow this
application to be sped up significantly.
First, the instruction set architecture includes powerful funnel-shift primitives that permit rapid
extraction of unaligned fields from memory. This allows the inner loop of the pixel comparison to fetch
groups of adjacent pixels from memory efficiently. The loop can then be rewritten to operate on four pixels
(bytes) simultaneously. In particular, for the purposes of this example, it is desirable to define a new
instruction that computes the absolute differences of four pixel pairs at a time. Before defining this new
instruction, however, the algorithm must be re-implemented to make use of such an instruction.
The presence of this instruction improves the inner-loop pixel difference computation so much that loop
unrolling becomes attractive as well. The C code of the inner loop is rewritten to take advantage of the new
sum-of-absolute-differences instruction and of the efficient shifting. Parts of four overlapping blocks of
the reference image can then be compared in the same loop. SAD(x, y) is the new intrinsic function
corresponding to the added instruction. SRC(x, y) shifts the concatenation of x and y to the right by the
shift amount stored in the SAR register.
/* Fast version of motion estimation, using the SAD instruction */
void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            /* four candidate offsets (cy .. cy+3) are handled per iteration */
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0];
                        N2 = N[1];
                        N3 = N[2];
                        N4 = N[3];
                        O = (unsigned *) &(OldB[cx+x][cy]);
                        A = O[0];
                        B = O[1];
                        C = O[2];
                        D = O[3];
                        E = O[4];
                        /* offset 0: aligned words */
                        diff0 += SAD(A, N1) + SAD(B, N2) +
                                 SAD(C, N3) + SAD(D, N4);
                        /* offset 1: funnel shift by one byte */
                        SSAI(8);
                        diff1 += SAD(SRC(B, A), N1) +
                                 SAD(SRC(C, B), N2) + SAD(SRC(D, C), N3) +
                                 SAD(SRC(E, D), N4);
                        /* offset 2: funnel shift by two bytes */
                        SSAI(16);
                        diff2 += SAD(SRC(B, A), N1) +
                                 SAD(SRC(C, B), N2) + SAD(SRC(D, C), N3) +
                                 SAD(SRC(E, D), N4);
                        /* offset 3: funnel shift by three bytes */
                        SSAI(24);
                        diff3 += SAD(SRC(B, A), N1) +
                                 SAD(SRC(C, B), N2) + SAD(SRC(D, C), N3) +
                                 SAD(SRC(E, D), N4);
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}
The present embodiment uses the following SAD function to emulate the eventual new instruction:
/* Sum of absolute differences of four bytes */
static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}
To debug this new implementation, the following test program is used to compare the motion vectors and
values computed by the new implementation against those computed by the base implementation:
/* Main test program */
#include <stdio.h>

int
main(int argc, char **argv)
{
    int passwd;
#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    init();
    motion_estimate_base();
    motion_estimate_tie();
    passwd = check();
#ifndef NOPRINTF
    printf(passwd ? "TIE version passed\n" : "TIE version failed\n");
#endif
    return passwd;
}
This simple test program is used throughout the development process. The convention followed here is that
the main program returns 0 when an error is detected and 1 otherwise.
The use of TIE permits rapid specification of new instructions. The configurable processor generator can
fully implement these instructions in both the hardware implementation and the software development tools.
Hardware synthesis generates an optimal integration of the new function into the hardware data path. The
software environment of the configurable processor fully supports the new instructions in the C and C++
compilers, the assembler, the symbolic debugger, the profiler and the cycle-accurate instruction set
simulator. The rapid regeneration of hardware and software makes application-specific instructions a quick
and reliable tool for application acceleration.
This example uses TIE to implement a simple instruction that performs pixel differencing, absolute value
and accumulation on four pixels in parallel. This single instruction performs eleven basic operations (which
in a conventional process might require separate instructions) as one atomic operation. The complete
description follows:
// define a new opcode for Sum of Absolute Difference (SAD)
// from which instruction decoding logic is derived
opcode SAD op2=4'b0000 CUST0
// define a new instruction class
// from which compiler, assembler, disassembler
// routines are derived
iclass sad {SAD} {out arr, in ars, in art}
// semantic definition from which instruction-set
// simulation and RTL descriptions are derived
semantic sad_logic {SAD} {
    wire [8:0] diff01, diff11, diff21, diff31;
    wire [7:0] diff0r, diff1r, diff2r, diff3r;
    assign diff01 = art[7:0] - ars[7:0];
    assign diff11 = art[15:8] - ars[15:8];
    assign diff21 = art[23:16] - ars[23:16];
    assign diff31 = art[31:24] - ars[31:24];
    assign diff0r = ars[7:0] - art[7:0];
    assign diff1r = ars[15:8] - art[15:8];
    assign diff2r = ars[23:16] - art[23:16];
    assign diff3r = ars[31:24] - art[31:24];
    assign arr =
        (diff01[8] ? diff0r : diff01) +
        (diff11[8] ? diff1r : diff11) +
        (diff21[8] ? diff2r : diff21) +
        (diff31[8] ? diff3r : diff31);
}
This description represents the minimum steps needed to define a new instruction. First, a new opcode must
be defined for the instruction. In this case, the new opcode SAD is defined as a sub-opcode of CUST0. As
noted above, CUST0 is predefined as:
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST
It is easy to see that QRST is the top-level opcode. CUST0 is a sub-opcode of QRST, and SAD is in turn a
sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the
opcode space. One important thing to remember is that CUST0 (and CUST1) are defined as reserved opcode space
in which users can add new instructions. It is preferable for users to stay within this allocated opcode
space, to ensure future reusability of their TIE descriptions.
The second step in the TIE description is to define a new instruction class containing the new instruction
SAD. This is where the operands of the SAD instruction are defined. In this case, SAD has three register
operands: the destination register arr and the source registers ars and art. As noted earlier, arr is
defined as the register indexed by the r field of the instruction, and ars and art are defined as the
registers indexed by the s and t fields of the instruction.
The last block in the description gives the formal semantic definition of the SAD instruction. The
description uses a subset of the Verilog HDL language for describing combinational logic. It is this block
that specifies precisely how the ISS will simulate the SAD instruction, and how an additional circuit is
synthesized and added to the configurable processor hardware to support the new instruction.
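As an illustration of how the semantic block relates to a C function, the sketch below mirrors its logic lane by lane: a 9-bit subtraction whose top bit selects between the two 8-bit differences. This is a hand-written model under stated assumptions, not the output of the TIE-to-C translator; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-bit lane of the semantic block.
 * diff_l models the 9-bit wire holding art - ars; its bit 8 is set when the
 * subtraction borrows (art < ars). diff_r models the 8-bit wire ars - art. */
static uint32_t lane_abs_diff(uint32_t ars_byte, uint32_t art_byte)
{
    assert(ars_byte <= 0xFFu && art_byte <= 0xFFu);
    uint32_t diff_l = (art_byte - ars_byte) & 0x1FFu;
    uint32_t diff_r = (ars_byte - art_byte) & 0xFFu;
    return (diff_l & 0x100u) ? diff_r : diff_l;
}

/* Model of the whole SAD semantic: sum the four lane results into arr. */
uint32_t sad_semantic_model(uint32_t ars, uint32_t art)
{
    uint32_t arr = 0;
    for (unsigned shift = 0; shift < 32u; shift += 8u)
        arr += lane_abs_diff((ars >> shift) & 0xFFu, (art >> shift) & 0xFFu);
    return arr;
}

/* Straightforward byte-wise reference, used only to cross-check the model. */
uint32_t sad_bytewise_reference(uint32_t ars, uint32_t art)
{
    uint32_t sum = 0;
    for (unsigned shift = 0; shift < 32u; shift += 8u) {
        uint32_t s = (ars >> shift) & 0xFFu;
        uint32_t t = (art >> shift) & 0xFFu;
        sum += (s > t) ? (s - t) : (t - s);
    }
    return sum;
}
```

Comparing the bit-level model against the byte-wise reference is a small-scale version of the check that the test program performs between the two motion estimation implementations.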
Next, the TIE description is debugged and verified using the tools described above. After the correctness of
the TIE description has been verified, the next step is to estimate the impact of the new instruction on
hardware size and performance. As noted above, this can be done using, for example, Design Compiler™. When
Design Compiler finishes, the user can examine its output for detailed area and speed reports.
After the TIE description has been verified to be correct and efficient, it is time to configure and build
a configurable processor that also supports the new SAD instruction. As described above, this is done using
the graphical user interface (GUI).
Next, the motion estimation code is compiled into code for the configurable processor, and the instruction
set simulator is used to verify the correctness of the program and, more importantly, to measure its
performance. This is done in three steps: run the test program using the simulator; run only the base
implementation to obtain its instruction count; and run only the new implementation to obtain its
instruction count.
The simulation output from the second step is as follows:
Block=(16,16), Search=(4,4), size=(32,32)
TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.98 seconds
Events                           Number   Number per 100 instrs
Instructions                     226005   (100.00)
Unconditional taken branches        454   (0.20)
Conditional branches              37149   (16.44)
  Taken                           26947   (11.92)
  Not taken                       10202   (4.51)
Window Overflows                     20   (0.01)
Window Underflows                    19   (0.01)
The simulation output from the last step is as follows:
Block=(16,16), Search=(4,4), size=(32,32)
TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.36 seconds
Events                           Number   Number per 100 instrs
Instructions                      51743   (100.00)
Unconditional taken branches        706   (1.36)
Conditional branches               3541   (6.84)
  Taken                            2759   (5.33)
  Not taken                         782   (1.51)
Window Overflows                     20   (0.04)
Window Underflows                    19   (0.04)
From these two reports it can be seen that a speed-up of roughly four times has been achieved. Note that the
instruction set simulator of the configurable processor can also provide much other useful information.
After the correctness and performance of the program have been verified, the next step is to run the test
program using a Verilog simulator as described above. Those skilled in the art can find the details of this
process in the makefile of Appendix C (the associated files are also shown in Appendix C). The purpose of
this simulation is to verify the correctness of the new implementation further and, more importantly, to
make this test program part of the regression tests for this configured processor.
Finally, the processor logic can be synthesized using, for example, Design Compiler™, and placed and routed
using, for example, Apollo™.
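The quoted speed-up follows directly from the instruction counts in the two reports (226005 for the base implementation and 51743 for the implementation using SAD):

```latex
\text{speed-up} = \frac{226005}{51743} \approx 4.37
```

This ratio compares simulated instruction counts only; it is a worked reading of the two reports above rather than an additional measurement.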
For clarity and simplicity of explanation, this example takes a simplified view of video compression and motion estimation. In practice, standard compression algorithms contain many additional nuances. For example, MPEG2 typically performs motion estimation and compensation at sub-pixel resolution. Two adjacent rows or columns of pixels can be averaged to create a set of pixels interpolated to an imaginary position halfway between the two rows or columns. Here the user-defined instructions of the configurable processor are again useful, since a parallel pixel-averaging instruction can easily be implemented in just three or four lines of TIE code. Averaging between pixels within a row again uses the efficient alignment operations of the processor's standard instruction set.
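As a rough behavioural sketch of such a parallel pixel-averaging operation, the function below averages four 8-bit pixels packed into each 32-bit word. The name avg8_4, the four-lane packing and the round-half-up behaviour are assumptions made for illustration; the text does not give the TIE code for this instruction.

```python
def avg8_4(a: int, b: int) -> int:
    """Average four packed 8-bit pixels lane by lane (rounding half up)."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        pa = (a >> shift) & 0xFF          # pixel from the first row/column
        pb = (b >> shift) & 0xFF          # pixel from the adjacent row/column
        # The 9-bit sum is halved inside the lane, so no carry leaks
        # into the neighbouring lane.
        result |= ((pa + pb + 1) >> 1) << shift
    return result


if __name__ == "__main__":
    print(hex(avg8_4(0x10203040, 0x30405060)))
```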
Thus, incorporating a simple sum-of-absolute-differences instruction adds only a few hundred gates, yet improves motion estimation performance by more than a factor of ten. This acceleration represents a significant improvement in the cost and power efficiency of the final system. Moreover, the seamless extension of the software development tools to include the new motion-estimation instruction allows rapid prototyping, performance analysis and delivery of the complete software application solution. The solution of the present invention makes application-specific processor configuration simple, reliable and complete, and offers dramatic improvements in the cost, performance, functionality and power efficiency of the final system product.
As an example focusing on the addition of a hardware functional unit, consider the base configuration shown in FIG. 6, which includes the processor control function, program counter (PC), branch selection, instruction memory or cache and instruction decoder, together with the basic integer datapath, including the main register file, bypass multiplexers, pipeline registers, arithmetic logic unit (ALU), address generator and data memory for the cache.
If the HDL is written so that the multiplier logic is conditional on the "multiplier" parameter being set, then, as shown in FIG. 7, a multiplier unit is added as a new pipeline stage (changes to exception handling may be required if precise exceptions are to be supported). Of course, instructions that use the multiplier are preferably added together with the new unit.
As a second example, a full coprocessor can be added to the base configuration, as shown in FIG. 8, to serve as a digital signal processor such as a multiply/accumulate unit. This requires changes to processor control: adding decoding control signals for multiply-accumulate operations, including decoding of the register sources and destinations from the extended instructions; adding appropriate pipeline delays for the control signals; extending the register destination logic; adding control for a register bypass multiplexer for moves from the accumulate registers; and including the multiply-accumulate unit as a possible source of an instruction result. In addition, a multiply-accumulate unit must be added, which brings with it additional accumulator registers, a multiply-accumulate array and source-select multiplexers for the main register sources. Likewise, adding the coprocessor requires extending the register bypass multiplexer to take a source from the accumulate registers, and extending the load/alignment multiplexer to take a source from the multiplier result. Again, the system preferably adds instructions for using the new functional unit along with the actual hardware.
Another option that is particularly useful in combination with a digital signal processor is a floating point unit. Such a functional unit, implementing for example the IEEE 754 single-precision floating point standard, can be added together with the instructions for accessing it. The floating point unit can be used in digital signal processing applications such as audio compression and decompression.
As another example of the flexibility of the system, consider the 4 kB memory interface shown in FIG. 9. Using the configurability of the present invention, the registers and datapaths of a coprocessor can be wider or narrower than the main integer register file and datapath, and the width of the local memory can be varied so that the memory width equals the width of the widest processor or coprocessor (memory addressing on reads and writes is adjusted accordingly). For example, FIG. 10 shows a local memory system for a processor that supports 32-bit loads and stores for a processor/coprocessor combination addressing the same array, but in which the coprocessor supports 128-bit loads and stores. This can be implemented with the TPP code
function memory(Select, A1, A2, DI1, DI2, W1, W2, DO1, DO2)
; $B1 = config_get_value("width_of_port_1");
; $B2 = config_get_value("width_of_port_2");
; $Bytes = config_get_value("size_of_memory");
; $Max = max($B1, $B2); $Min = min($B1, $B2);
; $Banks = $Max/$Min;
; $Wide1 = ($Max == $B1); $Wide2 = ($Max == $B2);
; $Depth = $Bytes/(log2($Banks)*log2($Max));
wire [`$Max`*8-1:0] Data1 = `$Wide1` ? DI1 : {`$Banks`{DI1}};
wire [`$Max`*8-1:0] Data2 = `$Wide2` ? DI2 : {`$Banks`{DI2}};
wire [`$Max`*8-1:0] D = Select ? Data1 : Data2;
wire Wide = Select ? Wide1 : Wide2;
wire [log2(`$Bytes`)-1:0] A = Select ? A1 : A2;
wire [log2(`$Bytes`)-1:0] Address = A[log2(`$Bytes`)-1:log2(`$Banks`)];
wire [log2(`$Banks`)-1:0] Lane = A[log2(`$Banks`)-1:0];
; for ($i = 0; $i < $Banks; $i++) {
wire WrEnable`$i` = Wide | (Lane == `$i`);
wire [log2(`$Min`)-1:0] WrData`$i` = D[(`$i`+1)*`$Min`*8-1:`$i`*`$Min`*8];
ram(RdData`$i`, Depth, Address, WrData`$i`, WrEnable`$i`);
; }
wire [`$Max`*8-1:0] RdData = {
; for ($i = 0; $i < $Banks; $i++) {
RdData`$i`,
; }
}
wire [`$B1`*8-1:0] DO1 = Wide1 ? RdData : RdData[(Lane+1)*B1*8-1:Lane*B1*8];
wire [`$B2`*8-1:0] DO2 = Wide2 ? RdData : RdData[(Lane+1)*B2*8-1:Lane*B2*8];
Here, $Bytes is the total memory size, which is accessed either with width B1 at byte address A1 on data bus D1 under the control of write signal W1, or using the corresponding parameters B2, A2, D2 and W2. In a given cycle, only one set of signals, selected by Select, is active. The TPP code implements the memory as a collection of memory banks. The width of each bank is given by the minimum access width, and the number of banks by the ratio of the maximum to the minimum access width. A for loop is used to instantiate each memory bank and its associated write signals, that is, write enable and write data. A second for loop is used to gather the data read from all the banks into a single bus.
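The banking scheme just described can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the generated hardware: the class name BankedMemory, the byte ordering (bank 0 holds the lowest-addressed bytes of a row) and the row-depth calculation are choices made here, and the $Depth expression of the TPP code is not reproduced.

```python
class BankedMemory:
    """Two-port-width memory built from banks of the narrower width."""

    def __init__(self, size_bytes, width1, width2):
        self.widths = (width1, width2)            # access widths in bytes
        self.wmax = max(width1, width2)
        self.wmin = min(width1, width2)
        self.banks = self.wmax // self.wmin       # $Banks = $Max / $Min
        depth = size_bytes // self.wmax           # rows per bank (assumed)
        self.ram = [[bytes(self.wmin)] * depth for _ in range(self.banks)]

    def _decode(self, addr):
        word = addr // self.wmin
        return word // self.banks, word % self.banks   # (row, lane)

    def write(self, port, addr, data):
        row, lane = self._decode(addr)
        wide = self.widths[port] == self.wmax
        bus = data if wide else data * self.banks  # replicate narrow data
        for i in range(self.banks):
            if wide or lane == i:                  # per-bank write enable
                self.ram[i][row] = bus[i * self.wmin:(i + 1) * self.wmin]

    def read(self, port, addr):
        row, lane = self._decode(addr)
        rd = b"".join(self.ram[i][row] for i in range(self.banks))
        if self.widths[port] == self.wmax:
            return rd                              # wide port sees the whole row
        return rd[lane * self.wmin:(lane + 1) * self.wmin]


if __name__ == "__main__":
    mem = BankedMemory(4096, 4, 16)                # 32-bit and 128-bit ports
    mem.write(0, 4, b"\x01\x02\x03\x04")           # narrow write, lane 1
    print(mem.read(1, 0).hex())                    # wide read of the same row
```

A narrow write enables exactly one bank, while a wide write enables all of them, which is the role of the Wide and Lane terms in the write-enable expression.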
FIG. 11 shows an example of incorporating user-defined instructions into the base configuration. As shown in the figure, simple instructions can be added to the processor pipeline with timing and an interface similar to those of the ALU. Instructions added in this way must not generate stalls or exceptions, must contain no state, must use only the two normal source register values and the instruction word as inputs, and must produce a single output value. If, however, the TIE language has provisions for specifying processor state, such constraints are unnecessary.
FIG. 12 shows another example of implementing a user-defined unit in this system. The functional unit shown in the figure, an 8/16-bit parallel data unit extension of the ALU, is generated from the following ISA code:
Instruction {
  Opcode ADD8_4 CUSTOM op2=0000
  Opcode MIN16_2 CUSTOM op2=0001
  Opcode SHIFT16_2 CUSTOM op2=0002
  Iclass MY 4ADD8, 2MIN16, SHIFT16_2
  a<t, a<s, a>t_
}
Implementation {
  input [31:0] art, ars;
  input [23:0] inst;
  input ADD8_4, MIN16_2, SHIFT16_2;
  output [31:0] arr;
  wire [31:0] add, min, shift;
  assign add = {art[31:24]+ars[31:24], art[23:16]+ars[23:16],
                art[15:8]+ars[15:8], art[7:0]+ars[7:0]};
  assign min[31:16] = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];
  assign min[15:0] = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];
  assign shift[31:16] = art[31:16] << ars[31:16];
  assign shift[15:0] = art[15:0] << ars[15:0];
  assign arr = {32{ADD8_4}} & add | {32{MIN16_2}} & min |
               {32{SHIFT16_2}} & shift;
}
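The behaviour of the three operations in the listing above can be modelled in a few lines. The sketch below is an illustration only: the Python function names are chosen here, values are treated as unsigned, and each 8-bit lane of the add is assumed to combine art with ars.

```python
MASK32 = 0xFFFFFFFF


def add8_4(art, ars):
    """Four independent 8-bit additions; carries do not cross lanes."""
    out = 0
    for lane in range(4):
        s = 8 * lane
        out |= ((((art >> s) & 0xFF) + ((ars >> s) & 0xFF)) & 0xFF) << s
    return out


def min16_2(art, ars):
    """Two independent unsigned 16-bit minimums."""
    out = 0
    for lane in range(2):
        s = 16 * lane
        out |= min((art >> s) & 0xFFFF, (ars >> s) & 0xFFFF) << s
    return out


def shift16_2(art, ars):
    """Two independent 16-bit left shifts; bits shifted out are lost."""
    out = 0
    for lane in range(2):
        s = 16 * lane
        value = (art >> s) & 0xFFFF
        amount = (ars >> s) & 0xFFFF
        out |= ((value << amount) & 0xFFFF) << s
    return out


def alu_ext(art, ars, ADD8_4=0, MIN16_2=0, SHIFT16_2=0):
    """AND each result with its one-bit opcode select, then OR them together."""
    def sel(bit):
        return MASK32 if bit else 0
    return (sel(ADD8_4) & add8_4(art, ars)
            | sel(MIN16_2) & min16_2(art, ars)
            | sel(SHIFT16_2) & shift16_2(art, ars))


if __name__ == "__main__":
    print(hex(alu_ext(0x00050009, 0x00030010, MIN16_2=1)))
```

Because at most one opcode select is 1 for a decoded instruction, the AND-OR structure behaves as a multiplexer without needing a priority encoder.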
In another aspect of the present invention, of particular interest is the designer-defined instruction execution unit 96, in which TIE-defined instructions, including those that modify processor state, are decoded and executed. In this aspect of the invention, several constructs are added to the language so that additional processor state that can be read and written by the new instructions can be declared. The "state" statement is used to declare additional processor state. The declaration begins with the keyword state. The next part of the state statement describes the size and number of bits of the state and how the bits of the state are indexed. The part after that is the state name, used to identify the state in other declarations. The last part of the state statement is a list of attributes associated with the state. For example,
state [63:0] DATA cpn=0 autopack
state [27:0] KEYC cpn=1 nopack
state [27:0] KEYD cpn=1
defines three new processor states, DATA, KEYC and KEYD. State DATA is 64 bits wide, with its bits indexed from 63 to 0. KEYC and KEYD are both 28-bit states. DATA has a coprocessor-number attribute cpn, indicating which coprocessor DATA belongs to.
The attribute "autopack" indicates that state DATA will be automatically mapped to some registers in the user register file, so that the value of DATA can be read and written by software tools.
The user_register section is defined to indicate the mapping of states to registers in the user register file. A user_register section begins with the keyword user_register, followed by a number indicating the register number, and ends with an expression indicating the state bits to be mapped onto that register. For example,
User_register 4 { X, Y, z}
specifies that the low-order word of DATA is mapped to the first user register file entry and the high-order word to the second. The next two user register file entries are used to hold the values of KEYC and KEYD. Clearly, the state information used in this section must be consistent with that of the state section. This consistency can be checked automatically by a computer program.
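A consistency check of this kind might look like the following sketch. The data layout is hypothetical: states are given as a name-to-width dictionary, and each mapping as a tuple (register number, state name, high bit, low bit). The register numbers 0 to 3 are chosen here to follow the prose description and are not taken from the TIE text.

```python
def check_user_registers(states, mappings, reg_width=32):
    """Return a list of inconsistencies between state and user_register data."""
    errors = []
    covered = {name: set() for name in states}
    for reg, name, hi, lo in mappings:
        if name not in states:
            errors.append(f"register {reg}: unknown state {name}")
            continue
        if not 0 <= lo <= hi < states[name]:
            errors.append(f"register {reg}: {name}[{hi}:{lo}] out of range")
            continue
        if hi - lo + 1 > reg_width:
            errors.append(f"register {reg}: {name}[{hi}:{lo}] too wide")
            continue
        bits = set(range(lo, hi + 1))
        if covered[name] & bits:
            errors.append(f"register {reg}: {name}[{hi}:{lo}] mapped twice")
        covered[name] |= bits
    for name, width in states.items():
        if len(covered[name]) != width:
            errors.append(f"state {name}: not fully mapped")
    return errors


# Widths from the state statements above; register numbers are assumed.
STATES = {"DATA": 64, "KEYC": 28, "KEYD": 28}
MAPPINGS = [
    (0, "DATA", 31, 0),
    (1, "DATA", 63, 32),
    (2, "KEYC", 27, 0),
    (3, "KEYD", 27, 0),
]

if __name__ == "__main__":
    print(check_user_registers(STATES, MAPPINGS))
```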
In another embodiment of the present invention, a bin-packing algorithm is used to assign the state bits to user register file entries automatically. In yet another embodiment, a combination of manual and automatic assignment can be used, for example to ensure upward compatibility.
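One simple bin-packing strategy is first-fit decreasing. The sketch below is an assumption about how such an assignment could work, not the algorithm of the embodiment: it splits states wider than a register into register-sized pieces, sorts the pieces by size, and places each into the first register with enough free bits. The function name pack_states and the 32-bit register width are chosen for illustration.

```python
def pack_states(states, reg_width=32):
    """Assign state bit ranges to registers using first-fit decreasing.

    Returns a list of registers; each register is a list of
    (state_name, high_bit, low_bit) pieces.
    """
    pieces = []
    for name, width in states.items():
        for lo in range(0, width, reg_width):
            hi = min(lo + reg_width, width) - 1
            pieces.append((name, hi, lo))
    # Largest pieces first; Python's sort is stable, so ties keep their order.
    pieces.sort(key=lambda p: p[1] - p[2], reverse=True)

    registers = []                     # each entry: [free_bits, [pieces]]
    for piece in pieces:
        size = piece[1] - piece[2] + 1
        for reg in registers:
            if reg[0] >= size:         # first register with room
                reg[0] -= size
                reg[1].append(piece)
                break
        else:
            registers.append([reg_width - size, [piece]])
    return [assigned for _, assigned in registers]


if __name__ == "__main__":
    layout = pack_states({"DATA": 64, "KEYC": 28, "KEYD": 28, "SWAP": 1})
    for number, assigned in enumerate(layout):
        print(number, assigned)
```

In this example the 1-bit state shares a register with a 28-bit state, so four registers suffice instead of five.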
The instruction field statement, field, is used to improve the readability of TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by a name. The complete set of bits in an instruction is the highest-level superset field inst, and this field can be divided into smaller fields. For example,
field x inst[11:8]
field y inst[15:12]
field xy {x, y}
defines two 4-bit fields, x and y, as sub-fields (bits 8-11 and 12-15, respectively) of the highest-level field inst, and an 8-bit field xy as the concatenation of the x and y fields.
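The effect of these field definitions on a 24-bit instruction word can be sketched as follows. The helper names are chosen here, and the concatenation is assumed to follow the Verilog convention, in which the first-listed field (x) occupies the more significant bits of xy.

```python
def bits(value, hi, lo):
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(inst):
    x = bits(inst, 11, 8)        # field x inst[11:8]
    y = bits(inst, 15, 12)       # field y inst[15:12]
    xy = (x << 4) | y            # field xy {x, y}
    return {"x": x, "y": y, "xy": xy}


if __name__ == "__main__":
    print(decode_fields(0x00B700))
```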
The opcode statement defines opcodes for encoding specific fields. Instruction fields intended to specify operands, for example registers or immediate constants to be used by the opcodes so defined, must first be defined with field statements and then defined with operand statements.
For example,
opcode acs op2=4'b0000 CUST0
opcode adse1 op2=4'b0001 CUST0
defines two new opcodes, acs and adse1, based on the previously defined opcode CUST0 (4'b0000 denotes a four-bit binary constant 0000). The TIE description of the preferred core ISA has the statements
field op0 inst[3:0]
field op1 inst[19:16]
field op2 inst[23:20]
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST
as part of its base definition. Thus, the definitions of acs and adse1 cause the TIE compiler to generate instruction decoding logic represented, respectively, by the following:
inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000
inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000
The instruction operand statement, operand, identifies registers and immediate constants. Before a field is defined as an operand, however, it must first have been defined as a field as described above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table; the definition of constant tables is described below. For example, to encode an immediate operand, the TIE code
field offset inst[23:6]
operand offsets4 offset {
  assign offsets4 = {{14{offset[17]}}, offset} << 2;
} {
  wire [31:0] t;
  assign t = offsets4 >> 2;
  assign offset = t[17:0];
}
defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. As those skilled in the art will appreciate, the last part of the operand statement actually describes the circuitry used to perform the computation, in a subset of the Verilog™ HDL for describing combinational circuits.
Here, the wire statement defines a set of logical wires named t that is 32 bits wide. The first assign statement after the wire statement specifies that the logic signal driving these wires is offsets4 shifted right by two bits, and the second assign statement specifies that the low 18 bits of t are placed in the offset field. The very first assign statement directly specifies the value of the operand offsets4 as the concatenation of 14 copies of the sign bit of offset (bit 17) with offset itself, followed by a left shift of two bits.
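The two halves of the operand statement, decoding the field into the operand value and encoding the operand value back into the field, can be mirrored in a short model. The function names are chosen here for illustration; values are handled as 32-bit unsigned patterns, as the wires would be.

```python
MASK32 = 0xFFFFFFFF
FIELD_MASK = 0x3FFFF          # 18-bit offset field


def offsets4_from_field(offset):
    """Decode: offsets4 = {{14{offset[17]}}, offset} << 2 (32-bit result)."""
    offset &= FIELD_MASK
    sign_bits = (0x3FFF << 18) if offset & (1 << 17) else 0
    return ((sign_bits | offset) << 2) & MASK32


def field_from_offsets4(offsets4):
    """Encode: t = offsets4 >> 2; offset = t[17:0]."""
    t = (offsets4 & MASK32) >> 2
    return t & FIELD_MASK


if __name__ == "__main__":
    for field in (0x00001, 0x1FFFF, 0x20000, 0x3FFFF):
        value = offsets4_from_field(field)
        print(hex(field), hex(value), hex(field_from_offsets4(value)))
```

Encoding followed by decoding returns the original operand value only when that value is a multiple of four within the signed range of the field; other values lose their low two bits or high bits in the encoding step.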
For a constant table operand, the TIE code
table prime 16 {
  2, 3, 5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
  53
}
operand prime_s s {
  assign prime_s = prime[s];
} {
  assign s = prime_s == prime[0] ? 4'b0000 :
             prime_s == prime[1] ? 4'b0001 :
             prime_s == prime[2] ? 4'b0010 :
             prime_s == prime[3] ? 4'b0011 :
             prime_s == prime[4] ? 4'b0100 :
             prime_s == prime[5] ? 4'b0101 :
             prime_s == prime[6] ? 4'b0110 :
             prime_s == prime[7] ? 4'b0111 :
             prime_s == prime[8] ? 4'b1000 :
             prime_s == prime[9] ? 4'b1001 :
             prime_s == prime[10] ? 4'b1010 :
             prime_s == prime[11] ? 4'b1011 :
             prime_s == prime[12] ? 4'b1100 :
             prime_s == prime[13] ? 4'b1101 :
             prime_s == prime[14] ? 4'b1110 :
             4'b1111;
}
uses the table statement to define an array prime of constants (the number following the table name is the number of elements in the table) and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
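The table lookup and the chained comparison that inverts it can be modelled as below. This is a sketch: the Python names are chosen here, and the table holds the first 16 values of the listing above, since a 4-bit index can address only 16 entries.

```python
# First 16 values of the table listing above (a 4-bit index reaches 16 entries).
PRIME = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def decode_prime_s(s):
    """Field to operand value: prime_s = prime[s]."""
    return PRIME[s & 0xF]


def encode_prime_s(value):
    """Operand value to field: priority chain over entries 0..14, else 15."""
    for index in range(15):
        if value == PRIME[index]:
            return index
    return 15                    # default arm of the chain, 4'b1111


if __name__ == "__main__":
    print([encode_prime_s(decode_prime_s(s)) for s in range(16)])
```

Note that the default arm means a value that is not in the table is silently encoded as index 15, so an assembler would need a separate range check before using this encoding.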
The instruction class statement, iclass, associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the preceding example that defines the opcodes acs and adse1, the additional statements
operand art t { assign art = AR[t]; } {}
operand ars s { assign ars = AR[s]; } {}
operand arr r { assign AR[r] = arr; } {}
use the operand statement to define three register operands, art, ars and arr (note again the use of Verilog™ statements in the definition). Then the iclass statement
iclass viterbi {adse1, acs} {out arr, in art, in ars}
specifies that the opcodes adse1 and acs belong to a common class of instructions, viterbi, which takes the two register operands art and ars as input and writes its output to the register operand arr.
In the present invention, the instruction class statement iclass is modified to allow the state-access information of instructions to be specified. It starts with the keyword "iclass", followed by the name of the instruction class, the list of opcodes belonging to the instruction class and a list of operand access information, and ends with a newly defined list of state-access information.
For example,
iclass lddata {LDDATA} {out arr, in imm4} {in DATA}
iclass stdata {STDATA} {in ars, in art} {out DATA}
iclass stkey {STKEY} {in ars, in art} {out KEYC, out KEYD}
iclass des {DES} {out arr, in imm4} {inout KEYC, inout DATA, inout KEYD}
defines several instruction classes and how the various new instructions access the states. The keywords "in", "out" and "inout" indicate that a state is read, written or modified (read and written), respectively, by the instructions in the iclass. In this example, state "DATA" is read by instruction "LDDATA", states "KEYC" and "KEYD" are written by instruction "STKEY", and "KEYC", "KEYD" and "DATA" are modified by instruction "DES".
The instruction semantic statement, semantic, describes the behavior of one or more instructions using the same subset of Verilog™ used for coding operands. By defining multiple instructions in a single semantic statement, common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes listed in the statement's opcode list, and a single-bit variable for each opcode in that list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.
// define a new opcode for BYTESWAP based on
//  - a predefined instruction field op2
//  - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0
// declare state SWAP and COUNT
state COUNT 32
state SWAP 1
// map COUNT and SWAP to user register file entries
// define a new instruction class that
//  - reads data from ars (predefined to be AR[s])
//  - uses and writes state COUNT
//  - uses state SWAP
iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}
// semantic definition of byteswap
//  COUNT the number of byte-swapped words
//  Return the swapped or un-swapped data depending on SWAP
semantic bs {BYTESWAP} {
  wire [31:0] ars_swapped =
    {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
  assign arr = SWAP ? ars_swapped : ars;
  assign COUNT = COUNT + SWAP;
}
The first section of the above code defines an opcode, called BYTESWAP, for the new instruction.
// define a new opcode for BYTESWAP based on
//  - a predefined instruction field op2
//  - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0
Here, the new opcode is defined as a sub-opcode of CUST0. From the Xtensa™ Instruction Set Architecture Reference Manual, described in detail below, it can be seen that CUST0 is defined as
opcode QRST op0=4'b0000
opcode CUST0 op1=4'b0100 QRST
Here, op0 and op1 both refer to fields in the instruction. Opcodes are typically organized in a hierarchical fashion. Here, QRST is the top-level opcode, CUST0 is a sub-opcode of QRST, and BYTESWAP is in turn a sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the opcode space.
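The hierarchy means that an instruction word matches an opcode only if it also matches every ancestor of that opcode. The sketch below illustrates this by walking the parent chain. The dictionaries and the function name are illustrative; the field positions and values are those given in the field and opcode statements quoted in the text.

```python
# Field bit ranges (high, low) from the field statements in the text.
FIELDS = {"op0": (3, 0), "op1": (19, 16), "op2": (23, 20)}

# name: (field, required value, parent opcode), from the opcode statements.
OPCODES = {
    "QRST": ("op0", 0b0000, None),
    "CUST0": ("op1", 0b0100, "QRST"),
    "BYTESWAP": ("op2", 0b0000, "CUST0"),
}


def matches(inst, name):
    """True if inst satisfies the opcode and all of its ancestors."""
    while name is not None:
        field, value, parent = OPCODES[name]
        hi, lo = FIELDS[field]
        if (inst >> lo) & ((1 << (hi - lo + 1)) - 1) != value:
            return False
        name = parent
    return True


if __name__ == "__main__":
    # Bits 15:4 are not constrained by any of the three opcodes.
    print(matches(0x04ABC0, "BYTESWAP"), matches(0x14ABC0, "BYTESWAP"))
```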
The second section declares the additional processor state needed by the BYTESWAP instruction:
// declare state SWAP and COUNT
state COUNT 32
state SWAP 1
Here, COUNT is declared as a 32-bit state and SWAP as a 1-bit state. The TIE language specifies that the bits of COUNT are indexed from 31 to 0, with bit 0 being the least significant.
The Xtensa™ ISA provides two instructions, RSR and WSR, for saving and restoring special system registers. Similarly, it provides two other instructions, RUR and WUR (described in detail below), for saving and restoring the states declared in TIE. To save and restore the states declared in TIE, the mapping of each state to entries in the user register file accessible to the RUR and WUR instructions must be specified. The following section of the above code specifies this mapping:
//map COUNT and SWAP to user register file entries
so that the following instructions save the value of COUNT in a2 and the value of SWAP in a5:
RUR a2, 0;
RUR a5, 1;
This mechanism is actually used in the test program to verify the contents of the states. In C, the above two instructions take the following form:
x = RUR(0);
y = RUR(1);
The next section of the TIE description is the definition of a new instruction class containing the new instruction BYTESWAP:
// define a new instruction class that
//  - reads data from ars (predefined to be AR[s])
//  - uses and writes state COUNT
//  - uses state SWAP
iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}
Here, iclass is the keyword and bs is the name of the iclass. The next clause lists the instructions in this instruction class (BYTESWAP). The clause after that specifies the operands used by the instructions in this class (in this case an input operand ars and an output operand arr). The last clause of the iclass definition specifies the states accessed by the instructions in this class (in this case the instruction reads state SWAP and reads and writes state COUNT).
The last block of the above code provides the formal semantic definition of the BYTESWAP instruction:
// semantic definition of byteswap
//  COUNT the number of byte-swapped words
//  Return the swapped or un-swapped data depending on SWAP
semantic bs {BYTESWAP} {
  wire [31:0] ars_swapped =
    {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
  assign arr = SWAP ? ars_swapped : ars;
  assign COUNT = COUNT + SWAP;
}
This description uses a subset of Verilog HDL to describe combinational logic. It is this block that specifies precisely how the instruction set simulator will simulate the BYTESWAP instruction, and how the additional circuitry is synthesized and added to the Xtensa™ processor hardware to support the new instruction.
In the present invention, which implements user-defined states, a declared state can be used like any other variable to access the information stored in it. A state identifier appearing on the right-hand side of an expression indicates a read from the state. Writing to a state is done by assigning a value or an expression to the state identifier. For example, the following semantic code segment shows how an instruction reads and writes various states:
assign KEYC = sr == 8'd2 ? art[27:0] : KEYC;
assign KEYD = sr == 8'd3 ? art[27:0] : KEYD;
assign DATA = sr == 8'd0 ? {DATA[63:32], art} : {art, DATA[63:32]};
To illustrate examples of instructions that can be implemented in a configurable processor as core instructions, and of instructions that become available through the selection of configuration options, the Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, published by Tensilica, Inc., is incorporated herein by reference. Further, to illustrate examples of TIE language statements that can be used to implement such user-defined instructions, the Instruction Extension Language (TIE) Reference Manual, Revision 1.3, also published by Tensilica, Inc., is likewise incorporated herein by reference.
From the TIE description, a hardware implementation that executes these instructions can be generated using, for example, a program similar to the one shown in Appendix D. Appendix E shows the code of the header file needed to support the new instructions as intrinsic functions.
Using the configuration specification, the following can be generated automatically:
the instruction decode logic of the processor 60;
the illegal instruction detection logic for the processor 60;
the ISA-specific portion of the assembler;
the ISA-specific support routines for the compiler;
the ISA-specific portion of the disassembler (used by the debugger); and
the ISA-specific portion of the simulator.
FIG. 16 is a diagram showing how the ISA-specific portions of these software tools are generated. From a user-created TIE description file 400, a TIE parser program 410 generates C code for several programs, each of which produces a file that is accessed by one or more of the software development tools to obtain information about the user-defined instructions and state. For example, the program tie2gcc 420 generates a C header file 470, called xtensa_tie.h, which contains the intrinsic function definitions for the new instructions. The program tie2isa 430 generates a dynamic link library (DLL) 480 containing information about the user-defined instruction formats (in the Wilson et al. application discussed below, this is effectively a combination of the encode and decode DLLs discussed therein). The program tie2iss 440 generates performance modeling routines and produces a DLL 490 containing the instruction semantics which, as discussed in the Wilson et al. application, is used by a host compiler to produce a simulator DLL used by the simulator. The program tie2ver 450 produces the necessary descriptions 500 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 460 produces save and restore code 510 for use with the RUR and WUR instructions.
The precise description of instructions and of how they access states makes it possible to produce efficient logic that can be plugged into existing high-performance microprocessor designs. The methods described in connection with this embodiment of the present invention specifically handle new instructions that read from or write to one or more state registers. In particular, this embodiment shows how to derive the hardware logic for the state registers in the context of a class of microprocessor implementations that all use pipelining as a technique for achieving high performance.
In a pipelined implementation such as the one shown in FIG. 17, a state register is typically duplicated several times, each instance representing the value of the state at a particular pipeline stage. In this embodiment, a state is translated into multiple copies of registers consistent with the underlying core processor implementation. Additional bypass and forwarding logic is also generated, again in a manner consistent with the underlying core processor implementation. For example, to target a core processor implementation with three execution stages, this embodiment translates a state into three registers connected as shown in FIG. 18. In this implementation, each of the registers 610-630 represents the value of the state in one of the three pipeline stages. ctrl-1, ctrl-2 and ctrl-3 are control signals used to enable data latching in the corresponding flip-flops 610-630.
Making the multiple copies of a state register work consistently with the underlying processor implementation requires additional logic and control signals. "Consistently" means that the state should behave exactly like the rest of the processor state under interrupts, exceptions and pipeline stalls. Typically, a given processor implementation defines certain signals representing the various pipeline conditions. Such signals are required for the pipelined state registers to work correctly.
In a typical pipelined implementation, the execution unit consists of multiple pipeline stages. The computation of an instruction is carried out across multiple stages of this pipeline. The instruction stream flows through the pipeline in the sequence directed by the control logic. At any given time, up to n instructions may be executing in the pipeline, where n is the number of stages. In a superscalar processor, which can also be implemented using the present invention, the number of instructions in the pipeline can be n × w, where w is the issue width of the processor.
The role of the control logic is to ensure that the dependencies between instructions are respected and that any interference between instructions is resolved. If an instruction uses data computed by an earlier instruction, special hardware is needed to forward the data to the later instruction without stalling the pipeline. If an interrupt occurs, all instructions in the pipeline must be killed and later re-executed. When an instruction cannot be executed because its required input data or computational hardware is not available, the instruction must be stalled. An inexpensive way of stalling an instruction is to kill it in its first execution stage and re-execute it in the next cycle. A consequence of this technique is the creation of an invalid stage (a bubble) in the pipeline. The bubble flows through the pipeline along with the other instructions. At the end of the pipeline, where instructions are committed, the bubbles are discarded.
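The kill-and-reissue technique can be illustrated with a toy pipeline model. This sketch is a simplification made for illustration and does not describe the actual control logic: the function name run_pipeline, the representation of a bubble as None, and the set of cycles in which the first-stage instruction is killed are all assumptions.

```python
def run_pipeline(program, kill_cycles, stages=3):
    """Simulate a simple in-order pipeline with stall-by-kill.

    program:     list of instruction names, issued in order
    kill_cycles: cycle numbers in which the stage-1 instruction is killed
    Returns (committed instructions, per-cycle pipeline contents).
    """
    pipe = [None] * stages               # pipe[0] is the first stage
    committed, trace = [], []
    pc = cycle = 0
    while pc < len(program) or any(slot is not None for slot in pipe):
        leaving = pipe[-1]
        fetched = program[pc] if pc < len(program) else None
        pipe = [fetched] + pipe[:-1]     # every stage advances by one
        if leaving is not None:
            committed.append(leaving)    # bubbles are simply dropped here
        if pipe[0] is not None:
            if cycle in kill_cycles:
                pipe[0] = None           # kill: a bubble enters the pipeline
            else:
                pc += 1                  # instruction accepted
        trace.append(tuple(pipe))
        cycle += 1
    return committed, trace


if __name__ == "__main__":
    done, trace = run_pipeline(["i0", "i1", "i2"], kill_cycles={1})
    for cycle, contents in enumerate(trace):
        print(cycle, contents)
    print(done)
```

Each killed cycle delays completion by one cycle while the committed instruction order is unchanged.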
Using the three-stage pipeline example above, the additional logic and connections required for a typical implementation of such a processor state are shown in FIG. 19.
Under normal circumstances, a value computed in one stage is forwarded to the next instruction immediately, without waiting for the value to reach the end of the pipeline, in order to reduce the number of pipeline stalls introduced by data dependencies. This is accomplished by sending the output of the first flip-flop 610 directly to the semantic block, so that it can be used immediately by the next instruction. To handle abnormal conditions such as interrupts and exceptions, this implementation requires the following three control signals: Kill_1, Kill_all and Valid_3.
Signal Kill_1 indicates that the instruction currently in the first pipeline stage 110 must be
killed because the data it requires is not available. Signal Kill_all indicates that all
instructions in the pipeline must be killed because an instruction ahead of them has raised an
exception or an interrupt has occurred. Signal Valid_3 indicates whether the instruction currently
in the last stage 630 is valid. An invalid instruction in that stage is typically the result of
killing an instruction in the first pipeline stage 610, which creates a bubble (an invalid
instruction) in the pipeline. Valid_3 simply indicates whether the instruction in the third
pipeline stage is valid or a bubble. Clearly, only valid instructions should be latched.
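The interaction of the three control signals can be illustrated with a small software model. The following C sketch is not part of the generated hardware. It assumes a three-stage pipeline in which each stage carries only a valid bit, and the names pipe3_t and pipe3_clock are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model: one valid bit per stage (true = instruction, false = bubble). */
typedef struct {
    bool valid[3];   /* valid[0] = stage 1, valid[2] = stage 3 */
} pipe3_t;

/*
 * Advance the model by one clock.
 *   issue    : a new (or re-executed) instruction enters stage 1
 *   kill_1   : the instruction now in stage 1 lacks its data and is killed
 *   kill_all : an exception or interrupt kills everything in flight
 * Returns Valid_3 for the instruction that occupied stage 3 before the
 * clock edge, i.e. whether its result may be latched.
 */
static bool pipe3_clock(pipe3_t *p, bool issue, bool kill_1, bool kill_all)
{
    bool valid_3 = p->valid[2] && !kill_all;

    p->valid[2] = p->valid[1] && !kill_all;
    p->valid[1] = p->valid[0] && !kill_1 && !kill_all; /* a kill leaves a bubble */
    p->valid[0] = issue && !kill_all;
    return valid_3;
}
```

In this model a killed stage-1 instruction leaves a bubble that travels to stage 3 and is reported there as not valid, while the caller re-issues the instruction on the following clock.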
Figure 20 shows the additional logic and connections needed to implement the state register. It
also shows how to construct the control logic that drives the signals "ctrl-1", "ctrl-2" and
"ctrl-3" so that the state-register implementation meets the requirements above. The following is
sample HDL code that is automatically generated to implement the state register shown in Figure 19.
module tie_enflop(tie_out, tie_in, en, clk);
parameter size = 32;
output [size-1:0] tie_out;
input  [size-1:0] tie_in;
input             en;
input             clk;
reg    [size-1:0] tmp;
assign tie_out = tmp;
always @(posedge clk) begin
    if (en)
        tmp <= #1 tie_in;
end
endmodule
module tie_athens_state(ns, we, ke, kp, vw, clk, ps);
parameter size = 32;
input  [size-1:0] ns;   // next state
input             we;   // write enable
input             ke;   // Kill E state
input             kp;   // Kill Pipeline
input             vw;   // Valid W state
input             clk;  // clock
output [size-1:0] ps;   // present state
wire   [size-1:0] se;   // state at E stage
wire   [size-1:0] sm;   // state at M stage
wire   [size-1:0] sw;   // state at W stage
wire   [size-1:0] sx;   // state at X stage
wire              ee;   // write enable for EM register
wire              ew;   // write enable for WX register
assign se = kp ? sx : ns;
assign ee = kp | we & ~ke;
assign ew = vw & ~kp;
assign ps = sm;
tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee),
    .clk(clk));
tie_enflop #(size) state_MW(.tie_out(sw), .tie_in(sm),
    .en(1'b1), .clk(clk));
tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew),
    .clk(clk));
endmodule
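To make the behavior of the pipelined state register easier to follow, the HDL above can be mirrored by a cycle-level software model. The C sketch below is only an illustration under the assumption that each enable flop updates on every call. The type tie_state_t and the functions tie_state_clock and tie_state_present are hypothetical names, not generated code.

```c
#include <stdint.h>

/* Hypothetical software mirror of the three enable flops in tie_athens_state. */
typedef struct {
    uint32_t sm;   /* output of state_EM (also the present state, ps) */
    uint32_t sw;   /* output of state_MW */
    uint32_t sx;   /* output of state_WX (committed value) */
} tie_state_t;

/* One rising clock edge; arguments follow the module ports ns, we, ke, kp, vw. */
static void tie_state_clock(tie_state_t *s, uint32_t ns,
                            int we, int ke, int kp, int vw)
{
    uint32_t se = kp ? s->sx : ns;      /* on a pipeline kill, reload the committed value */
    int ee = kp || (we && !ke);         /* write enable for the EM register */
    int ew = vw && !kp;                 /* write enable for the WX register */

    uint32_t next_sm = ee ? se : s->sm;
    uint32_t next_sw = s->sm;           /* MW register is always enabled */
    uint32_t next_sx = ew ? s->sw : s->sx;

    s->sm = next_sm;
    s->sw = next_sw;
    s->sx = next_sx;
}

/* Present-state value seen by the semantic blocks (ps = sm). */
static uint32_t tie_state_present(const tie_state_t *s)
{
    return s->sm;
}
```

A write is visible to the next instruction one clock later through sm, which corresponds to the forwarding path described above, while a kp pulse restores sm from the committed copy sx.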
With the pipelined state register model above, if a semantic block specifies the state as one of
its inputs, the present-state value of the state is passed to the semantic block as an input
variable. If the semantic block has logic that produces a new value for a state, an output signal
is created. This output signal is used as the next-state input to the pipelined state register.
This embodiment allows multiple semantic description blocks, each of which describes the behavior
of several instructions. Under this unrestricted style of description, it is possible that only a
subset of the semantic blocks produce a next-state output for a given state. Furthermore, a given
semantic block may produce the next-state output conditionally, depending on which instruction it
is executing at a given time. Additional hardware logic is therefore needed to combine the
next-state outputs from all semantic blocks to form the input to the pipelined state register. In
this embodiment of the invention, a signal is derived automatically for each semantic block to
indicate whether that block has produced a new value for the state. In another embodiment, such a
signal can be left for the designer to specify.
Figure 20 shows how the next-state outputs for a state from several semantic blocks s1-sn are
combined, and how one of them is appropriately selected as the input to the state register. In
this figure, op1_1 and op1_2 are opcode signals for the first semantic block, op2_1 and op2_2 are
opcode signals for the second semantic block, and so on. The next-state output of semantic block i
is si (if there are multiple state registers, the block has multiple next-state outputs). The
signal si_we indicates that semantic block i has produced a new value for the state. The signal
s_we indicates whether any semantic block has produced a new value for the state, and it is used
as the write-enable input to the pipelined state register.
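The combining logic described for Figure 20 amounts to an AND-OR selection of the si values and an OR of the si_we signals. A minimal C sketch follows. It assumes that at most one semantic block asserts its write enable in a given cycle. The type sem_out_t and the function combine_next_state are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block outputs: proposed next state and its write enable. */
typedef struct {
    uint32_t si;      /* next-state output of semantic block i */
    int      si_we;   /* nonzero if block i produced a new value */
} sem_out_t;

/*
 * Combine the outputs of n semantic blocks.
 * Returns the selected next-state value and stores s_we, the write enable
 * for the pipelined state register. Assumes at most one si_we is set.
 */
static uint32_t combine_next_state(const sem_out_t *blk, size_t n, int *s_we)
{
    uint32_t s = 0;
    int we = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = blk[i].si_we ? 0xffffffffu : 0u;
        s  |= blk[i].si & mask;        /* AND-OR multiplexer */
        we |= (blk[i].si_we != 0);     /* OR of all write enables */
    }
    *s_we = we;
    return s;
}
```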
Even though multiple semantic blocks are no more expressive than a single one, they usually give a
more structured description, typically by grouping related instructions into a single block.
Because the instructions are implemented in a more restricted scope, multiple semantic blocks can
also make the analysis of instruction effects simpler. On the other hand, there are often reasons
for a single semantic block to describe the behavior of several instructions. Most commonly, this
is because the hardware implementations of those instructions share common logic. Describing
several instructions in a single semantic block usually leads to a more efficient hardware design.
Because of interrupts and exceptions, software must be able to store the values of the states to
data memory and to restore (load) them from it. Based on the formal description of the new states
and new instructions, such restore and load instructions can be generated automatically. In one
embodiment of the invention, the logic for the restore and load instructions is generated
automatically as two semantic blocks, which can then be recursively translated into actual
hardware just like any other block. For example, from the following state declarations:
state [63:0] DATA cpn=0 autopack
state [27:0] KEYC cpn=1 nopack
state [27:0] KEYD cpn=1
the following semantic block can be generated to read the values of "DATA", "KEYC" and "KEYD"
into general-purpose registers:
iclass rur {RUR} {out arr, in st} {in DATA, in KEYC, in KEYD}
semantic rur {RUR} {
    wire sel_0 = (st == 8'd0);
    wire sel_1 = (st == 8'd1);
    wire sel_2 = (st == 8'd2);
    wire sel_3 = (st == 8'd3);
    assign arr = {32{sel_0}} & DATA[31:0] |
                 {32{sel_1}} & DATA[63:32] |
                 {32{sel_2}} & KEYC |
                 {32{sel_3}} & KEYD;
}
Figure 21 shows a block diagram of the logic corresponding to this kind of semantic logic. The
input signal "st" is compared with various constants to form select signals, which are used to
select bits from the state registers in a manner consistent with the user_register declarations.
Using the previous state declarations, bit 32 of DATA is mapped to bit 0 of the second user
register. Therefore, the second input of the MUX in this figure should be connected to bit 32 of
the DATA state.
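The read multiplexer described for Figure 21 can be mirrored in software as follows. This C sketch is an illustration only. It assumes the user-register numbering 0 to 3 implied by the generated semantic block, and the type rur_states_t and the function rur_read are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three states declared above. */
typedef struct {
    uint64_t data;   /* state DATA, 64 bits */
    uint32_t keyc;   /* state KEYC, 28 bits */
    uint32_t keyd;   /* state KEYD, 28 bits */
} rur_states_t;

/* Value that an RUR with user-register number st would place in arr. */
static uint32_t rur_read(const rur_states_t *s, uint8_t st)
{
    /* Each select signal is replicated to 32 bits, as in {32{sel_n}}. */
    uint32_t sel_0 = (st == 0) ? 0xffffffffu : 0u;
    uint32_t sel_1 = (st == 1) ? 0xffffffffu : 0u;
    uint32_t sel_2 = (st == 2) ? 0xffffffffu : 0u;
    uint32_t sel_3 = (st == 3) ? 0xffffffffu : 0u;

    return (sel_0 & (uint32_t)(s->data & 0xffffffffu))   /* DATA[31:0]  */
         | (sel_1 & (uint32_t)(s->data >> 32))           /* DATA[63:32] */
         | (sel_2 & (s->keyc & 0x0fffffffu))
         | (sel_3 & (s->keyd & 0x0fffffffu));
}
```

User register 1 returns the upper half of DATA, so bit 32 of DATA appears as bit 0 of the result, which matches the mapping stated above.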
The following semantic block can be generated to write values from general-purpose registers into
the states "DATA", "KEYC" and "KEYD":
iclass wur {WUR} {in art, in st} {out DATA, out KEYC, out KEYD}
semantic wur {WUR} {
    wire sel_0 = (st == 8'd0);
    wire sel_1 = (st == 8'd1);
    wire sel_2 = (st == 8'd2);
    wire sel_3 = (st == 8'd3);
    assign DATA = {sel_1 ? art : DATA[63:32], sel_0 ? art :
                   DATA[31:0]};
    assign KEYC = art;
    assign KEYD = art;
    assign DATA_we = WUR;
    assign KEYC_we = WUR & sel_2;
    assign KEYD_we = WUR & sel_3;
}
Figure 22 shows the logic for bit j of state S when it is mapped to bit k of the i-th user
register. In a WUR instruction, if the user_register number "st" is "i", bit k of "ars" is loaded
into the S[j] register; otherwise the original value of S[j] is recirculated. In addition, if any
bit of state S is reloaded, the signal S_we is asserted.
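The write path can be sketched in the same way. The C model below follows the generated wur semantic block shown earlier, including its choice to assert DATA_we on every WUR and recirculate the unselected half of DATA. The type wur_states_t, the function wur_write and the bit assignment of the returned write-enable mask are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the three states declared above. */
typedef struct {
    uint64_t data;   /* state DATA, 64 bits */
    uint32_t keyc;   /* state KEYC, 28 bits */
    uint32_t keyd;   /* state KEYD, 28 bits */
} wur_states_t;

/*
 * Apply one WUR with user-register number st and source value art.
 * Returns a mask of write enables: bit 0 = DATA_we, bit 1 = KEYC_we,
 * bit 2 = KEYD_we (this bit layout is an assumption of the sketch).
 */
static unsigned wur_write(wur_states_t *s, uint8_t st, uint32_t art)
{
    uint32_t hi = (st == 1) ? art : (uint32_t)(s->data >> 32);        /* else recirculate */
    uint32_t lo = (st == 0) ? art : (uint32_t)(s->data & 0xffffffffu);
    unsigned we = 0x1u;                     /* DATA_we = WUR */

    s->data = ((uint64_t)hi << 32) | lo;
    if (st == 2) {                          /* KEYC_we = WUR & sel_2 */
        s->keyc = art & 0x0fffffffu;
        we |= 0x2u;
    }
    if (st == 3) {                          /* KEYD_we = WUR & sel_3 */
        s->keyd = art & 0x0fffffffu;
        we |= 0x4u;
    }
    return we;
}
```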
A TIE user_register declaration specifies a mapping from the additional processor state defined
by state declarations to the identifiers used by the RUR and WUR instructions, so that this state
can be read and written independently of the TIE instructions.
Annex F shows the code used to generate the RUR and WUR instructions.
The RUR and WUR instructions are used mainly for task switching. In a multitasking environment,
multiple software tasks share a processor and run according to some scheduling algorithm. While a
task is active, its state resides in the processor registers. When the scheduling algorithm
decides to switch to another task, the state held in the processor registers is saved to memory,
and the state of the other task is loaded from memory into the processor registers. The Xtensa™
Instruction Set Architecture (ISA) includes the RSR and WSR instructions to read and write the
state defined by the ISA. For example, the following code is part of the task "save to memory"
sequence:
// save special registers
    rsr  a0, SAR
    rsr  a1, LCOUNT
    s32i a0, a3, UEXCSAVE + 0
    s32i a1, a3, UEXCSAVE + 4
    rsr  a0, LBEG
    rsr  a1, LEND
    s32i a0, a3, UEXCSAVE + 8
    s32i a1, a3, UEXCSAVE + 12
; if (config_get_value("IsaUseMAC16")) {
    rsr  a0, ACCLO
    rsr  a1, ACCHI
    s32i a0, a3, UEXCSAVE + 16
    s32i a1, a3, UEXCSAVE + 20
    rsr  a0, MR_0
    rsr  a1, MR_1
    s32i a0, a3, UEXCSAVE + 24
    s32i a1, a3, UEXCSAVE + 28
    rsr  a0, MR_2
    rsr  a1, MR_3
    s32i a0, a3, UEXCSAVE + 32
    s32i a1, a3, UEXCSAVE + 36
; }
And the following code is part of the task "restore from memory" sequence:
// restore special registers
    l32i a2, a1, UEXCSAVE + 0
    l32i a3, a1, UEXCSAVE + 4
    wsr  a2, SAR
    wsr  a3, LCOUNT
    l32i a2, a1, UEXCSAVE + 8
    l32i a3, a1, UEXCSAVE + 12
    wsr  a2, LBEG
    wsr  a3, LEND
; if (config_get_value("IsaUseMAC16")) {
    l32i a2, a1, UEXCSAVE + 16
    l32i a3, a1, UEXCSAVE + 20
    wsr  a2, ACCLO
    wsr  a3, ACCHI
    l32i a2, a1, UEXCSAVE + 24
    l32i a3, a1, UEXCSAVE + 28
    wsr  a2, MR_0
    wsr  a3, MR_1
    l32i a2, a1, UEXCSAVE + 32
    l32i a3, a1, UEXCSAVE + 36
    wsr  a2, MR_2
    wsr  a3, MR_3
; }
Here, SAR, LCOUNT, LBEG and LEND are processor state registers that are part of the core Xtensa™
ISA, and ACCLO, ACCHI, MR_0, MR_1, MR_2 and MR_3 are part of the MAC16 Xtensa™ ISA option. (The
registers are saved and restored in pairs to avoid pipeline interlocks.)
When a designer defines new state with TIE, that state must also be task-switched in the same way
as the state above. One possibility is for the designer simply to edit the task-switching code
(part of which is given above) and add RUR/S32I and L32I/WUR instructions similar to the code
above. However, a configurable processor is most effective when the software is generated
automatically and is correct by construction. The invention therefore includes a facility for
augmenting the task-switching code automatically. The following tpp lines are added to the save
task above:
; my $off = 0;
; my $i;
; for ($i = 0; $i < $#user_registers; $i += 2) {
    rur  a2, `$user_registers[$i+0]`
    rur  a3, `$user_registers[$i+1]`
    s32i a2, UEXCUREG + `$off+0`
    s32i a3, UEXCUREG + `$off+4`
; $off += 8;
; }
; if (@user_registers & 1) {
;   # odd number of user registers
    rur  a2, `$user_registers[$#user_registers]`
    s32i a2, UEXCUREG + `$off+0`
; $off += 4;
; }
And the following lines are added to the restore task above:
; my $off = 0;
; my $i;
; for ($i = 0; $i < $#user_registers; $i += 2) {
    l32i a2, UEXCUREG + `$off+0`
    l32i a3, UEXCUREG + `$off+4`
    wur  a2, `$user_registers[$i+0]`
    wur  a3, `$user_registers[$i+1]`
; $off += 8;
; }
; if (@user_registers & 1) {
;   # odd number of user registers
    l32i a2, UEXCUREG + `$off+0`
    wur  a2, `$user_registers[$#user_registers]`
; $off += 4;
; }
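The pairing logic of the tpp loops above (registers handled two at a time, followed by one leftover register when the count is odd) can be reproduced in a small C sketch. This is not the tpp preprocessor. The function emit_save_sequence is hypothetical, and it simply mirrors the text of the save lines as they appear above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical re-creation of the save loop: writes the expanded assembly
 * text for n user registers into buf and returns the number of bytes of
 * save area used, or -1 if buf is too small.
 */
static int emit_save_sequence(const int *regs, size_t n, char *buf, size_t size)
{
    size_t len = 0;
    int off = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    /* Registers are handled in pairs, as in the tpp for loop. */
    for (size_t i = 0; i + 1 < n; i += 2) {
        int w = snprintf(buf + len, size - len,
                         "rur a2, %d\nrur a3, %d\n"
                         "s32i a2, UEXCUREG+%d\ns32i a3, UEXCUREG+%d\n",
                         regs[i], regs[i + 1], off, off + 4);
        if (w < 0 || (size_t)w >= size - len)
            return -1;
        len += (size_t)w;
        off += 8;
    }

    /* Odd number of user registers: save the last one on its own. */
    if (n & 1) {
        int w = snprintf(buf + len, size - len,
                         "rur a2, %d\ns32i a2, UEXCUREG+%d\n",
                         regs[n - 1], off);
        if (w < 0 || (size_t)w >= size - len)
            return -1;
        off += 4;
    }
    return off;
}
```

The returned offset equals four bytes per user register, which is the amount of extra save-area space the task state region has to provide.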
Finally, the task state area in memory must have additional space allocated for storing the user
registers, and the offset of this space from the base of the task save pointer is defined as the
assembler constant UEXCUREG. This save area was previously defined by the following code:
#define UEXCREGSIZE (16*4)
#define UEXCPARMSIZE (4*4)
; if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE (10*4)
; } else {
#define UEXCSAVESIZE (4*4)
; }
#define UEXCMISCSIZE (2*4)
#define UEXCPARM 0
#define UEXCREG (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE (UEXCREG+UEXCREGSIZE)
#define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
#define UEXCWIN (UEXCMISC+0)
#define UEXCFRAME \
    (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE)
which is changed to
#define UEXCREGSIZE (16*4)
#define UEXCPARMSIZE (4*4)
; if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE (10*4)
; } else {
#define UEXCSAVESIZE (4*4)
; }
#define UEXCMISCSIZE (2*4)
#define UEXCUREGSIZE `@user_registers*4`
#define UEXCPARM 0
#define UEXCREG (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE (UEXCREG+UEXCREGSIZE)
#define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
#define UEXCUREG (UEXCMISC+UEXCMISCSIZE)
#define UEXCWIN (UEXCUREG+0)
#define UEXCFRAME \
    (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE+UEXCUREGSIZE)
This code depends on the existence of a tpp variable @user_registers, which holds a list of user
register numbers. It is simply a list built from the first argument of each user_register
statement.
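As a worked check of the modified definitions, the offsets can be computed for a concrete configuration. The C sketch below evaluates the same arithmetic as the macros. The type save_layout_t and the function compute_layout are hypothetical names introduced only for this example.

```c
/* Hypothetical result type: one field per offset macro in the listing. */
typedef struct {
    int uexcparm;
    int uexcreg;
    int uexcsave;
    int uexcmisc;
    int uexcureg;
    int uexcwin;
    int uexcframe;
} save_layout_t;

/* Evaluate the save-area layout for a given configuration. */
static save_layout_t compute_layout(int use_mac16, int num_user_regs)
{
    const int regsize  = 16 * 4;
    const int parmsize = 4 * 4;
    const int savesize = use_mac16 ? 10 * 4 : 4 * 4;
    const int miscsize = 2 * 4;
    const int uregsize = num_user_regs * 4;   /* `@user_registers*4` */
    save_layout_t l;

    l.uexcparm  = 0;
    l.uexcreg   = l.uexcparm + parmsize;
    l.uexcsave  = l.uexcreg + regsize;
    l.uexcmisc  = l.uexcsave + savesize;
    l.uexcureg  = l.uexcmisc + miscsize;      /* user registers follow the misc area */
    l.uexcwin   = l.uexcureg + 0;
    l.uexcframe = regsize + parmsize + savesize + miscsize + uregsize;
    return l;
}
```

With the MAC16 option and three user registers, for example, UEXCUREG evaluates to 128 and UEXCFRAME to 140 bytes.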
In some more complex microprocessor implementations, a state can be computed in different
pipeline stages. Handling this requires some extensions (albeit simple ones) to the process
described here. First, the description language needs to be extended so that a semantic block can
be associated with a pipeline stage. This can be accomplished in one of several ways. In one
embodiment, the associated pipeline stage can be specified explicitly with each semantic block.
In another embodiment, a range of pipeline stages can be specified for each semantic block. In yet
another embodiment, the pipeline stage for a given semantic block can be derived automatically
from the required computation delay.
The second task in supporting state generation in different pipeline stages is handling
interrupts, exceptions and stalls. This usually involves adding suitable bypass and forwarding
logic under the control of pipeline control signals. In one embodiment, a diagram can be produced
to indicate the relationship between when a state is generated and when it is used. Based on
application analysis, suitable forwarding logic can be implemented to handle the common cases,
and interlock logic can be generated to stall the pipeline for the cases not handled by the
forwarding logic.
The method for modifying the instruction issue logic of the base processor depends on the
algorithm used by the base processor. In general, however, in most cases, whether single-issue or
superscalar, and whether for single-cycle or multi-cycle instructions, the instruction issue
logic depends only on the instruction being tested for issue in order to produce:
1. signals indicating, for each processor state element, whether the instruction uses the state
as a source;
2. signals indicating, for each processor state element, whether the instruction uses the state
as a destination;
3. signals indicating, for each functional unit, whether the instruction uses the functional
unit.
These signals are used to perform the issue-to-pipeline and cross-issue checks, and to update the
pipeline status in the pipeline-dependent issue logic. TIE contains all the information needed to
augment these signals and their equations for each new instruction.
First, each TIE state declaration causes a new signal to be created for the instruction issue
logic. Each in or inout operand or state listed in the third or fourth argument of an iclass
declaration adds the instruction decode signal of the instructions listed in the second argument
to the first set of equations for the specified processor state element.
Second, each out or inout operand or state listed in the third or fourth argument of an iclass
declaration adds the instruction decode signal of the instructions listed in the second argument
to the second set of equations for the specified processor state element.
Third, the logic generated from each TIE semantic block represents a new functional unit, so a
new set of unit signals is created, and the decode signals of the TIE instructions specified for
the semantic block are ORed together to form the third set of equations.
When an instruction is issued, the pipeline status must be updated for future issue decisions.
Again, the method for modifying the instruction issue logic of the base processor depends on the
algorithm used by the base processor. However, some general observations are possible. The
pipeline status must provide the following status back to the issue logic:
4. signals indicating, for each issued instruction destination, when the result is available for
bypass;
5. signals indicating, for each functional unit, that the unit is ready for another instruction.
The embodiment described here is a single-issue processor in which designer-defined instructions
are limited to a single cycle of logic computation. In this case the above simplifies
considerably. No functional-unit checks or cross-issue checks are needed, and no single-cycle
instruction can make a processor state element not pipeline-ready for the next instruction. The
issue equation therefore becomes simply
issue = (~src1use | src1pipeready) & (~src2use | src2pipeready) & ...
        & (~srcNuse | srcNpipeready);
where the src[i]pipeready signals are not affected by the additional instructions, and src[i]use
is the first set of equations, modified as described above. In this embodiment the fourth and
fifth sets of signals are not needed. For an alternative embodiment with multiple issue and
multiple cycles, the TIE description would be extended with a latency specification for each
instruction, giving the number of cycles over which the computation is pipelined.
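The simplified single-issue equation can be evaluated in software by packing the per-source signals into bit vectors. The C sketch below is a minimal illustration. It assumes at most 32 source state elements, with bit i of each argument standing for src[i]use and src[i]pipeready, and the function name can_issue is hypothetical.

```c
#include <stdint.h>

/*
 * Evaluate issue = AND over i of (~src[i]use | src[i]pipeready)
 * for n source state elements (n <= 32).
 * Returns 1 if the instruction may issue, 0 if it must wait.
 */
static int can_issue(uint32_t src_use, uint32_t src_pipeready, unsigned n)
{
    uint32_t mask = (n >= 32u) ? 0xffffffffu : ((1u << n) - 1u);

    /* A source blocks issue only when it is used and not yet pipeline-ready. */
    return ((~src_use | src_pipeready) & mask) == mask;
}
```

An instruction that uses no state at all (src_use equal to zero) always issues, which reflects the fact that unused sources impose no constraint in the equation.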
The fourth set of signals is produced in each semantic block pipeline stage by ORing together the
instruction decode signals of the instructions that, according to the specification, complete
execution in that stage.
By default, the generated logic is fully pipelined, so a functional unit generated by TIE is
always ready one cycle after accepting an instruction. In this case the fifth set of signals for
each TIE semantic block is always asserted. When the logic in a semantic block must be reused over
multiple cycles, a further specification states for how many cycles the instructions use the
functional unit. In this case the fifth set of signals is produced in each semantic block pipeline
stage by ORing together the instruction decode signals of the instructions that finish in that
stage with the specified cycle count.
Alternatively, in a different embodiment, the designer may be allowed to specify the result-ready
and functional-unit-ready signals as an extension to TIE.
Examples of code processed according to this embodiment are shown in the annexes. For brevity
they are not explained in detail; however, they will be readily understood by those skilled in the
art after consulting the reference manuals cited above. Annex G is an example of instructions
implemented using the TIE language; Annex H shows what the TIE compiler generates for a compiler
that uses such code. Similarly, Annex I shows what the TIE compiler generates for the simulator;
Annex J shows what the TIE compiler generates for macro expansion of the TIE instructions in a
user application; Annex K shows what the TIE compiler generates to simulate TIE instructions in
native mode; Annex L shows what the TIE compiler generates as a Verilog HDL description of the
additional hardware; and Annex M shows what the TIE compiler generates as a Design Compiler script
for optimizing the above Verilog HDL description, in order to estimate the impact of the TIE
instructions on CPU size and performance in terms of area and speed.
As noted above, to begin the processor configuration process the user starts by selecting a base
processor via the GUI described above. As part of the process, a software development system 30
is built and delivered to the user, as shown in Fig. 1. The software development system 30
contains four key components relevant to another aspect of the invention, shown in more detail in
Fig. 6: compiler 108, assembler 110, instruction set simulator 112, and debugger 130.
As is well known to those skilled in the art, a compiler converts user applications written in a
high-level programming language such as C or C++ into processor-specific assembly language.
High-level languages such as C or C++ are designed to let application writers describe their
applications in a form that is easy for them to express precisely. These are not languages that
processors understand. The application writer need not worry about all the specific
characteristics of the processor that will be used. Typically, the same C or C++ program can be
used with little or no modification on many different types of processors.
The compiler translates the C or C++ program into assembly language. Assembly language is much
closer to machine language, the language directly supported by the processor. Different types of
processors have their own assembly languages. Each assembly instruction often directly represents
one machine instruction, but the two are not necessarily identical. Assembly instructions are
designed to be human-readable strings. Each instruction and operand is given a meaningful name or
mnemonic, so that people can read assembly instructions and easily understand which operations
the machine will perform. The assembler converts assembly language into machine language. Each
assembly instruction string is efficiently encoded by the assembler into one or more machine
instructions, which can be directly and efficiently executed by the processor.
Machine code can be run directly on the processor, but physical processors are not always
immediately available. Building a physical processor is a time-consuming and expensive process.
When selecting among possible processor configurations, a user cannot build a physical processor
for each candidate. Instead, the user is provided with a software program called a simulator. The
simulator, running on a general-purpose computer, can simulate the effect of running the user
application on the user-configured processor. The simulator can mimic the semantics of the
simulated processor and can tell the user how fast the real processor will be when running the
user's application.
A debugger is a tool that lets users interactively find problems in their software. The debugger
lets users run their programs interactively. The user can stop the program's execution at any
time and look at the C source code, the resulting assembly code or the machine code. At a
breakpoint, the user can also examine or modify the values of any or all variables or hardware
registers. The user can then continue execution, perhaps one statement at a time, perhaps one
machine instruction at a time, or perhaps to a new user-selected breakpoint.
All four components 108, 110, 112 and 130 need to know about the user-defined instructions 750
(see Fig. 3), and the simulator 112 and debugger 130 must additionally know about the
user-defined state 752. The system lets the user access user-defined instructions 750 via
intrinsics added to the user's C and C++ applications. The compiler 108 must translate the
intrinsic calls into the assembly language instructions 738 for the user-defined instructions
750. The assembler 110 must take the new assembly language instructions 738, whether written
directly by the user or translated by the compiler 108, and encode them into the machine
instructions 740 corresponding to the user-defined instructions 750. The simulator 112 must
decode the user-defined machine instructions 740. It must model the semantics of the
instructions, and it must model the performance of the instructions on the configured processor.
The simulator 112 must also model the values and performance implications of user-defined state.
The debugger 130 must allow the user to display assembly language instructions 738, including
user-defined instructions 750. It must also allow the user to examine and modify the values of
user-defined state.
In this aspect of the invention, the user invokes a tool, the TIE compiler 702, to process the
current candidate user-defined enhancements 736. The TIE compiler 702 is different from the
compiler 708, which translates the user application into assembly language 738. The TIE compiler
702 builds components that enable the already-built base software system 30 (compiler 708,
assembler 710, simulator 712 and debugger 730) to use the new user-defined enhancements 736. Each
element of the software system 30 uses a somewhat different set of these components.
Figure 24 is a diagram showing how the TIE-specific portions of these software tools are
generated. From the user-defined extension file 736, the TIE compiler 702 generates C code for
several programs, each of which produces a file accessed by one or more of the software
development tools for information about the user-defined instructions and state. For example, the
program tie2gcc 800 generates a C header file 842 called xtensa-tie.h (described in more detail
below), which contains intrinsic function definitions for the new instructions. The program
tie2isa 810 generates a dynamic link library (DLL) 844/848, which contains information about the
user-defined instruction formats (a combination of the encode DLL 844 and the decode DLL 848
described in more detail below). The program tie2iss 840 generates C code 870 for performance
modeling and instruction semantics which, as discussed below, is used by a host compiler 846 to
produce the simulator DLL 849 used by the simulator, as described in more detail below. The
program tie2ver 850 produces the necessary descriptions 850 of the user-defined instructions in
an appropriate hardware description language. Finally, the program tie2xtos 860 produces save and
restore code 810 for saving and restoring user-defined state on a context switch. Additional
information on the implementation of user-defined state can be found in the above-mentioned
application of Wang et al.
Compiler 708
In this embodiment, the compiler 708 translates intrinsic calls in the user application into the
assembly language instructions 738 for the user-defined enhancements 736. The compiler 708
implements this mechanism on top of the macro and inline-assembly mechanisms found in compilers
such as the GNU compilers. More information about these mechanisms can be found in, for example,
"GNU C and C++ Compiler User's Guide", EGCS version 1.0.3.
Consider a user who wishes to create a new instruction foo that operates on two registers and
returns a result in a third register. The user puts the instruction description in a user-defined
instruction file 750 in a particular directory and invokes the TIE compiler 702. The TIE compiler
702 creates a file 742 with a standard name such as xtensa-tie.h. This file contains the
following definition of foo.
#define foo(ars, art) \
    ({int arr; asm volatile ("foo %0, %1, %2" : "=a" (arr) : \
    "a" (ars), "a" (art)); arr;})
When the user invokes the compiler 708 on her application, she tells the compiler 708, via a
command-line option or an environment variable, the name of the directory containing the
user-defined enhancements 736. That directory also contains the xtensa-tie.h file 742. The
compiler 708 automatically includes the file xtensa-tie.h in the C or C++ application being
compiled, just as if the user had written the definition of foo herself. The user includes
intrinsic calls to the instruction foo in her application. Because of the included definition,
the compiler 708 treats those intrinsic calls as calls to the included definition. Following the
standard macro mechanism provided by the compiler 708, the compiler 708 treats a call to the
macro foo as if the user had directly written the assembly language instruction 738 rather than
the macro call. That is, following the standard inline-assembly mechanism, the compiler 708
translates the call into the single assembly instruction foo. For example, the user might have a
function that contains a call to the intrinsic foo.
int fred(int a, int b)
{
    return foo(a, b);
}
The compiler translates the function into the following assembly language subroutine, using the
user-defined instruction foo.
fred:
    .frame sp, 32
    entry sp, 32
#APP
    foo a2, a2, a3
#NO_APP
    retw.n
When the user creates a new set of user-defined enhancements 736, no new compiler needs to be
written. The TIE compiler 702 simply creates the file xtensa-tie.h 742, which the pre-built
compiler 708 automatically includes in the user's application.
In this embodiment, the assembler 710 uses an encode library 744 to encode assembly instructions
750. The interface to this library 744 includes functions to:
translate an opcode mnemonic string into an internal opcode representation;
provide the bit pattern to be generated for each opcode, for the opcode field of a machine
instruction 740; and
encode the operand value for each instruction operand and insert the encoded operand bit pattern
into the operand field of the machine instruction 740.
For example, consider the user function above that calls the intrinsic foo. The assembler might
take the instruction "foo a2, a2, a3" and convert it into the machine instruction represented by
the hexadecimal number 0x62230, where the high-order 6 together with the low-order 0 represents
the opcode of foo, and the 2, 2 and 3 represent the three registers a2, a2 and a3 respectively.
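The arithmetic behind 0x62230 can be checked with a few shifts. In the C sketch below, the position of the first register field (bits 12 to 15) follows the set_r_field example later in this description. The positions of the other two fields (bits 8 to 11 and bits 4 to 7) are assumptions inferred from the hexadecimal digits, and FOO_OPCODE_BITS and encode_foo are hypothetical names.

```c
#include <stdint.h>

/* Assumption: opcode bits of foo as implied by the example value 0x62230. */
#define FOO_OPCODE_BITS 0x60000u

/*
 * Pack three register numbers into the example instruction word.
 * r goes to bits 12..15, s to bits 8..11, t to bits 4..7 (the last two
 * positions are inferred from the hex digits, not stated in the text).
 */
static uint32_t encode_foo(unsigned r, unsigned s, unsigned t)
{
    return FOO_OPCODE_BITS
         | ((uint32_t)(r & 0xfu) << 12)
         | ((uint32_t)(s & 0xfu) << 8)
         | ((uint32_t)(t & 0xfu) << 4);
}
```

Under these assumptions, encode_foo(2, 2, 3) yields 0x62230, the value quoted for "foo a2, a2, a3".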
The internal implementation of these functions is based on a combination of tables and internal
functions. Tables are easily generated by the TIE compiler 702, but their expressiveness is
limited. When more flexibility is needed, for example when expressing operand encoding functions,
the TIE compiler 702 can generate arbitrary C code to be included in the library 744.
Consider again the example "foo a2, a2, a3". Each register field is simply encoded with the
number of the register. The TIE compiler 702 creates the following function, which checks for
legal register values and, if the value is legal, returns the register number.
xtensa_encode_result encode_r(valp)
    u_int32_t *valp;
{
    u_int32_t val = *valp;
    if ((val >> 4) != 0)
        return xtensa_encode_result_too_high;
    *valp = val;
    return xtensa_encode_result_ok;
}
If all encodings were this simple, no encode functions would be needed; a table would suffice.
However, the user may choose more complicated encodings. The following encoding, written in the
TIE language, encodes each operand as the value of the operand divided by 1024. Such an encoding
is useful for densely encoding values that are required to be multiples of 1024.
operand tx10 t {t << 10} {tx10 >> 10}
The TIE compiler translates the operand encoding description into the following C function.
xtensa_encode_result encode_tx10(valp)
    u_int32_t *valp;
{
    u_int32_t t, tx10;
    tx10 = *valp;
    t = (tx10 >> 10) & 0xf;
    tx10 = decode_tx10(t);
    if (tx10 != *valp) {
        return xtensa_encode_result_not_ok;
    } else {
        *valp = t;
    }
    return xtensa_encode_result_ok;
}
Because the range of possible operand values is very large, such an encoding cannot be carried out with a table. The table would have to be very large.
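The round-trip check used by the tx10 encoding can be tried out in isolation. The following is a minimal sketch in present-day C; the enum encode_result is a hypothetical stand-in for xtensa_encode_result, and the 4-bit field width follows the 0xf mask in the listing above.

```c
#include <stdint.h>

/* Hypothetical stand-in for xtensa_encode_result. */
typedef enum { ENCODE_OK, ENCODE_NOT_OK } encode_result;

/* Decode direction of the operand: field value -> operand value. */
static uint32_t decode_tx10(uint32_t t)
{
    return t << 10;
}

/* Encode direction: replaces *valp with the 4-bit field value, or reports
 * failure (leaving *valp untouched) if the value does not survive a round trip. */
static encode_result encode_tx10(uint32_t *valp)
{
    uint32_t t = (*valp >> 10) & 0xf;
    if (decode_tx10(t) != *valp)
        return ENCODE_NOT_OK;
    *valp = t;
    return ENCODE_OK;
}
```

With this sketch, 4096 encodes to the field value 4, whereas 1500 (not a multiple of 1024) and 16384 (too large for a 4-bit field) are both rejected.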
In one embodiment of encode library 744, one table maps opcode mnemonic strings to internal opcode representations. For efficiency, this table may be sorted, or it may be a hash table or some other data structure that allows efficient lookup. Another table maps each opcode to a template of a machine instruction, with the opcode field initialized to the appropriate bit pattern for that opcode. Opcodes with the same operand fields and operand encodings are grouped together. For each operand in one of these groups, the library contains one function that encodes the operand value into a bit pattern and another function that inserts those bits into the appropriate field of a machine instruction. A separate internal table maps each instruction operand to these functions. Consider an example in which the result register number is encoded in bits 12..15 of the instruction. TIE compiler 702 will generate the following function, which sets bits 12..15 of the instruction to the value (number) of the result register:
void set_r_field(insn, val)
    xtensa_insnbuf insn;
    u_int32_t val;
{
    /* clear bits 12..15, then insert the register number */
    insn[0] = (insn[0] & 0xffff0fff) | ((val << 12) & 0xf000);
}
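The mnemonic table and field-setting function described above can be sketched as follows. This is an illustrative sketch, not the library's actual layout: the type opcode_entry, the function lookup_opcode and the mnemonics foo, foo2 and foo3 are hypothetical, and a sorted array searched with the standard bsearch function is just one of the lookup structures the text allows.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: mnemonic plus an instruction template
 * in which only the opcode field is filled in. */
typedef struct {
    const char *mnemonic;
    uint32_t    tmpl;
} opcode_entry;

/* Must stay sorted by mnemonic so that bsearch() works. */
static const opcode_entry opcode_table[] = {
    { "foo",  0x060000u },
    { "foo2", 0x160000u },
    { "foo3", 0x260000u },
};

static int compare_mnemonic(const void *key, const void *entry)
{
    return strcmp((const char *)key, ((const opcode_entry *)entry)->mnemonic);
}

/* Returns the table entry for a mnemonic, or NULL if it is unknown. */
static const opcode_entry *lookup_opcode(const char *mnemonic)
{
    return bsearch(mnemonic, opcode_table,
                   sizeof opcode_table / sizeof opcode_table[0],
                   sizeof opcode_table[0], compare_mnemonic);
}

/* Same operation as set_r_field above, applied to a single 32-bit word. */
static uint32_t set_r_field(uint32_t insn, uint32_t val)
{
    return (insn & 0xffff0fffu) | ((val << 12) & 0xf000u);
}
```

An assembler built this way would first fetch the template for the mnemonic and then apply one field-setting function per operand.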
So that user-defined instructions can be changed without rewriting assembler 710, encode library 744 is implemented as a dynamic link library (DLL). DLLs are a standard way of allowing a program to extend its functionality dynamically. The details of handling DLLs differ between host operating systems, but the basic concept is the same. A DLL is dynamically loaded into a running program as an extension of the program's code. A run-time linker resolves symbolic references between the DLL and the main program and between the DLL and other DLLs already loaded. In the case of the encode library or DLL 744, a small portion of the code is statically linked into assembler 710. This code is responsible for loading the DLL, combining the information in the DLL with the existing encoding information for the pre-built instruction set 746 (which may itself have been loaded from a separate DLL), and making this information accessible through the interface functions described above.
When a user creates new enhancements 736, she invokes TIE compiler 702 on a description of the enhancements 736. TIE compiler 702 generates C code that defines the internal tables and functions implementing the encode DLL. TIE compiler 702 then invokes the native compiler 746 of the host system (which compiles code that runs on the host rather than on the processor being configured) to create encode DLL 744 for the user-defined instructions 750. The user invokes the pre-built assembler 710 on her application with a flag or environment variable pointing to the directory containing the user-defined enhancements 736. The pre-built assembler 710 dynamically opens DLL 744 in that directory. For each assembly instruction, the pre-built assembler 710 uses encode DLL 744 to look up the opcode mnemonic string, to find the bit pattern for the opcode field in the machine instruction, and to encode each instruction operand.
For example, when assembler 710 encounters the TIE instruction "foo a2, a2, a3", it finds from a table that the "foo" opcode translates into the number 6 in bit positions 16 to 23. From a table, it finds the encoding function for each register. The functions encode a2 as the number 2, the other a2 as the number 2, and a3 as the number 3. From a table, it finds the appropriate set functions. set_r_field puts the result value 2 into bit positions 12..15 of the instruction. Similar set functions put the other 2 and the 3 into their appropriate places.
int decode_insn(const xtensa_insnbuf insn)
{
    /* compare the opcode bits against each user-defined opcode in turn */
    if ((insn[0] & 0xff000f) == 0x60000) return xtensa_foo1_op;
    if ((insn[0] & 0xff000f) == 0x160000) return xtensa_foo2_op;
    if ((insn[0] & 0xff000f) == 0x260000) return xtensa_foo3_op;
    return XTENSA_UNDEFINED;
}
When there are many user-defined instructions, comparing the opcode against every possible user-defined instruction 750 can be time-consuming, so the TIE compiler can instead use a hierarchical set of switch statements.
switch (get_op0_field(insn)) {
        switch (get_op1_field(insn)) {
                switch (get_op2_field(insn)) {
                default: return XTENSA_UNDEFINED;
                }
        default: return XTENSA_UNDEFINED;
        }
default: return XTENSA_UNDEFINED;
}
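A complete instance of the hierarchical decoding technique might look like the sketch below. It is an assumption-laden illustration: the field positions are inferred from the 0xff000f mask and the opcode values 0x60000, 0x160000 and 0x260000 used in the comparison-based decoder, and the OP_ constants are hypothetical stand-ins for the internal opcode representations.

```c
#include <stdint.h>

/* Hypothetical internal opcode numbers. */
enum { OP_UNDEFINED, OP_FOO1, OP_FOO2, OP_FOO3 };

/* Field extractors; positions inferred from the 0xff000f opcode mask. */
static unsigned get_op0_field(uint32_t insn) { return insn & 0xf; }          /* bits 0..3   */
static unsigned get_op1_field(uint32_t insn) { return (insn >> 16) & 0xf; }  /* bits 16..19 */
static unsigned get_op2_field(uint32_t insn) { return (insn >> 20) & 0xf; }  /* bits 20..23 */

/* Each switch level narrows the candidates, so at most three
 * field extractions are needed regardless of the number of opcodes. */
static int decode_insn(uint32_t insn)
{
    switch (get_op0_field(insn)) {
    case 0x0:
        switch (get_op1_field(insn)) {
        case 0x6:
            switch (get_op2_field(insn)) {
            case 0x0: return OP_FOO1;
            case 0x1: return OP_FOO2;
            case 0x2: return OP_FOO3;
            default:  return OP_UNDEFINED;
            }
        default: return OP_UNDEFINED;
        }
    default: return OP_UNDEFINED;
    }
}
```

The register fields in bits 4..15 do not take part in the decision, so any register operands decode to the same opcode.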
In addition to decoding instruction opcodes, decode DLL 748 also includes functions for decoding instruction operands. This is done in the same way that operands are encoded in encode DLL 744. First, decode DLL 748 provides functions that extract the operand fields from a machine instruction. Continuing the example above, TIE compiler 702 generates the following function to extract a value from bits 12 to 15 of an instruction:
u_int32_t get_r_field(insn)
    xtensa_insnbuf insn;
{
    return ((insn[0] & 0xf000) >> 12);
}
The TIE description of an operand includes descriptions of both encoding and decoding, so whereas encode DLL 744 uses the operand encode description, decode DLL 748 uses the operand decode description. For example, the TIE operand description:
operand tx10 t {t << 10} {tx10 >> 10}
generates the following operand decode function:
u_int32_t decode_tx10(val)
    u_int32_t val;
{
    u_int32_t t, tx10;
    t = val;
    tx10 = t << 10;
    return tx10;
}
When the user invokes simulator 712, she tells simulator 712 the directory containing decode DLL 748 for the user-defined enhancements 736. Simulator 712 opens the appropriate DLL. Whenever simulator 712 decodes an instruction, if that instruction is not successfully decoded by the decode function for the pre-built instruction set, simulator 712 invokes the decode function in DLL 748.
Given a decoded instruction 750, simulator 712 must interpret and simulate the semantics of instruction 750. This is done functionally. Every instruction 750 has a corresponding function that allows simulator 712 to simulate the semantics of that instruction 750. Simulator 712 internally keeps track of all the state of the simulated processor, and has a fixed interface for updating or querying the processor state. As noted above, the user-defined enhancements 736 are written in the TIE hardware description language, which is a subset of Verilog. TIE compiler 702 converts the hardware description into C functions that simulator 712 uses to simulate the new enhancements 736. Hardware description language operators are translated directly into the corresponding C operators. Operations that read or write state are translated into calls to the simulator interface for updating or querying processor state.
As an example in this embodiment, suppose that a user creates an instruction that adds two registers. This example is chosen only for simplicity. The user might describe the semantics of the add in the hardware description language as follows:
semantic add {add} { assign arr = ars + art; }
The output register, denoted by the internal name arr, is assigned the sum of the two input registers, whose internal names are ars and art. TIE compiler 702 takes this description and generates the semantic function used by simulator 712:
void add_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    set_ar(_OPND0_, ar(_OPND1_) + ar(_OPND2_));
    pc_incr(3);
}
The hardware operator "+" is translated directly into the corresponding C operator "+". The reads of hardware registers ars and art are translated into calls to the simulator 712 function "ar". The write of hardware register arr is translated into a call to the simulator 712 function "set_ar". Because every instruction implicitly increments the program counter pc by the size of the instruction, TIE compiler 702 also generates a call to a simulator 712 function that increments the simulated pc by 3, the size of the add instruction.
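To see the generated semantic function run, the simulator's state interface can be mocked up in a few lines. This is only a sketch: the register file ar_file, the counter sim_pc and the bodies of ar, set_ar and pc_incr are assumptions made for illustration, and the operand parameters are renamed opnd0..opnd3 to avoid identifiers that C reserves.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for the simulator's fixed state interface. */
static u32 ar_file[16];   /* address registers a0..a15 */
static u32 sim_pc;        /* simulated program counter */

static u32  ar(u32 n)            { return ar_file[n & 15]; }
static void set_ar(u32 n, u32 v) { ar_file[n & 15] = v; }
static void pc_incr(u32 bytes)   { sim_pc += bytes; }

/* Semantic function in the shape described in the text:
 * arr = ars + art, then advance the pc past the 3-byte instruction. */
static void add_func(u32 opnd0, u32 opnd1, u32 opnd2, u32 opnd3)
{
    (void)opnd3;   /* this instruction has only three operands */
    set_ar(opnd0, ar(opnd1) + ar(opnd2));
    pc_incr(3);
}
```

After setting a2 to 40 and a3 to 2, calling add_func(4, 2, 3, 0) leaves 42 in a4 and advances the simulated pc by 3.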
When TIE compiler 702 is invoked, it creates a semantic function as described above for each user-defined instruction, and also creates a table that maps all opcode names to the associated semantic functions. The table and functions are compiled into simulator DLL 749 using the standard compiler 746. When the user invokes simulator 712, she tells simulator 712 the directory containing the user-defined enhancements 736. Simulator 712 opens the appropriate DLL. Whenever simulator 712 is invoked, it decodes all the instructions in the program and creates a table mapping each instruction to its associated semantic function. When creating the mapping, simulator 712 opens the DLL and looks up the appropriate semantic functions. When simulating the semantics of a user-defined instruction 736, simulator 712 directly invokes the function in the DLL.
To tell the user how long an application would take to run on the simulated hardware, simulator 712 needs to simulate the performance effects of instructions 750. Simulator 712 uses a pipeline model for this purpose. Every instruction executes over several cycles. In each cycle, an instruction uses different resources of the machine. Simulator 712 begins by trying to execute all instructions in parallel. If multiple instructions try to use the same resource in the same cycle, the later instruction is stalled until the resource is free. If a later instruction reads state that an earlier instruction writes in a later cycle, the later instruction is stalled until the value has been written. Simulator 712 uses a functional interface to model the performance of each instruction. A function is created for every type of instruction. These functions contain calls to the simulator interface that models the performance of the processor.
For example, suppose there is a simple three-register instruction foo. The TIE compiler might create the following simulator function:
void foo_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    pipe_use_ifetch(3);
    pipe_use(REGF32_AR, op1, 1);
    pipe_use(REGF32_AR, op2, 1);
    pipe_def(REGF32_AR, op0, 2);
    pipe_def_ifetch(-1);
}
The call to pipe_use_ifetch tells simulator 712 that this instruction requires 3 bytes to be fetched. The two calls to pipe_use tell simulator 712 that the two input registers will be read in cycle 1. The call to pipe_def tells simulator 712 that the output register will be written in cycle 2. The call to pipe_def_ifetch tells simulator 712 that this instruction is not a branch, so the next instruction can be fetched in the next cycle.
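The stall rule for register dependences can be illustrated with a very small scoreboard. This sketch is not the simulator's actual model: the types insn_timing and pipe_model and the function pipe_issue are hypothetical, resource conflicts and instruction fetch are not modeled, and it assumes that a value written in a given cycle may be read in that same cycle.

```c
#define NUM_REGS 16

/* Hypothetical per-instruction timing, mirroring the pipe_use/pipe_def stages. */
typedef struct {
    int src1, src2, dst;   /* register numbers */
    int use_stage;         /* cycle, relative to issue, in which sources are read */
    int def_stage;         /* cycle, relative to issue, in which dst is written */
} insn_timing;

typedef struct {
    long ready[NUM_REGS];  /* absolute cycle in which each register is last written */
    long next_issue;       /* earliest cycle in which the next instruction may issue */
} pipe_model;

/* Issues one instruction in program order and returns its issue cycle.
 * The instruction is delayed until both sources have been written by the
 * cycle in which it reads them. */
static long pipe_issue(pipe_model *p, const insn_timing *in)
{
    long issue = p->next_issue;
    long need1 = p->ready[in->src1] - in->use_stage;
    long need2 = p->ready[in->src2] - in->use_stage;
    if (need1 > issue) issue = need1;
    if (need2 > issue) issue = need2;
    p->ready[in->dst] = issue + in->def_stage;
    p->next_issue = issue + 1;
    return issue;
}
```

In this model, an instruction that reads a register in stage 1 directly after a producer that writes it in stage 3 is held back by one cycle, while an independent instruction issues in the next free cycle.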
Pointers to these functions are placed in the same table as the semantic functions. The functions themselves are compiled into the same DLL 749 as the semantic functions. When simulator 712 is invoked, it creates a mapping between instructions and performance functions. When creating the mapping, simulator 712 opens DLL 749 and looks up the appropriate performance functions. When simulating the performance of a user-defined instruction 736, simulator 712 directly invokes the function in DLL 749.
Debugger 730 interacts with the user-defined enhancements 750 in two ways. First, the user can display the assembly instructions 738 for user-defined instructions 736. To do this, debugger 730 must decode machine instructions 740 into assembly instructions 738. This is the same mechanism that simulator 712 uses to decode instructions, and debugger 730 preferably uses the same DLL that simulator 712 uses for decoding. In addition to decoding instructions, the debugger must also convert the decoded instructions into strings. To this end, decode DLL 748 includes a function that maps each internal opcode representation to the corresponding mnemonic string. This can be implemented with a simple table.
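The decode-then-print path used by the debugger can be sketched as follows. This is an illustration under assumptions: the opcode constants, the table mnemonic_table and the function disassemble are hypothetical names, and the register field positions are inferred from the 0x62230 example given earlier for "foo a2, a2, a3".

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical internal opcode numbers. */
enum { OP_UNDEFINED, OP_FOO, OP_COUNT };

/* Simple table mapping each internal opcode to its mnemonic string. */
static const char *const mnemonic_table[OP_COUNT] = {
    "???",   /* OP_UNDEFINED */
    "foo",   /* OP_FOO */
};

static int decode_opcode(uint32_t insn)
{
    return ((insn & 0xff000fu) == 0x060000u) ? OP_FOO : OP_UNDEFINED;
}

/* Writes the assembly text for insn into buf; returns snprintf's result. */
static int disassemble(uint32_t insn, char *buf, size_t len)
{
    int op = decode_opcode(insn);
    return snprintf(buf, len, "%s a%u, a%u, a%u", mnemonic_table[op],
                    (unsigned)((insn >> 12) & 0xf),
                    (unsigned)((insn >> 8) & 0xf),
                    (unsigned)((insn >> 4) & 0xf));
}
```

Applied to the instruction word 0x62230, this sketch produces the string "foo a2, a2, a3".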
The user can invoke the pre-built debugger with a flag or environment variable pointing to the directory containing the user-defined enhancements 750. The pre-built debugger dynamically opens the appropriate DLL 748.
Debugger 730 also interacts with user-defined state 752. Debugger 730 must be able to read and modify state 752. To do so, debugger 730 communicates with simulator 712. It asks simulator 712 how large the state is and what the names of the state variables are. Whenever debugger 730 is asked to display the value of some user state, it asks simulator 712 for the value in the same way that it asks for predefined state. Similarly, to modify user state, debugger 730 tells simulator 712 to set the state to a given value.
Thus, it can be seen that support for user-defined instruction sets and state according to the present invention can be accomplished using modules that define the user functionality and plug into the core software development tools. A system can therefore be developed in which the plug-in modules for a particular set of user-defined enhancements are maintained as a group within the system, for ease of organization and manipulation.
Furthermore, the core software development tools may be specific to a particular core instruction set and processor state, and a single set of plug-in modules for user-defined enhancements may be evaluated in connection with multiple sets of core software development tools resident in the system.
Appendix A
# Xtensa Configuration Database Specification
# $Id: Definition,v 1.65 1999/02/04 15:30:45 adixit Exp $
#
# Copyright 1998 Tensilica Inc.
# These coded instructions, statements, and computer programs are
# Confidential Proprietary Information of Tensilica Inc. and may not be
# disclosed to third parties or copied in any form, in whole or in part,
# without the prior written consent of Tensilica Inc.
#
# This is the configuration parameter definition file.
# - All supported configurations must be declared in this file
# - All tools parsing configurations must check against this file for validity
# - Changes to this file must be kept minimum and dealt with care
#
# Naming Conventions
# Most parameter names begin with a category name from the following
# list:
#   Addr        Addressing and translation parameters
#   Build       ?
#   Cad         Target CAD environment
#   DV          Design Verification parameters
#   Data        One of the following:
#     DataCache   Data Cache parameters
#     DataRAM     Data RAM parameters
#     DataROM     Data ROM parameters
#   Debug       Debug option parameters
#   Impl        Implementation goals
#   Inst        One of the following:
#     InstCache   Instruction Cache parameters
#     InstRAM     Instruction RAM parameters
#     InstROM     Instruction ROM parameters
#   Interrupt   Interrupt parameters
#   Isa         Instruction Set Architecture parameters
#   Iss         Instruction Set Simulator parameters
#   PIF         Processor Interface parameters
#   Sys         System parameters (e.g. memory map)
#   TIE         Application-specific instruction parameters
#   Test        Manufacturing Test parameters
#   Timer       Cycle count/compare option parameters
#   Vector      Reset/Exception/Interrupt vector addresses
# Many parameters end in a suffix giving the units in which they
# are measured:
#   Bits
#   Bytes       (i.e. 8 bits)
#   Count       used as a generic "number of" suffix
#   Entries     similar to Count
#   Filename    absolute pathname of file
#   Interrupt   interrupt id (0..31)
#   Level       interrupt level (1..15)
#   Max         maximum
#   Paddr       physical address
#   Type        an enumeration of possible values
#   Vaddr       virtual address
# The format of this file:
#   column 1: configuration parameter name
#   column 2: default value of the parameter
#   column 3: perl expression used to check the validity of values
######################################################################
# ISA options
######################################################################
IsaMemoryOrder LittleEndian LittleEndian|BigEndian
IsaARRegisterCount 32 32|64
######################################################################
# Addressing and translation
######################################################################
AddrPhysicalAddressBits 32 1[6-9]|2[0-9]|3[0-2]
AddrVirtualAddressBits 32 1[6-9]|2[0-9]|3[0-2]
######################################################################
# Data Cache/RAM/ROM
######################################################################
DataCacheBytes 1k 0k|1k|2k|4k|8k|16k
DataRAMBytes 0k 0k|1k|2k|4k|8k|16k
DataROMBytes 0k 0k|1k|2k|4k|8k|16k
DataWriteBufferEntries 4 4|8|16|32
DataCacheAccessBits 32 32|64|128
######################################################################
# Instruction Cache/RAM/ROM
######################################################################
InstCacheBytes 1k 0k|1k|2k|4k|8k|16k
InstRAMBytes 0k 0k|1k|2k|4k|8k|16k
InstROMBytes 0k 0k|1k|2k|4k|8k|16k
InstCacheAccessBits 32 32|64|128
######################################################################
# Processor interface
######################################################################
PIFReadDataBits 32 32|64|128
PIFWriteDataBits 32 32|64|128
######################################################################
# System
######################################################################
SysROMBytes 128k [0-9]+(k|m)
SysRAMBytes 1m [0-9]+(k|m)
SysStackBytes 16k [0-9]+(k|m)
SysXTOSBytes 0x00000c00 0x[0-9a-fA-F]+
######################################################################
# Vector addresses
######################################################################
######################################################################
# Interrupt options
######################################################################
InterruptCount 1 [1-9]|1[0-9]|2[0-9]|3[0-2]
InterruptLevelMax 1 [1-3]
Interrupt0Type External External|Internal|Software
Interrupt1Type External External|Internal|Software
Interrupt2Type External External|Internal|Software
Interrupt3Type External External|Internal|Software
Interrupt4Type External External|Internal|Software
Interrupt5Type External External|Internal|Software
Interrupt6Type External External|Internal|Software
Interrupt7Type External External|Internal|Software
Interrupt8Type External External|Internal|Software
Interrupt9Type External External|Internal|Software
Interrupt10Type External External|Internal|Software
Interrupt11Type External External|Internal|Software
Interrupt12Type External External|Internal|Software
Interrupt13Type External External|Internal|Software
Interrupt14Type External External|Internal|Software
Interrupt15Type External External|Internal|Software
Interrupt16Type External External|Internal|Software
Interrupt17Type External External|Internal|Software
Interrupt18Type External External|Internal|Software
Interrupt19Type External External|Internal|Software
Interrupt20Type External External|Internal|Software
Interrupt21Type External External|Internal|Software
Interrupt22Type External External|Internal|Software
Interrupt23Type External External|Internal|Software
Interrupt24Type External External|Internal|Software
Interrupt25Type External External|Internal|Software
Interrupt26Type External External|Internal|Software
Interrupt27Type External External|Internal|Software
Interrupt28Type External External|Internal|Software
Interrupt29Type External External|Internal|Software
Interrupt30Type External External|Internal|Software
Interrupt31Type External External|Internal|Software
Interrupt0Level 1 [1-3]
Interrupt1Level 1 [1-3]
Interrupt2Level 1 [1-3]
Interrupt3Level 1 [1-3]
Interrupt4Level 1 [1-3]
Interrupt5Level 1 [1-3]
Interrupt6Level 1 [1-3]
Interrupt7Level 1 [1-3]
Interrupt8Level 1 [1-3]
Interrupt9Level 1 [1-3]
Interrupt10Level 1 [1-3]
Interrupt11Level 1 [1-3]
Interrupt12Level 1 [1-3]
Interrupt13Level 1 [1-3]
Interrupt14Level 1 [1-3]
Interrupt15Level 1 [1-3]
Interrupt16Level 1 [1-3]
Interrupt17Level 1 [1-3]
Interrupt18Level 1 [1-3]
Interrupt19Level 1 [1-3]
Interrupt20Level 1 [1-3]
Interrupt21Level 1 [1-3]
Interrupt22Level 1 [1-3]
Interrupt23Level 1 [1-3]
Interrupt24Level 1 [1-3]
Interrupt25Level 1 [1-3]
Interrupt26Level 1 [1-3]
Interrupt27Level 1 [1-3]
Interrupt28Level 1 [1-3]
Interrupt29Level 1 [1-3]
Interrupt30Level 1 [1-3]
Interrupt31Level 1 [1-3]
######################################################################
# Other processor component options
# Processor timer options
######################################################################
TimerCount 0 [0-3]
Timer0Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer1Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer2Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]
######################################################################
# Debug options
######################################################################
DebugDataVAddrTrapCount 0 [0-2]
DebugInstVAddrTrapCount 0 [0-2]
DebugInterruptLevel 2 [2-3]
######################################################################
# Instruction set simulator
######################################################################
######################################################################
# Design verification
######################################################################
######################################################################
# Test options
######################################################################
######################################################################
# Processor implementation configuration
######################################################################
ImplTargetSpeed 250 [1-9][0-9]*
ImplTargetSize 20000 [1-9][0-9]*
ImplTargetPower 75 [1-9][0-9]*
ImplSpeedPriority High High|Medium|Low
ImplPowerPriority Medium High|Medium|Low
ImplSizePriority Low High|Medium|Low
ImplTargetTechnology 25m 18m|25m|35m|cx3551|cx3301|acb25typ|acb25wst|t25typical|t25worst|t35std|lss3g|ibm25typ|ibm25wc|vst_tsmc25tym
ImplOperatingCondition Typical Worst|Typical
######################################################################
# CAD options
######################################################################
######################################################################
# TIE instruction file. It must be an absolute pathname.
######################################################################
TIE filename/.* |-
######################################################################
# The following section is for internal use only. Before moving any of
# these internal parameters up, please confirm that all product
# components can support it.
######################################################################
#Constants for Athens implementation
DataCacheIndexType physical physical
DataCacheMissStart 32 32
DataCacheTagType physical physical
InstCacheIndexType physical physical
InstCacheMissStart 32 32
InstCacheTagType physical physical
######################################################################
# Build mode... for Web customers. They can run a limited number of
# production builds, but as many eval builds as they like.
# UserCID is used for fingerprinting
######################################################################
BuildMode Evaluation Evaluation|Production
BuildUserCID 999 [0-9]+
######################################################################
# Values used by the GUI - basically persistent state
######################################################################
SysAddressLayout Xtos Xtos|Manual
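The third column of each entry above is a pattern describing the legal values of the parameter. As a hedged illustration of how a tool might check a value against such a pattern, the following C sketch uses the POSIX regcomp and regexec functions. The function name config_value_is_valid is hypothetical, the original tools are Perl-based, and anchoring the pattern so that it must match the whole value is an assumption.

```c
#include <regex.h>
#include <stdio.h>

/* Returns 1 if value matches the whole pattern, 0 if it does not,
 * and -1 if the pattern is too long or cannot be compiled. */
static int config_value_is_valid(const char *pattern, const char *value)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that a partial match is not accepted (assumption). */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, value, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, with the IsaARRegisterCount pattern 32|64 the value 64 is accepted and 48 is rejected; with the SysROMBytes pattern [0-9]+(k|m) the value 128k is accepted and a bare 128 is rejected.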
Appendix B
#!/usr/xtensa/tools/bin/perl
# Tensilica PreProcessor
# $Id: tpp,v 1.15 1998/12/17 19:36:03 earl Exp $
# Modified: Kaushik Sheth
# Copyright (C) 1998 Tensilica. All rights reserved.
# The original code was taken from Iain McClatchie.
# perl preprocessor
# Copyright (C) 1998 Iain McClatchie. All rights reserved. No warrantee implied.
# Author: Iain McClatchie
# You can redistribute and/or modify this software under the terms of the
# GNU General Public License as published by the Free Software Foundation;
# either version 2, or (at your option) any later version.
use lib "@xtools@/lib";
package tpp;
# Standard perl modules
use strict;
use Exporter ();
use Getopt::Long;
# Module stuff
@tpp::ISA = qw(Exporter);
@tpp::EXPORT = qw(
    include
    error
);
@tpp::EXPORT_OK = qw(
    include
    gen
    error
);
%tpp::EXPORT_TAGS = ();
use vars qw(
    $debug
    $lines
    @incdir
    $config
    $output
    @global_file_stack
);
# Main program
{
    $::myname = 'tpp';    # for error messages
    # parse command line
    $debug = 0;           # -debug command line option
    $lines = 0;           # -lines command line option
    @incdir = ();         # -I command line options
    $config = "";         # -c command line option
    $output = undef;      # -o command line option
    my @eval = ();
    if (!GetOptions(
            "debug!"  => \$debug,
            "lines!"  => \$lines,
            "I=s@"    => \@incdir,
            "c=s"     => \$config,
            "o=s"     => \$output,
            "eval=s@" => \@eval)
        || @ARGV <= 0) {
        # command line error
        print STDERR <<"END";
tpp [args] file
Applies a perl preprocessor to the indicated file, and any files
included therein; the output of the preprocessor is written to
stdout. Perl is embedded in the source text by one of two means.
Whole lines of perl can be embedded by preceding them with a
semicolon (you would typically do this for looping statements or
subroutine calls). Alternatively, perl expressions can be embedded
into the middle of other text by escaping them with backticks.
  -debug          Print perl code to STDERR, so you can figure out why your
                  embedded perl statements are looping forever.
  -lines          Embed '#line 43 "foo.w"' directives in output, for more
                  comprehensible error and warning messages from later tools.
  -I dir          Search for include files in directory dir.
  -o output_file  Redirect the output to a file rather than stdout.
  -c config_file  Read the specified config file.
  -e eval         Eval eval before running program.
NOTE:
  the lines with only ";" and ";//" will go unaltered.
END
        exit(1);
    }
    # Initialize
    push(@INC, @incdir);
    @global_file_stack = ();
    # Read configuration file
    tppcode::init($config);
    # Open the output file
    if ($output) {
        open(STDOUT, "> $output")
            || die("$::myname: $!, opening '$output'\n");
    }
    # Process evals
    foreach (@eval) {
        tppcode::execute($_);
    }
    # Process the input files
    foreach (@ARGV) {
        include($_);
    }
    # Done
    exit(0);
}
sub include {
    my ($file) = @_;
    my ($buf, $tempname, @chunks, $chunk, $state, $lasttype);
    if ($file =~ m|^/|) {
        # absolute pathname: open it directly
        if (!open(INP, "< $file")) {
            error($file, "$!, opening $file");
        }
    } else {
        # relative pathname: search the include directories
        my $path;
        foreach $path (".", @incdir) {
            if (open(INP, "< $path/$file")) {
                $file = "$path/$file";
                last;
            }
        }
        error($file, "Couldn't find $file in @INC")
            if tell(INP) == -1;
    }
    $lasttype = "";
    while (<INP>) {
        if (/^\s*;(.*)$/) {
            # a line of embedded perl
            my $l = $1;
            if ($lasttype ne "perl") {
                $lasttype = "perl";
            }
            if ((/^\s*;\s*\/\//) || (/^\s*;\s*$/)) {
                # lines with only ";" or ";//" go through unaltered
                $buf .= "print STDOUT \"$_\";\n";
            } else {
                $buf .= $1 . "\n";
            }
        } else {
            # a line of text, possibly with backquoted perl expressions
            if ($lines and $lasttype ne "text") {
                $buf .= "print STDOUT \"#line $. \\\"$file\\\"\\n\";\n";
                $lasttype = "text";
            }
            chomp;
            if (m/^$/) {
                $buf .= "print STDOUT \"\\n\";\n";
                next;
            }
            @chunks = split("`");
            $state = 0;
            $tempname = "00";
            foreach $chunk (@chunks) {
                if ($state == 0) {
                    $chunk = quotemeta($chunk);
                    $state = 1;
                } else {
                    if ($chunk =~ m/^\W/) {    # Perl expression
                        $buf .= "\$temp$tempname = $chunk;\n";
                        $chunk = "\${temp$tempname}";
                        $tempname++;
                        $state = 0;
                    } else {                   # Backquoted something
                        $chunk = "\\`" . quotemeta($chunk);
                        $state = 1;
                    }
                }
            }
            # check if the line ends with a backquote
            if (m/\`$/) {
                $state = 1 - $state;
            }
            error($file, "Unterminated embedded perl expression, line $.")
                if ($state == 0);
            $buf .= "print STDOUT \"" . join("", @chunks) . "\\n\";\n";
        }
    }
    close(INP);
    print STDERR $buf if ($debug);
    push(@global_file_stack, $file);
    tppcode::execute($buf);
    pop(@global_file_stack);
    if ($@) {
        chomp($@);
        error($file, $@);
    }
}
sub gen{
print STDOUT(@_);
}
sub error {
    my ($file, $err) = @_;
    print STDERR "$::myname: Error ($err) while preprocessing file \"$file\"\n";
    my $fn;
    foreach $fn (@global_file_stack) {
        print STDERR "    included from \"$fn\"\n";
    }
    exit(1);
}
# This package is used to execute the tpp code
package tppcode;
no strict;
use Xtensa::Config;
sub ppp_require {
    print STDERR ("tpp: Warning: ppp_require used instead of tpp::include\n");
    tpp::include(@_);
}
sub init {
    my ($cfile) = @_;
    config_set($cfile);
}
sub execute {
    my ($code) = @_;
    eval($code);
}
#
# Local Variables:
# mode:perl
# perl-indent-level:4
# cperl-indent-level:4
# End:
Appendix C
# Change XTENSA to point to your local installation
XTENSA=/usr/xtensa/awang/s8
#
# No need to change the rest
#
GCC = /usr/xtensa/stools/bin/gcc
XTCC = $(XTENSA)/bin/xt-gcc
XTRUN = $(XTENSA)/bin/xt-run
XTGO = $(XTENSA)/Hardware/scripts/xtgo
MFILE = $(XTENSA)/Hardware/diag/Makefile.common
all: run-base run-tie-cstub run-iss run-iss-old run-iss-new run-ver
#
# Rules to build various versions of me
#
Me-base:me.c me_base.c me_tie.c src.c sad.c
$ (GCC)-o me-base-g-O2-DNX=64-DNY=64 me.c
Me-tie-cstub:me.c me_base.c me_tie.c src.c sad.c
$ (GCC)-o me-tie-cstub-g-O2-DTIE-DNX=64-DNY=64me.c
Me-xt:me.c me_base.c me_tie.c src.c sad.c
$ (XTCC)-o me-xt-g-O2-DXTENSA-DNX=32-DNY=32me.c
Me-xt-old:me.c me_base.c me_tie.c src.c sad.c
$ (XTCC)-o me-xt-old-g-O3-DOLD-DXTENSA-DNX=32-DNY=32
me.c
Me-xt-new:me.c me_base.c me_tie.c src.c sad.c
$ (XTCC)-o me-xt-new-g-O3-DNEW-DXTENSA-DNX=32-DNY=32
me.c
Me-xt.s:me.c me_base.c me_tie.c src.c sad.c
$ (XTCC)-o me-xt.s-S-O3-DNOPRINTF-DXTENSA-DNX=16-DNY=
16
me.c
#
# Rules for various runs of me
#
Run-base:me-base
me-base;exit 0
Run-tie-cstub:me-tie-cstub
me-tie-cstub;exit 0
Run-iss:me-xt
$(XTRUN)me-xt
Run-iss-old:me-xt-old
$(XTRUN)--verbose me-xt-old
Run-iss-new:me-xt-new
$(XTRUN)--verbose me-xt-new
Run-ver:me-xt.s testdir
cp me-xt.s testdir/me-xt
$(XTGO)-vcs -testdir `pwd`/testdir -test me-xt>run-ver.out
2>&1
grep Status run-ver.out
Testdir:
mkdir-p testdir/me-xt
@echo ' all:me-xt.dat me-xt.bfd ' > testdir/me-xt/Makefile
@echo″include $(MFILE)″>>testdir/me-xt/Makefile
Clean:
Rm-rf me-**.out testdir results
APPENDIX I: TEST PROGRAM
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#ifndef NX
#define NX 32 /* image width */
#endif
#ifndef NY
#define NY 32 /* image height */
#endif
#define BLOCKX 16 /* block width */
#define BLOCKY 16 /* block height */
#define SEARCHX 4 /* search region width */
#define SEARCHY 4 /* search region height */

unsigned char OldB[NX][NY]; /* old image */
unsigned char NewB[NX][NY]; /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY]; /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY]; /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY]; /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY]; /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY]; /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY]; /* Base absolute difference */

#define ABS(x) (((x) < 0) ? (-(x)) : (x))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define ABSD(x, y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

/**********************************************************************
   Initialize the OldB and NewB arrays for test purposes
 **********************************************************************/
void init()
{
    int x, y, x1, y1;

    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            OldB[x][y] = x ^ y;
        }
    }
    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            x1 = (x + 3) % NX;
            y1 = (y + 4) % NY;
            NewB[x][y] = OldB[x1][y1];
        }
    }
}

/**********************************************************************
   Check every result against the golden data
 **********************************************************************/
unsigned check()
{
    int bx, by;

    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) {
            if (VectX[bx][by] != BaseX[bx][by]) return 0;
            if (VectY[bx][by] != BaseY[bx][by]) return 0;
            if (VectB[bx][by] != BaseB[bx][by]) return 0;
        }
    }
    return 1;
}

/**********************************************************************
   Various implementations of motion estimation
 **********************************************************************/
#include "me_base.c"
#include "me_tie.c"

/**********************************************************************
   Main test program
 **********************************************************************/
int
main(int argc, char **argv)
{
    int passed;

#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    init();
#ifdef OLD
    motion_estimate_base();
    passed = 1;
#elif NEW
    motion_estimate_tie();
    passed = 1;
#else
    motion_estimate_base();
    motion_estimate_tie();
    passed = check();
#endif
#ifndef NOPRINTF
    printf(passed ? "TIE version passed\n" : "** TIE version failed\n");
#endif
    return passed;
}
APPENDIX II: ME_BASE.C
/**********************************************************************
   Reference software implementation
 **********************************************************************/
void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}
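The exhaustive search performed by motion_estimate_base can be tried on an image small enough to check by hand. The following is a minimal sketch and not part of the patent listing: the names best_match and match_t and the sizes W, H and B are hypothetical, and unlike the listing the sketch scans every window of the image rather than a bounded search region.

```c
#include <assert.h>
#include <limits.h>

enum { W = 4, H = 4, B = 2 };   /* hypothetical image and block sizes */

typedef struct {
    int x, y;        /* top-left corner of the best candidate block */
    unsigned cost;   /* its sum of absolute differences */
} match_t;

/* Exhaustively compare blk against every B x B window of ref and
   return the first window with the smallest sum of absolute differences. */
match_t best_match(unsigned char ref[W][H], unsigned char blk[B][B])
{
    match_t best = { 0, 0, UINT_MAX };

    assert(B <= W && B <= H);
    for (int cx = 0; cx + B <= W; cx++) {
        for (int cy = 0; cy + B <= H; cy++) {
            unsigned diff = 0;
            for (int x = 0; x < B; x++) {
                for (int y = 0; y < B; y++) {
                    unsigned o = ref[cx + x][cy + y];
                    unsigned n = blk[x][y];
                    diff += (o > n) ? o - n : n - o;
                }
            }
            if (diff < best.cost) {
                best.x = cx;
                best.y = cy;
                best.cost = diff;
            }
        }
    }
    return best;
}
```

Calling best_match with a block copied out of the reference image should return the position it was copied from with a cost of zero, provided no other window is identical.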
APPENDIX III: ME_TIE.C
#include "src.c"
#include "sad.c"
/**********************************************************************
   Fast version of motion estimation which uses the SAD instruction
 **********************************************************************/
void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0];
                        N2 = N[1];
                        N3 = N[2];
                        N4 = N[3];
                        O = (unsigned *) &(OldB[cx+x][cy]);
                        A = O[0];
                        B = O[1];
                        C = O[2];
                        D = O[3];
                        E = O[4];
                        diff0 += SAD(A, N1) + SAD(B, N2) +
                                 SAD(C, N3) + SAD(D, N4);
#ifdef BIG_ENDIAN
                        SSAI(24);
                        diff1 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                        SSAI(16);
                        diff2 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                        SSAI(8);
                        diff3 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
#else
                        SSAI(8);
                        diff1 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(16);
                        diff2 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(24);
                        diff3 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
#endif
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}
APPENDIX IV: SAD.C
#if defined(XTENSA)
#include <machine/Customer.h>
#elif defined(TIE)
#include "../dk/me_cstub.c"
#else
/**********************************************************************
   Sum of absolute differences of 4 bytes
 **********************************************************************/
static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}
#endif
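The software fallback for SAD treats each 32-bit word as four packed pixels and sums the per-byte absolute differences. A minimal standalone sketch of the same computation follows; the names absd and sad_bytes are hypothetical, and fixed-width uint32_t is used here where the listing uses plain unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute difference of two unsigned values without wrapping. */
static uint32_t absd(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}

/* Sum of absolute differences over the four bytes packed in each word. */
uint32_t sad_bytes(uint32_t ars, uint32_t art)
{
    uint32_t sum = 0;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        sum += absd((ars >> shift) & 0xFFu, (art >> shift) & 0xFFu);
    }
    assert(sum <= 4u * 255u);   /* four bytes, each differing by at most 255 */
    return sum;
}
```

For instance, sad_bytes(0x01020304, 0x04030201) adds 3 + 1 + 1 + 3 and yields 8, while identical words yield 0.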
APPENDIX V: SRC.C
/**********************************************************************
   If the target code is native code, a global variable is used to
   store the shift amount set by SSAI.
 **********************************************************************/
#ifndef XTENSA
static int sar; /* declaration not shown in the listing; assumed here */
#endif
/**********************************************************************
   Direct access to the Shift Right Concatenate Instruction.
   The shift amount register must be loaded separately with SSAI().
 **********************************************************************/
static inline unsigned
SRC(unsigned ars, unsigned art)
{
    unsigned arr;
#ifndef XTENSA
    arr = (ars << (32 - sar)) | (art >> sar);
#else
    asm volatile("src\t%0, %1, %2" : "=a" (arr) : "a" (ars), "a" (art));
#endif
    return arr;
}
/**********************************************************************
   Set the shift amount register
 **********************************************************************/
static inline void
SSAI(int count)
{
#ifndef XTENSA
    sar = count;
#else
    switch (count) {
    case 8:
        asm volatile("ssai\t8");
        break;
    case 16:
        asm volatile("ssai\t16");
        break;
    case 24:
        asm volatile("ssai\t24");
        break;
    default:
        exit(-1);
    }
#endif
}
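On a host without the Xtensa instructions, SRC behaves as a funnel shift: the two operands form a 64-bit pair that is shifted right by the amount previously set with SSAI, and the low 32 bits are kept. The sketch below models only that fallback path; the names sar_model, ssai_model and src_model are hypothetical, and the shift amount is restricted to 1..31 so that neither C shift is undefined.

```c
#include <assert.h>
#include <stdint.h>

static unsigned sar_model;   /* hypothetical stand-in for the shift amount register */

/* Model of SSAI: remember the shift amount for later src_model() calls. */
void ssai_model(unsigned count)
{
    assert(count >= 1 && count <= 31);
    sar_model = count;
}

/* Model of SRC: shift the pair ars:art right by sar_model bits, keep 32 bits. */
uint32_t src_model(uint32_t ars, uint32_t art)
{
    assert(sar_model >= 1 && sar_model <= 31);
    return (ars << (32u - sar_model)) | (art >> sar_model);
}
```

With ssai_model(8), src_model(0x11223344, 0xAABBCCDD) gives 0x44AABBCC: the low byte of the first operand followed by the top three bytes of the second.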
APPENDIX VI: SOURCE CODE
/*
   Block Motion Estimation:
   The purpose of motion estimation is to find the unaligned 8x8 block of
   an existing (old) image that most closely resembles an aligned 8x8
   block. The search here is at any byte offset in +/- 16 bytes in x and
   +/- 16 bytes in y. The search is a set of six nested loops.
   OldB is pointer to a byte array of old block
   NewB is pointer to a byte array of base block
*/
#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16

unsigned char OldB[NX][NY];
unsigned char NewB[NX][NY];
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];

/* Arguments are parenthesized so that expression arguments expand correctly */
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define ABS(x) (((x) < 0) ? (-(x)) : (x))

/* initialization with reference image data for test purposes */
void init()
{
    int x, y;

    for (x = 0; x < NX; x++) for (y = 0; y < NY; y++) {
        OldB[x][y] = x ^ y;
        NewB[x][y] = x + 2*y + 2;
    }
}

int main(void)
{
    int by, bx, cy, cx, yo, xo;
    unsigned short best, bestx, besty, sumabsdiff0;

    init();
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each 8x8 block in the image */
            best = 0xffff; /* look for the minimum difference */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                 cy < MIN(NY - BLOCKY, (by*BLOCKY) + SEARCHY);
                 cy++) { /* for the old block at each line */
                for (cx = MAX(0, (bx*BLOCKX) - SEARCHX);
                     cx < MIN(NX - BLOCKX, (bx*BLOCKX) + SEARCHX);
                     cx++) {
                    /* test the NxN block at (bx, by) against NxN blocks */
                    /* at (cx, cy) */
                    sumabsdiff0 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of N rows in block */
                        for (xo = 0; xo < BLOCKX; xo++) { /* for each of N pixels in row */
                            sumabsdiff0 +=
                                ABS(OldB[cx+xo][cy+yo] -
                                    NewB[bx*BLOCKX+xo][by*BLOCKY+yo]);
                        }
                    }
                    if (sumabsdiff0 < best) {
                        best = sumabsdiff0; bestx = cx; besty = cy; }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
    }
    return 0;
}
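Function-like macros such as MIN, MAX and ABS are applied in this listing to expressions, including a difference of two array elements, so each use of an argument inside the macro body needs its own parentheses. The sketch below illustrates why; the macro names ABS_UNSAFE and ABS_SAFE and the two wrapper functions are hypothetical and exist only for this comparison.

```c
#include <assert.h>

/* Argument used bare: ABS_UNSAFE(a - b) expands to
   ((a - b < 0) ? (-a - b) : (a - b)), which negates only a. */
#define ABS_UNSAFE(x) ((x < 0) ? (-x) : (x))

/* Every use of the argument is parenthesized. */
#define ABS_SAFE(x)   (((x) < 0) ? (-(x)) : (x))

int abs_diff_unsafe(int a, int b)
{
    return ABS_UNSAFE(a - b);
}

int abs_diff_safe(int a, int b)
{
    int r = ABS_SAFE(a - b);

    assert(r >= 0);   /* holds whenever a - b does not overflow */
    return r;
}
```

With a = 3 and b = 5 the unparenthesized form evaluates to -8 instead of 2, which would corrupt a sum of absolute differences; the two forms agree only when the difference is non-negative.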
APPENDIX VII: OPTIMIZED C CODE WITH TIE
/*
   Pixels are packed four per word.
   OldW is pointer to a word array of old block
   NewW is pointer to a word array of base block
*/
#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))

unsigned long OldW[NY][NX/sizeof(long)];
unsigned long NewW[NY][NX/sizeof(long)];
unsigned short VectX[NY/BLOCKY][NX/BLOCKX];
unsigned short VectY[NY/BLOCKY][NX/BLOCKX];

void init()
{
    int x, y;

    for (x = 0; x < NX/sizeof(long); x++) for (y = 0; y < NY; y++) {
        OldW[y][x] = ((x<<2)^y)<<24 | (((x<<2)+1)^y)<<16 | (((x<<2)+2)^y)<<8
                     | ((x<<2)+3)^y;
        NewW[y][x] = ((x<<2)+2*y+2)<<24 | (((x<<2)+1)+2*y+2)<<16 |
                     (((x<<2)+2)+2*y+2)<<8 | ((x<<2)+3)+2*y+2;
    }
}

int main(void)
{
    register int by, bx, cy, cx, yo, xo;
    register unsigned short
        best, bestx, besty, sumabsdiff0, sumabsdiff1, sumabsdiff2, sumabsdiff3;

    init();
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each NxN block in the image */
            best = 0xffff; /* look for the minimum difference */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                 cy < MIN(NY - BLOCKY, (by*BLOCKY) + SEARCHY);
                 cy++) { /* for the old block at each line */
                for (cx = MAX(0, (bx*BLOCKX - SEARCHX)/sizeof(long));
                     cx < MIN((NX - BLOCKX - 2)/sizeof(long),
                              (bx*BLOCKX + SEARCHX)/sizeof(long));
                     cx++) { /* and each word (4 byte) offset in line */
                    /* test the NxN block at (bx, by) against four NxN blocks */
                    /* at (cx, cy), (cx+1B, cy), (cx+2B, cy), (cx+3B, cy) */
                    sumabsdiff0 = sumabsdiff1 = sumabsdiff2 = sumabsdiff3 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of the N lines in the block */
                        for (xo = 0; xo < BLOCKX/8; xo += 2) {
                            register unsigned long *N, N1, N2, *O, A, B, C, W, X;
                            N = &NewW[by+yo][bx*BLOCKX/sizeof(long)+xo];
                            N1 = *N; N2 = *(N+1); /* 2 words of subject image */
                            O = &OldW[cy+yo][cx+xo];
                            A = *O; B = *(O+1); C = *(O+2); /* 3 words of reference */
                            sumabsdiff0 += sad(A, N1) + sad(B, N2);
                            SHIFT(24) /* shift A, B, C left by one byte into W, X */
                            sumabsdiff1 += sad(W, N1) + sad(X, N2);
                            SHIFT(16) /* shift A, B, C left by two bytes into W, X */
                            sumabsdiff2 += sad(W, N1) + sad(X, N2);
                            SHIFT(8) /* shift A, B, C left by three bytes into W, X */
                            sumabsdiff3 += sad(W, N1) + sad(X, N2);
                        }
                    }
                    if (sumabsdiff0 < best) {
                        best = sumabsdiff0; bestx = cx; besty = cy; }
                    if (sumabsdiff1 < best) {
                        best = sumabsdiff1; bestx = cx + 1; besty = cy; }
                    if (sumabsdiff2 < best) {
                        best = sumabsdiff2; bestx = cx + 2; besty = cy; }
                    if (sumabsdiff3 < best) {
                        best = sumabsdiff3; bestx = cx + 3; besty = cy; }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
    }
}
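The optimized loop gets four candidate positions out of each pair of reference words by shifting the pair one, two and three bytes and applying the packed sum of absolute differences to each shifted word. A minimal host-side sketch of that idea follows. The names sad_bytes, window and sad_four_offsets are hypothetical, the sketch assumes the leftmost pixel sits in the most significant byte, and plain C shifts stand in for the SHIFT and sad operations of the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of absolute differences over the four bytes packed in each word. */
static uint32_t sad_bytes(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        uint32_t x = (a >> shift) & 0xFFu;
        uint32_t y = (b >> shift) & 0xFFu;
        sum += x > y ? x - y : y - x;
    }
    return sum;
}

/* Four-byte window starting k bytes into the pair hi:lo, with k in 0..3. */
static uint32_t window(uint32_t hi, uint32_t lo, unsigned k)
{
    assert(k < 4);
    return k == 0 ? hi : (hi << (8u * k)) | (lo >> (32u - 8u * k));
}

/* Compare subject word n against the reference at byte offsets 0..3. */
void sad_four_offsets(uint32_t a, uint32_t b, uint32_t n, uint32_t out[4])
{
    for (unsigned k = 0; k < 4; k++) {
        out[k] = sad_bytes(window(a, b, k), n);
    }
}
```

If the reference words hold the bytes 0 through 7 and the subject word holds bytes 2 through 5, the four results are 8, 4, 0 and 4, so the minimum identifies byte offset 2.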
Appendix D
/*
 * TIE to Verilog translation routines
 */
/* $Id: tie2ver_write.c,v 1.27 1999/05/11 00:10:18 awang Exp $ */
/*
 * Copyright 1998-1999 Tensilica Inc.
 * These coded instructions, statements, and computer programs are
 * Confidential Proprietary Information of Tensilica Inc. and may not be
 * disclosed to third parties or copied in any form, in whole or in part,
 * without the prior written consent of Tensilica Inc.
 */
#include <math.h>
#include "tie.h"
#include "st.h"

#define COMMENTS "// Do not modify this automatically generated file."

static void tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os);

#define tie2ver_program_foreach_instruction(_prog, _inst) { \
    tie_t *_iclass; \
    tie_program_foreach_iclass(_prog, _iclass) { \
        if (tie_get_predefined(_iclass)) continue; \
        tie_iclass_foreach_instruction(_iclass, _inst) {

#define end_tie2ver_program_foreach_instruction \
        } end_tie_iclass_foreach_instruction; \
    } end_tie_program_foreach_iclass; \
}

#define TIE_ENFLOP "\n\
module tie_enflop(tie_out, tie_in, en, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input en;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
if (en)\n\
tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_FLOP "\n\
module tie_flop(tie_out, tie_in, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_ATHENS_STATE "\n\
module tie_athens_state(ns, we, ke, kp, vw, clk, ps);\n\
parameter size = 32;\n\
input [size-1:0] ns; // next state\n\
input we; // write enable\n\
input ke; // Kill E state\n\
input kp; // Kill Pipeline\n\
input vw; // Valid W state\n\
input clk; // clock\n\
output [size-1:0] ps; // present state\n\
\n\
wire [size-1:0] se; // state at E stage\n\
wire [size-1:0] sm; // state at M stage\n\
wire [size-1:0] sw; // state at W stage\n\
wire [size-1:0] sx; // state at X stage\n\
wire ee; // write enable for EM register\n\
wire ew; // write enable for WX register\n\
\n\
assign se = kp ? sx : ns;\n\
assign ee = kp | we & ~ke;\n\
assign ew = vw & ~kp;\n\
assign ps = sm;\n\
\n\
tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));\n\
tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));\n\
tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));\n\
\n\
endmodule\n"
/**********************************************************************
   Build and return the global program-wide argument table for the
   user-defined instructions. The returned table does not contain
   arguments used by predefined instructions.
 **********************************************************************/
static st_table *
tie2ver_program_get_operand_table(tie_t *prog)
{
    static st_table *tie2ver_program_args = 0;
    tie_t *inst;
    char *key, *value;
    st_table *operand_table;
    st_generator *gen;

    if (tie2ver_program_args == 0) {
        tie2ver_program_args = st_init_table(strcmp, st_strhash);
        tie2ver_program_foreach_instruction(prog, inst) {
            operand_table = tie_instruction_get_operand_table(inst);
            st_foreach_item(operand_table, gen, &key, &value) {
                st_insert(tie2ver_program_args, key, value);
            }
        } end_tie2ver_program_foreach_instruction;
    }
    return tie2ver_program_args;
}
/**********************************************************************
   Write a wire statement
 **********************************************************************/
static void
tie2ver_write_wire(FILE *fp, tie_t *wire)
{
    int from, to, write_comma;
    tie_t *first, *second, *var;

    first = tie_get_first_child(wire);
    ASSERT(tie_get_type(first) == TIE_INT);
    from = tie_get_integer(first);
    second = tie_get_next_sibling(first);
    ASSERT(tie_get_type(second) == TIE_INT);
    to = tie_get_integer(second);
    fprintf(fp, "wire ");
    if (!(from == 0 && to == 0)) {
        fprintf(fp, "[%d:%d] ", from, to);
    }
    write_comma = 0;
    var = tie_get_next_sibling(second);
    while (var != 0) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        fprintf(fp, "%s", tie_get_identifier(var));
        var = tree_get_next_sibling(var);
    }
    fprintf(fp, ";\n");
}
/**********************************************************************
   Write a unary expression
 **********************************************************************/
static void
tie2ver_write_unary(
    FILE *fp, const char *op, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    fprintf(fp, "%s(", op);
    tie2ver_write_expression(fp, exp, lhs, is, os);
    fprintf(fp, ")");
}
/**********************************************************************
   Write a binary expression
 **********************************************************************/
static void
tie2ver_write_binary(
    FILE *fp, const char *op, tie_t *exp1, tree_t *exp2,
    int lhs, st_table *is, st_table *os)
{
    fprintf(fp, "(");
    tie2ver_write_expression(fp, exp1, lhs, is, os);
    fprintf(fp, ") %s (", op);
    tie2ver_write_expression(fp, exp2, lhs, is, os);
    fprintf(fp, ")");
}
/**********************************************************************
   Write an identifier
 **********************************************************************/
static void
tie2ver_write_identifier(
    FILE *fp, tie_t *id, int lhs, st_table *is, st_table *os)
{
    tie_t *prog, *first, *second;
    char *name, *dummy;

    name = tie_get_identifier(id);
    if ((is != 0) && st_lookup(is, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else if ((os != 0) && st_lookup(os, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else {
        fprintf(fp, "%s", name);
    }
    first = tie_get_first_child(id);
    if (first == 0) {
        return;
    }
    /* detect whether this is a table access */
    prog = tie_get_program(id);
    if (tie_program_get_table_by_name(prog, name) != 0) {
        switch (tie_get_type(first)) {
        case TIE_ID:
            fprintf(fp, "(%s)", tie_get_identifier(first));
            break;
        case TIE_INT:
            fprintf(fp, "(%d)", tie_get_integer(first));
            break;
        default:
            DIE("Error: expected type\n");
        }
        return;
    }
    second = tie_get_next_sibling(first);
    if (second == 0) {
        fprintf(fp, "[%d]", tie_get_integer(first));
        return;
    }
    fprintf(fp, "[%d:%d]", tie_get_integer(first),
            tie_get_integer(second));
}
/**********************************************************************
   Write a concatenation expression
 **********************************************************************/
static void
tie2ver_write_concatenation(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *comp;
    int write_comma;

    write_comma = 0;
    fprintf(fp, "{");
    tie_foreach_child(exp, comp) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        tie2ver_write_expression(fp, comp, lhs, is, os);
    } end_tie_foreach_child;
    fprintf(fp, "}");
}
/**********************************************************************
   Write a conditional expression
 **********************************************************************/
static void
tie2ver_write_conditional(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *cond_exp, *then_exp, *else_exp;

    cond_exp = tie_get_first_child(exp);
    then_exp = tie_get_next_sibling(cond_exp);
    else_exp = tie_get_next_sibling(then_exp);
    ASSERT(tie_get_last_child(exp) == else_exp);
    fprintf(fp, "(");
    tie2ver_write_expression(fp, cond_exp, lhs, is, os);
    fprintf(fp, ") ? (");
    tie2ver_write_expression(fp, then_exp, lhs, is, os);
    fprintf(fp, ") : (");
    tie2ver_write_expression(fp, else_exp, lhs, is, os);
    fprintf(fp, ")");
}
/**********************************************************************
   Write a replication expression
 **********************************************************************/
static void
tie2ver_write_replication(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *num, *comp;

    num = tie_get_first_child(exp);
    comp = tie_get_next_sibling(num);
    ASSERT(tie_get_last_child(exp) == comp);
    ASSERT(tie_get_type(num) == TIE_INT);
    fprintf(fp, "{%d{", tie_get_integer(num));
    tie2ver_write_expression(fp, comp, lhs, is, os);
    fprintf(fp, "}}");
}
/**********************************************************************
   Write an expression
 **********************************************************************/
static void
tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_type_t type;
    tie_t *first, *second;

    first = tie_get_first_child(exp);
    second = first == 0 ? 0 : tie_get_next_sibling(first);
    switch (type = tie_get_type(exp)) {
    case TIE_ID:
        tie2ver_write_identifier(fp, exp, lhs, is, os);
        break;
    case TIE_INT:
        fprintf(fp, "%d", tie_get_integer(exp)); break;
    case TIE_CONST:
        fprintf(fp, "%s", tie_get_constant(exp)); break;
    case TIE_LOGICAL_NEGATION:
        tie2ver_write_unary(fp, "!", first, lhs, is, os); break;
    case TIE_LOGICAL_AND:
        tie2ver_write_binary(fp, "&&", first, second, lhs, is, os);
        break;
    case TIE_LOGICAL_OR:
        tie2ver_write_binary(fp, "||", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_NEGATION:
        tie2ver_write_unary(fp, "~", first, lhs, is, os); break;
    case TIE_BITWISE_AND:
        tie2ver_write_binary(fp, "&", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_OR:
        tie2ver_write_binary(fp, "|", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XOR:
        tie2ver_write_binary(fp, "^", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XNOR:
        tie2ver_write_binary(fp, "~^", first, second, lhs, is, os);
        break;
    case TIE_ADD:
        tie2ver_write_binary(fp, "+", first, second, lhs, is, os);
        break;
    case TIE_SUB:
        tie2ver_write_binary(fp, "-", first, second, lhs, is, os);
        break;
    case TIE_MULT:
        tie2ver_write_binary(fp, "*", first, second, lhs, is, os);
        break;
    case TIE_GT:
        tie2ver_write_binary(fp, ">", first, second, lhs, is, os);
        break;
    case TIE_GEQ:
        tie2ver_write_binary(fp, ">=", first, second, lhs, is, os);
        break;
    case TIE_LT:
        tie2ver_write_binary(fp, "<", first, second, lhs, is, os);
        break;
    case TIE_LEQ:
        tie2ver_write_binary(fp, "<=", first, second, lhs, is, os);
        break;
    case TIE_EQ:
        tie2ver_write_binary(fp, "==", first, second, lhs, is, os);
        break;
    case TIE_NEQ:
        tie2ver_write_binary(fp, "!=", first, second, lhs, is, os);
        break;
    case TIE_REDUCTION_AND:
        tie2ver_write_unary(fp, "&", first, lhs, is, os); break;
    case TIE_REDUCTION_OR:
        tie2ver_write_unary(fp, "|", first, lhs, is, os); break;
    case TIE_REDUCTION_XOR:
        tie2ver_write_unary(fp, "^", first, lhs, is, os); break;
    case TIE_SHIFT_LEFT:
        tie2ver_write_binary(fp, "<<", first, second, lhs, is, os);
        break;
    case TIE_SHIFT_RIGHT:
        tie2ver_write_binary(fp, ">>", first, second, lhs, is, os);
        break;
    case TIE_REPLICATION:
        tie2ver_write_replication(fp, exp, lhs, is, os);
        break;
    case TIE_CONCATENATION:
        tie2ver_write_concatenation(fp, exp, lhs, is, os);
        break;
    case TIE_CONDITIONAL:
        tie2ver_write_conditional(fp, exp, lhs, is, os);
        break;
    default:
        fprintf(stderr, "Wrong type: %d\n", type);
        DIE("Error: wrong expression type\n");
    }
}
/**********************************************************************
   Write an assignment statement
 **********************************************************************/
static void
tie2ver_write_assignment(
    FILE *fp, tie_t *assign, st_table *in_states, st_table *out_states)
{
    tie_t *lval, *rval;

    ASSERT(tie_get_type(assign) == TIE_ASSIGNMENT);
    lval = tie_get_first_child(assign);
    rval = tie_get_last_child(assign);
    ASSERT(tie_get_next_sibling(lval) == rval);
    ASSERT(tie_get_prev_sibling(rval) == lval);
    fprintf(fp, "assign ");
    tie2ver_write_expression(fp, lval, 1, in_states, out_states);
    fprintf(fp, " = ");
    tie2ver_write_expression(fp, rval, 0, in_states, out_states);
    fprintf(fp, ";\n");
}
/**********************************************************************
   Write a list of statements
 **********************************************************************/
static void
tie2ver_write_statement(
    FILE *fp, tie_t *statement, st_table *in_states, st_table *out_states)
{
    tie_t *child;

    ASSERT(tie_get_type(statement) == TIE_STATEMENT);
    tie_foreach_child(statement, child) {
        switch (tie_get_type(child)) {
        case TIE_WIRE:
            tie2ver_write_wire(fp, child);
            break;
        case TIE_ASSIGNMENT:
            tie2ver_write_assignment(fp, child, in_states, out_states);
            break;
        default:
            DIE("Error: illegal program statement\n");
        }
    } end_tie_foreach_child;
}
/**********************************************************************
   Write the module declaration for an "iclass"
 **********************************************************************/
static void
tie2ver_write_module_declaration(FILE *fp, tie_t *semantic)
{
    st_table *operand_table, *state_table;
    st_generator *gen;
    tie_t *ilist, *inst;
    char *c, *key, *value;

    fprintf(fp, "\n");
    fprintf(fp, "module %s(", tie_semantic_get_name(semantic));
    c = "";
    operand_table = tie_semantic_get_operand_table(semantic);
    st_foreach_item(operand_table, gen, &key, &value) {
        fprintf(fp, "%s%s", c, key);
        c = ", ";
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ps", c, key);
        c = ", ";
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ns", c, key);
        fprintf(fp, "%s%s_we", c, key);
        c = ", ";
    }
    ilist = tie_semantic_get_inst_list(semantic);
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, ", %s", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
    fprintf(fp, ");\n");
    st_foreach_item(operand_table, gen, &key, &value) {
        switch ((tie_type_t) value) {
        case TIE_ARG_IN:
            fprintf(fp, "input [31:0] %s;\n", key); break;
        case TIE_ARG_OUT:
            fprintf(fp, "output [31:0] %s;\n", key); break;
        case TIE_ARG_INOUT:
            fprintf(fp, "inout [31:0] %s;\n", key); break;
        default:
            DIE("Error: unexpected arg type\n");
        }
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "input [%d:0] %s_ps;\n", (int) value - 1, key);
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "output [%d:0] %s_ns;\n", (int) value - 1, key);
        fprintf(fp, "output %s_we;\n", key);
    }
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, "input %s;\n", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
}
/**********************************************************************
   Write a "table" of a TIE file
 **********************************************************************/
static void
tie2ver_write_table(FILE *fp, tie_t *table)
{
    int i, width, size, bits, ivalue;
    char *oname, *iname, *cvalue;
    tie_t *value;

    oname = tie_table_get_name(table);
    iname = "index";
    width = tie_table_get_width(table);
    size = tie_table_get_depth(table);
    bits = (int) ceil(log(size) / log(2));
    fprintf(fp, "\nfunction [%d:0] %s;\n", width - 1, oname);
    fprintf(fp, "input [%d:0] %s;\n", bits - 1, iname);
    fprintf(fp, "case (%s)\n", iname);
    i = 0;
    tie_table_foreach_value(table, value) {
        fprintf(fp, "%d'd%d : %s = ", bits, i, oname);
        switch (tie_get_type(value)) {
        case TIE_CONST:
            cvalue = tie_get_constant(value);
            fprintf(fp, "%d'b%s;\n", width,
                    tie_constant_get_binary_string(cvalue));
            break;
        case TIE_INT:
            ivalue = tie_get_integer(value);
            fprintf(fp, "%d'd%d;\n", width, ivalue);
            break;
        default:
            DIE("Internal Error: unexpected type\n");
        }
        i++;
    } end_tie_table_foreach_value;
    fprintf(fp, "default : %s = %d'd0;\n", oname, width);
    fprintf(fp, "endcase\n");
    fprintf(fp, "endfunction\n");
}
/**********************************************************************
   Write the write-enable logic for each state modified by the
   "semantic" statement
 **********************************************************************/
static void
tie2ver_semantic_write_we(FILE *fp, tie_t *semantic)
{
    tie_t *inst;
    st_table *semantic_state_table, *inst_state_table;
    st_generator *gen;
    char *key, *value, *c, *iname;
    int found;

    semantic_state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(semantic_state_table, gen, &key, &value) {
        fprintf(fp, "assign %s_we = ", key);
        c = "";
        tie_semantic_foreach_instruction(semantic, inst) {
            iname = tie_instruction_get_name(inst);
            inst_state_table = tie_instruction_get_state_table(inst);
            found = st_lookup(inst_state_table, key, &value);
            if (found && ((tie_type_t) value != TIE_ARG_IN)) {
                fprintf(fp, "%s1'b1 & %s", c, iname);
            } else {
                fprintf(fp, "%s1'b0 & %s", c, iname);
            }
            c = "\n | ";
        } end_tie_semantic_foreach_instruction;
        fprintf(fp, ";\n");
    }
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
*
" semantic " statement is write TIE file
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_semantic (FILE*fp, tie_t*semantic)
{
Tie_t*table, * statement;
Ls_t*tables;
St_table*in_state_table, * out_state_table;
ASSERT (tie_get_type (semantic)==TIE_SEMANTIC);
Tie2ver_write_module_declaration (fp, semantic);
Statement=tie_semantic_get_statement (semantic);
In_state_table=tie_semantic_get_in_state_table (semantic);
Out_state_table=tie_semantic_get_out_state_table (semantic);
Tie2ver_write_statement (fp, statement, in_state_table,
out_state_table);
Tables=tie_expression_get_tables (statement,
tie_get_program(semantic));
ls_foreach_data(tie_t *, tables, table) {
Tie2ver_write_table (fp, table);
}end_ls_foreach_data;
ls_free(tables);
Tie2ver_semantic_write_we (fp, semantic);
Fprintf (fp, " endmodule n ");
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Top module declaration is printed for combination semanteme
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_top_module (FILE*fp, tie_t*prog)
{
St_generator*gen;
Char*key, * value;
St_table*operand_table;
Tie_t*inst, * iclass;
Fprintf (fp, " n ");
fprintf(fp, "module UserInstModule(clk, out_E, ars_E, art_E, inst_R");
Fprintf (fp, ", Kill_E, killPipe_W, valid_W ");
Tie_program_foreach_iclass (prog, iclass)
if(tie_get_predefined(iclass))continue;
Tie_iclass_foreach_instruction (iclass, inst)
Fprintf (fp, ", %s_R ", tie_instruction_get_name (inst));
}end_tie_iclass_foreach_instruction;
}end_tie_program_foreach_iclass;
Fprintf (fp, ", en_R);\n″);
Fprintf (fp, " input clk;\n″);
Fprintf (fp, " output [31: 0] out_E;\n″);
Fprintf (fp, " input [31: 0] ars_E;\n″);
Fprintf (fp, " input [31: 0] art_E;\n″);
Fprintf (fp, " input [23: 0] inst_R;\n″);
Fprintf (fp, " input en_R;\n″);
Fprintf (fp, " input Kill_E, killPipe_W, valid_W;\n″);
Tie2ver_program_foreach_instruction (prog, inst)
Fprintf (fp, " input %s_R;N ", tie_instruction_get_name
(inst));
}end_tie2ver_program_foreach_instruction;
Tie2ver_program_foreach_instruction (prog, inst)
Fprintf (fp, " wire %s_E;N ", tie_instruction_get_name (inst));
}end_tie2ver_program_foreach_instruction;
Operand_table=tie2ver_program_get_operand_table (prog);
St_foreach_item (operand_table, gen , &key , &value)
if((tie_type_t)value!=TIE_ARG_IN)
Fprintf (fp, " wire [31: 0] %s_E;N ", key);
}
}
}
/************************************************************************
 Write the wire declarations for each output of each semantic block and
 for each select signal
 ************************************************************************/
static void
tie2ver_write_wire_declaration(FILE *fp, tie_t *prog)
{
    tie_t *semantic, *state;
    st_table *operand_table, *global_operand_table;
    st_table *state_table;
    st_generator *gen;
    char *key, *value, *sname;
    int width;
    global_operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(global_operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                fprintf(fp, "wire [31:0] %s_R, %s_E;\n", key, key);
            }
        }
    }
    tie_program_foreach_state(prog, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        width = tie_state_get_width(state);
        fprintf(fp, "wire [%d:0] %s_ps, %s_ns;\n", width-1, sname,
                sname);
        fprintf(fp, "wire %s_we;\n", sname);
    } end_tie_program_foreach_state;
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        operand_table = tie_semantic_get_operand_table(semantic);
        st_foreach_item(operand_table, gen, &key, &value) {
            if ((tie_type_t) value != TIE_ARG_IN) {
                fprintf(fp, "wire [31:0] %s_%s;\n", sname, key);
            }
        }
        state_table = tie_semantic_get_out_state_table(semantic);
        st_foreach_item(state_table, gen, &key, &value) {
            fprintf(fp, "wire [%d:0] %s_%s_ns;\n", (int) value-1,
                    sname, key);
            fprintf(fp, "wire %s_%s_we;\n", sname, key);
        }
        fprintf(fp, "wire %s_select;\n", sname);
    } end_tie_program_foreach_semantic;
}
/************************************************************************
 Write one flop instantiation statement
 ************************************************************************/
static void
tie2ver_write_flop_instance(FILE *fp, char *name, int num)
{
    char *fmt;
    fmt = "tie_flop #(%d) f%s(.tie_out(%s_E), .tie_in(%s_R), .clk(clk));\n";
    fprintf(fp, fmt, num, name, name, name);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Latch all command signals for R level
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_flop (FILE*fp, tie_t*prog)
{
Char*name;
Tie_t*inst;
Tie2ver_program_foreach_instruction (prog, inst)
Name=tie_instruction_get_name (inst);
Tie2ver_write_flop_instance (fp, name, 1);
}end_tie2ver_program_foreach_instruction;
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
An example is write for each semantic chunk
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_semantic_instance (FILE*fp, tie_t*prog)
{
Tie_t*semantic, * ilist, * inst;
Const char*iname, * aname, * c;
St_table*operand_table, * state_table;
St_generator*gen;
Char*key, * value;
Tie_program_foreach_semantic (prog, semantic)
if(tie_get_predefined(semantic))continue;
Iname=tie_semantic_get_name (semantic);
Fprintf (fp, " %s i%s (", iname, iname);
C=" ";
Operand_table=tie_semantic_get_operand_table (semantic);
St_foreach_item (operand_table, gen , &key , &value)
If ((tie_type_t) value==TIE_ARG_IN)
Fprintf (fp, " %s n.%s (%s_E) ", c, key, key);
}else{
Fprintf (fp, " %s n.%s (%s_%s) ", c, key, iname, key);
}
C=", ";
}
State_table=tie_semantic_get_in_state_table (semantic);
St_foreach_item (state_table, gen , &key , &value)
Fprintf (fp, " %s n.%s_ps (%s_ps) ", c, key, key);
C=", ";
}
State_table=tie_semantic_get_out_state_table (semantic);
St_foreach_item (state_table, gen , &key , &value)
Fprintf (fp, " %s n.%s_ns (%s_%s_ns) ", c, key, iname, key);
fprintf(fp, "%s\n.%s_we(%s_%s_we)", c, key, iname, key);
C=", ";
}
Ilist=tie_semantic_get_inst_list (semantic);
Tie_inst_list_foreach_instruction (ilist, inst)
Aname=tie_instruction_get_name (inst);
Fprintf (fp, ", n .%s (%s_E) ", aname, aname);
}end_tie_inst_list_foreach_instruction;
Fprintf (fp, ");\n″);
}end_tie_program_foreach_semantic;
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
An example is write for each state
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_state_instance (FILE*fp, tie_t*prog)
{
Tie_t*state;
Char*sname;
int width;
Tie_program_foreach_state (prog, state)
if(tie_get_predefined(state))continue;
Sname=tie_state_get_name (state);
Width=tie_state_get_width (state);
Fprintf (fp, " tie_athens_state # (%d) i%s (n ", width, sname);
Fprintf (fp, " .ns (%s_ns), n ", sname);
Fprintf (fp, " .we (%s_we), n ", sname);
Fprintf (fp, " .ke (Kill_E), n ");
Fprintf (fp, " .kp (killPipe_W), n ");
Fprintf (fp, " .vw (valid_W), n ");
Fprintf (fp, " .clk (clk), n ");
Fprintf (fp, " .ps (%s_ps));N ", sname);
}end_tie_program_foreach_state;
}
/************************************************************************
 Write the selection logic for one output operand
 ************************************************************************/
static void
tie2ver_write_operand_selection_logic_one(FILE *fp, tie_t *prog, char *name)
{
    tie_t *semantic;
    char *c, *dummy;
    st_table *operand_table;
    fprintf(fp, "assign %s_E = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        operand_table = tie_semantic_get_operand_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(operand_table, name, &dummy)) {
            fprintf(fp, "%s_", tie_semantic_get_name(semantic));
            fprintf(fp, "%s & ", name);
        } else {
            fprintf(fp, "{32{1'b0}} & ");
        }
        /* mask the candidate value with the replicated select signal */
        fprintf(fp, "{32{%s_select}}", tie_semantic_get_name(semantic));
        c = "\n| ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");
}
/************************************************************************
 Write the state selection logic for one state
 ************************************************************************/
static void
tie2ver_write_state_selection_logic_one(
    FILE *fp, tie_t *prog, char *name, int width)
{
    tie_t *semantic;
    char *c, *value, *sname;
    st_table *state_table;
    /* next-state value: OR of each block's value masked by its select */
    fprintf(fp, "assign %s_ns = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        state_table = tie_semantic_get_out_state_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(state_table, name, &value)) {
            fprintf(fp, "%s_%s_ns & ", sname, name);
        } else {
            fprintf(fp, "{%d{1'b0}} & ", width);
        }
        fprintf(fp, "{%d{%s_select}}", width, sname);
        c = "\n| ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");
    /* write enable: OR of each block's enable gated by its select */
    fprintf(fp, "assign %s_we = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        state_table = tie_semantic_get_out_state_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(state_table, name, &value)) {
            fprintf(fp, "%s_%s_we & ", sname, name);
        } else {
            fprintf(fp, "1'b0 & ");
        }
        fprintf(fp, "%s_select", sname);
        c = "\n| ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");
}
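The selection logic written by the two functions above is an AND-OR structure: each semantic block's value is masked by its replicated select signal and the masked values are ORed together. The following is a small behavioral model of that structure, offered only as a sketch; `replicate32` and `and_or_select` are hypothetical names, and a fixed 32-bit width is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate a 1-bit select across 32 bits, like Verilog {32{sel}}. */
static uint32_t replicate32(int sel)
{
    return sel ? 0xFFFFFFFFu : 0u;
}

/* AND-OR selection: OR together each candidate value masked by its
 * select line. With at most one select high this behaves as a mux;
 * with no select high the result is 0. */
static uint32_t and_or_select(const uint32_t *values, const int *selects, int n)
{
    uint32_t out = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        out |= values[i] & replicate32(selects[i]);
    return out;
}
```

The structure relies on the select signals being mutually exclusive; if two selects were high at once, the result would be the bitwise OR of both values.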
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Selection logic is write for top module
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_selection_logic (FILE*fp, tie_t*prog)
{
Tie_t*semantic, * ilist, * inst, * state;
Char*key, * value, * c, * sname;
St_table*global_operand_table;
St_generator*gen;
int width;
Tie_program_foreach_semantic (prog, semantic)
if(tie_get_predefined(semantic))continue;
Ilist=tie_semantic_get_inst_list (semantic);
Fprintf (fp, " assign %s_select=",
tie_semantic_get_name(semantic));
C=" ";
Tie_inst_list_foreach_instruction (ilist, inst)
Fprintf (fp, " %s%s_E ", c, tie_instruction_get_name (inst));
C=" n | ";
}end_tie_inst_list_foreach_instruction;
Fprintf (fp, ";\n″);
}end_tie_program_foreach_semantic;
Global_operand_table=tie2ver_program_get_operand_table (prog);
St_foreach_item (global_operand_table, gen , &key , &value)
if((tie_type_t)value!=TIE_ARG_IN)
Tie2ver_write_operand_selection_logic_one (fp, prog, key);
Fprintf (fp, " assign out_E=%s_E;N ", key);
}
}
Tie_program_foreach_state (prog, state)
if(tie_get_predefined(state))continue;
Sname=tie_state_get_name (state);
Width=tie_state_get_width (state);
Tie2ver_write_state_selection_logic_one (fp, prog, sname, width);
}end_tie_program_foreach_state;
}
/************************************************************************
 Write a sequence of assignment statements to extract a "field" from the
 instruction
 ************************************************************************/
static void
tie2ver_write_field_recur(FILE *fp, tie_t *prog, tie_t *field, char *suffix)
{
    tie_t *subfield, *newfield;
    char *c, *name;
    c = "";
    fprintf(fp, "{");
    tie_field_foreach_subfield(field, subfield) {
        fprintf(fp, "%s", c);
        switch (tie_get_type(subfield)) {
        case TIE_ID:
            name = tie_get_identifier(subfield);
            newfield = tie_program_get_field_by_name(prog, name);
            if (newfield == 0) {
                fprintf(fp, "inst_R");
            } else {
                tie2ver_write_field_recur(fp, prog, newfield, suffix);
            }
            break;
        case TIE_SUBFIELD:
            name = tie_subfield_get_name(subfield);
            newfield = tie_program_get_field_by_name(prog, name);
            if (newfield == 0) {
                fprintf(fp, "inst_R");
            } else {
                DIE("Error: unexpected subfield name (expect 'inst')\n");
            }
            fprintf(fp, "[%d:", tie_subfield_get_from_index(subfield));
            fprintf(fp, "%d]", tie_subfield_get_to_index(subfield));
            break;
        default:
            DIE("Error: unexpected subfield type\n");
        }
        c = ", ";
    } end_tie_field_foreach_subfield;
    fprintf(fp, "}");
}
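The expression emitted by the recursive field writer is a Verilog concatenation of part-selects of the 24-bit instruction word `inst_R`. The following sketch models what such an expression computes; the helper names `bits` and `concat` and the bit positions used in any example are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [from:to] (from >= to) of a 24-bit instruction word,
 * like the Verilog part-select inst_R[from:to]. */
static uint32_t bits(uint32_t inst, int from, int to)
{
    int width = from - to + 1;
    assert(from >= to && to >= 0 && from < 24);
    return (inst >> to) & ((1u << width) - 1u);
}

/* Concatenate hi with lo, where lo is lo_width bits wide, like the
 * Verilog expression {hi, lo}. */
static uint32_t concat(uint32_t hi, uint32_t lo, int lo_width)
{
    assert(lo_width > 0 && lo_width < 32);
    return (hi << lo_width) | lo;
}
```

For example, a field defined as two adjacent 4-bit subfields could be modeled as `concat(bits(inst, 11, 8), bits(inst, 7, 4), 4)`.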
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Write a series of assignment statement, in order to from instruction, extract " field "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_field (FILE*fp, tie_t*prog, tie_t*field, char*suffix)
{
Fprintf (fp, " assign %s%s=", tie_field_get_name (field), suffix);
Tie2ver_write_field_recur (fp, prog, field, suffix);
Fprintf (fp, ";\n″);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
A module is write for " operand "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_one_immediate (FILE*fp, tie_t*prog, tie_t*operand)
{
Tie_t*decoding, * field, * table;
Char*oname, * fname;
Ls_t*tables;
int width;
ASSERT (tie_get_type (operand)==TIE_OPERAND);
Oname=tie_operand_get_name (operand);
Fname=tie_operand_get_field_name (operand);
Field=tie_program_get_field_by_name (prog, fname);
Width=tie_field_get_width (field);
Fprintf (fp, " n ");
Fprintf (fp, " module %s (inst_R, %s);N ", oname, oname);
Fprintf (fp, " input [23: 0] inst_R;\n″);
Fprintf (fp, " output [31: 0] %s;N ", oname);
Fprintf (fp, " wire [%d: 0] %s;N ", tie_field_get_width (field)-1,
fname);
tie2ver_write_field(fp, prog, field, "");
Decoding=tie_operand_get_decoding_expression (operand);
Fprintf (fp, " assign%s=", oname);
Tie2ver_write_expression (fp, decoding, 0,0,0);
Fprintf (fp, ";\n″);
Tables=tie_expression_get_tables (decoding, prog);
Ls_foreach_data (tie_t*, tables, table)
Tie2ver_write_table (fp, table);
}end_ls_foreach_data;
ls_free(tables);
Fprintf (fp, " endmodule n ");
}
/************************************************************************
 Write one module for the decoding logic of each immediate operand
 ************************************************************************/
static void
tie2ver_write_immediate(FILE *fp, tie_t *prog)
{
    st_table *operand_table;
    char *key, *value;
    st_generator *gen;
    tie_t *operand;
    tie_t *field;
    operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                operand = tie_program_get_operand_by_name(prog, key);
                if (operand != 0) {
                    if (!tie_get_predefined(operand)) {
                        tie2ver_write_one_immediate(fp, prog, operand);
                    }
                } else {
                    field = tie_program_get_field_by_name(prog, key);
                    if (field == 0) {
                        fprintf(stderr, "Error: invalid operand %s\n", key);
                    }
                }
            }
        }
    }
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Write one instance for an "operand"
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_one_operand_instance (FILE*fp, tie_t*prog, tie_t
* operand)
{
Char*oname;
ASSERT (tie_get_type (operand)==TIE_OPERAND);
Oname=tie_operand_get_name (operand);
Fprintf (fp, " %s i%s (.inst (inst_R) .%s (%s_R));N ", oname, oname,
Oname, oname);
Tie2ver_write_flop_instance (fp, oname, 32);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Write a statement, in order to from inst_R, extract " field name "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_one_field_instance (FILE*fp, tie_t*prog, tie_t*field)
{
Char*name;
Tie2ver_write_field (fp, prog, field, " _ R ");
Name=tie_field_get_name (field);
Tie2ver_write_flop_instance (fp, name, 32);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
An example is write for each immediate operation number decoder logic
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2ver_write_immediate_instance (FILE*fp, tie_t*prog)
{
Char*key, * value;
St_table*operand_table;
St_generator*gen;
Tie_t*operand, * field;
Operand_table=tie2ver_program_get_operand_table (prog);
St_foreach_item (operand_table, gen , &key , &value)
If ((tie_type_t) value==TIE_ARG_IN)
Operand=tie_program_get_operand_by_name (prog, key);
if(operand!=0 && tie_operand_is_immediate (operand))
Tie2ver_write_one_operand_instance (fp, prog, operand);
Else if (operand==0)
Field=tie_program_get_field_by_name (prog, key);
if(field!=0)
Tie2ver_write_one_field_instance (fp, prog, field);
}
}
}
}
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
" prog " is printed to TIE file
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
void
Tie2ver_write_verilog (FILE*fp, tie_t*prog)
{
tie_t*semantic;
/ * write tie primitives*/
Fprintf (fp, COMMENTS);
Fprintf (fp, TIE_ENFLOP);
Fprintf (fp, TIE_FLOP);
Fprintf (fp, TIE_ATHENS_STATE);
/ * write each semantic block as a verilog module*/
ASSERT (tie_get_type (prog)==TIE_PROGRAM);
Tie_program_foreach_semantic (prog, semantic)
if(tie_get_predefined(semantic))continue;
Tie2ver_write_semantic (fp, semantic);
}end_tie_program_foreach_semantic;
/ * write each immediate operand as a verilog module*/
Tie2ver_write_immediate (fp, prog);
/ * write the top_level Verilog module*/
Tie2ver_write_top_module (fp, prog);
Tie2ver_write_wire_declaration (fp, prog);
Tie2ver_write_flop (fp, prog);
Tie2ver_write_immediate_instance (fp, prog);
Tie2ver_write_semantic_instance (fp, prog);
Tie2ver_write_state_instance (fp, prog);
Tie2ver_write_selection_logic (fp, prog);
Fprintf (fp, " endmodule n ");
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
" prog " is printed to TIE file
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
void
Tie2ver_write_instruction (FILE*fp, tie_t*prog)
{
tie_t*inst;
Int first=1;
Tie2ver_program_foreach_instruction (prog, inst)
if(first){
Fprintf (fp, " %s ", tie_instruction_get_name (inst));
First=0;
}else{
Fprintf (fp, " %s ", tie_instruction_get_name (inst));
}
}end_tie2ver_program_foreach_instruction;
}
/ *
* Local Variables:
* mode:c
* c-basic-offset:4
* End:
*/
Appendix E
#include "tie.h"
#define COMMENTS "/* Do not modify. This is automatically generated. */"
#define tie2gcc_program_foreach_instruction(_prog, _inst) { \
    tie_t *_iclass; \
    tie_program_foreach_iclass(_prog, _iclass) { \
        if (tie_get_predefined(_iclass)) continue; \
        tie_iclass_foreach_instruction(_iclass, _inst) {
#define end_tie2gcc_program_foreach_instruction \
        } end_tie_iclass_foreach_instruction; \
    } end_tie_program_foreach_iclass; \
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Set up and return global program → for the independent variable form of user-defined instructions.
The form returned is not contained in each independent variable used in predefined instructions.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
Static st_table*
Tie2gcc_program_get_operand_table (tie_t*prog)
{
Static st_table*tie2gcc_program_args=0;
Tie_t*inst;
Char*key, * value;
St_table*arg_table;
St_generator*gen;
If (tie2gcc_program_args==0)
Tie2gcc_program_args=st_init_table (strcmp, st_strhash);
Tie2gcc_program_foreach_instruction (prog, inst)
Arg_table=tie_instruction_get_operand_table (inst);
St_foreach_item (arg_table, gen , &key , &value)
St_insert (tie2gcc_program_args, key, value);
}
st_free_table(arg_table);
}end_tie2gcc_program_foreach_instruction;
}
return tie2gcc_program_args;
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Produce function and independent variable explanation
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2gcc_write_function (FILE*fp, tie_t*inst, tie_t*args)
{
Tie_t*arg;
Char*c;
C=" ";
Fprintf (fp, " n#define %s (", tie_instruction_get_name (inst));
Tie_args_foreach_arg (args, arg)
if(tie_get_type(arg)!=TIE_ARG_OUT)
Fprintf (fp, " %s%s ", c, tie_arg_get_name (arg));
C=", ";
}
}end_tie_args_foreach_arg;
Fprintf (fp, ") n ");
}
/************************************************************************
 Return the list of arguments in "args", with the output arguments first.
 The returned list should be freed by the caller.
 ************************************************************************/
ls_t *
tie2gcc_args_get_ordered(tie_t *args)
{
    tie_t *arg;
    ls_t *arglist;
    arglist = ls_alloc();
    /* first pass: every argument that is not a pure input */
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) != TIE_ARG_IN) {
            ls_append(arglist, arg);
        }
    } end_tie_args_foreach_arg;
    /* second pass: every argument that is not a pure output */
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) != TIE_ARG_OUT) {
            ls_append(arglist, arg);
        }
    } end_tie_args_foreach_arg;
    return arglist;
}
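The ordering performed above places output arguments ahead of input arguments, matching the order in which GCC extended `asm` lists its output and input operands. The following is a simplified sketch of that two-pass ordering. The types `arg_kind` and `arg_t` and the function `order_outputs_first` are hypothetical stand-ins, and only two argument kinds are modeled; in the listing above, an argument whose type is neither `TIE_ARG_IN` nor `TIE_ARG_OUT` would be appended by both passes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the argument direction in tie_type_t. */
typedef enum { ARG_IN, ARG_OUT } arg_kind;

typedef struct {
    const char *name;
    arg_kind kind;
} arg_t;

/* Copy `args` into `ordered`, outputs first and then inputs, keeping
 * the relative order within each group. Returns the number of entries
 * written; `ordered` must have room for n entries. */
static size_t order_outputs_first(const arg_t *args, size_t n, arg_t *ordered)
{
    size_t i, k = 0;
    assert(args != NULL && ordered != NULL);
    for (i = 0; i < n; i++)
        if (args[i].kind == ARG_OUT)
            ordered[k++] = args[i];
    for (i = 0; i < n; i++)
        if (args[i].kind == ARG_IN)
            ordered[k++] = args[i];
    return k;
}
```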
/************************************************************************
 Write out one asm statement
 ************************************************************************/
static void
tie2gcc_write_one_asm(
    FILE *fp, tie_t *prog, tie_t *inst, tie_t *args, int value)
{
    tie_t *arg, *operand, *state;
    tie_type_t type, ptype;
    ls_t *arglist;
    char *t, s, c, *name, *n;
    int i;
    /* write the asm statement */
    fprintf(fp, "asm volatile(\"%s\t",
            tie_instruction_get_name(inst));
    i = 0;
    tie_args_foreach_arg(args, arg) {
        fprintf(fp, "%s%%%d", i == 0 ? "" : ", ", i);
        i++;
    } end_tie_args_foreach_arg;
    fprintf(fp, "\" ");
    ptype = TIE_UNKNOWN;
    arglist = tie2gcc_args_get_ordered(args);
    ls_foreach_data(tie_t *, arglist, arg) {
        name = tie_arg_get_name(arg);
        operand = tie_program_get_operand_by_name(prog, name);
        /* choose the constraint letter from the operand's register file */
        if (operand != 0) {
            state = tie_operand_get_state(operand);
            if (state != 0) {
                n = tie_state_get_name(state);
                if (strcmp(n, "AR") == 0) {
                    c = 'a';
                } else if (strcmp(n, "FR") == 0) {
                    c = 'f';
                } else if (strcmp(n, "DR") == 0) {
                    c = 'd';
                } else if (strcmp(n, "BR") == 0) {
                    c = 'b';
                } else {
                    DIE("Internal Error: invalid state\n");
                }
            } else {
                c = 'i';
            }
        } else {
            c = 'i';
        }
        type = tie_get_type(arg);
        if (ptype == TIE_UNKNOWN && type == TIE_ARG_IN) {
            fprintf(fp, ":");
        }
        s = type == ptype ? ',' : ':';
        t = type == TIE_ARG_IN ? "" : "=";
        fprintf(fp, "%c \"%s%c\" (%s)", s, t, c, name);
        ptype = type;
    } end_ls_foreach_data;
    ls_free(arglist);
    fprintf(fp, ");");
}
/************************************************************************
 Generate an inline function for "inst"
 ************************************************************************/
static void
tie2gcc_write_asm(FILE *fp, tie_t *prog, tie_t *inst, tie_t *args)
{
    tie_t *arg, *out_arg;
    /* declare output variable and find the immediate operand */
    fprintf(fp, "({");
    out_arg = 0;
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) == TIE_ARG_OUT) {
            fprintf(fp, "int %s;", tie_arg_get_name(arg));
            out_arg = arg;
        }
    } end_tie_args_foreach_arg;
    tie2gcc_write_one_asm(fp, prog, inst, args, -1);
    /* return the results */
    if (out_arg != 0) {
        fprintf(fp, "%s;", tie_arg_get_name(out_arg));
    }
    fprintf(fp, "})\n");
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Generate one macro for "inst"
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static void
Tie2gcc_write_inst (FILE*fp, tie_t*prog, tie_t*inst, tie_t*args)
{
Tie2gcc_write_function (fp, inst, args);
Tie2gcc_write_asm (fp, prog, inst, args);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Producing gcc header file, it will be included into application code, in order to uses user's definition
Instructions.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
void
Tie2gcc_write_gcc (FILE*fp, tie_t*prog)
{
Tie_t*iclass, * ilist, * inst, * args;
ASSERT (tie_get_type (prog)==TIE_PROGRAM);
Fprintf (fp, " %s n ", COMMENTS);
Tie_program_foreach_iclass (prog, iclass)
if(tie_get_predefined(iclass))continue;
Ilist=tie_iclass_get_inst_list (iclass);
args = tie_iclass_get_io_args(iclass);
Tie_inst_list_foreach_instruction (ilist, inst)
Tie2gcc_write_inst (fp, prog, inst, args);
}end_tie_inst_list_foreach_instruction;
}end_tie_program_foreach_iclass;
}
/************************************************************************
 Write out a function that tests whether a value is a valid immediate
 for one operand
 ************************************************************************/
static void
tie2gcc_write_operand_check_one(FILE *fp, char *name)
{
    fprintf(fp, "\nint\n");
    fprintf(fp, "tensilica_%s(int v)\n", name);
    fprintf(fp, "{\n");
    fprintf(fp, "    tensilica_insnbuf_type insn;\n");
    fprintf(fp, "    int new_v;\n");
    fprintf(fp, "    if (!set_%s_field(insn, v)) return 0;\n", name);
    fprintf(fp, "    new_v = get_%s_field(insn);\n", name);
    fprintf(fp, "    return new_v == v;\n");
    fprintf(fp, "}\n");
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * *
Write out each function right value with each numerical value immediately of test
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
void
Tie2gcc_write_operand_check (FILE*fp, tie_t*prog)
{
St_table*arg_table;
St_generator*gen;
Char*key, * value;
Arg_table=tie2gcc_program_get_operand_table (prog);
St_foreach_item (arg_table, gen , &key , &value)
If ((tie_type_t) value==TIE_ARG_IN)
If (strcmp (key, " art ")!=0&&strcmp (key, " ars ")!=0)
Tie2gcc_write_operand_check_one (fp, key);
}
}
}
}
Appendix F
/*
 * TIE user_register routines
 */
/* $Id */
/*
 * Copyright 1998-1999 Tensilica Inc.
 * These coded instructions, statements, and computer programs are
 * Confidential Proprietary Information of Tensilica Inc. and may not be
 * disclosed to third parties or copied in any form, in whole or in part,
 * without the prior written consent of Tensilica Inc.
 */
#include <math.h>
#include "tie.h"
#include "tie_int.h"
typedef struct ureg_struct {
    int statef;
    int statet;
    int uregf;
    int uregt;
    int ureg;
    char *name;
} ureg_t;
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Return the index of " ureg "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
int
Tie_ureg_get_index (tie_t*ureg)
{
ASSERT (tie_get_type (ureg)==TIE_UREG);
return tie_get_integer(tie_get_first_child(ureg));
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Return the expression formula of " ureg "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
Tie_t*
Tie_ureg_get_expression (tie_t*ureg)
{
Tie_t*index;
ASSERT (tie_get_type (ureg)==TIE_UREG);
Index=tie_get_first_child (ureg);
return tie_get_next_sibling(index);
}
/ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * *
Produce the character string of the constant index of an expression " ureg "
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * */
static char ureg_index[10];
Char*tie_ureg_get_index_constant (tie_t*ureg)
{
Sprintf (ureg_index, " 8 ' d%d ", tie_ureg_get_index (ureg));
return ureg_index;
}
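The routine above formats the user register index as an 8-bit Verilog decimal literal and returns a pointer to a shared static buffer, so each call overwrites the previous result. The following sketch shows a bounded variant of the same formatting that writes into a caller-supplied buffer. The function name `format_index_constant` and the 0 to 255 range check are assumptions, not part of the listing.

```c
#include <assert.h>
#include <stdio.h>

/* Format `index` as an 8-bit Verilog decimal literal (for example 8'd5)
 * into buf. Returns 1 on success, 0 if the index does not fit in 8 bits
 * or the buffer is too small. */
static int format_index_constant(char *buf, size_t size, int index)
{
    int n;
    assert(buf != NULL);
    if (index < 0 || index > 255)
        return 0;
    n = snprintf(buf, size, "8'd%d", index);
    return n >= 0 && (size_t) n < size;
}
```

Passing the buffer in avoids the aliasing that arises when two results from the static-buffer version are used in the same expression.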
/****************************************************************************
 Generate the st field for the RUR instruction
 ****************************************************************************/
static void
tie_program_generate_st_field(tie_t *program)
{
    tie_t *field;
    field = tie_alloc(TIE_FIELD);
    tie_append_child(field, tie_create_identifier("st"));
    tie_append_child(field, tie_create_identifier("s"));
    tie_append_child(field, tie_create_identifier("t"));
    tie_program_add(program, field);
}
/****************************************************************************
 Generate the RUR opcode
 ****************************************************************************/
static void
tie_program_generate_rur_opcode(tie_t *program)
{
    tie_t *opcode, *encode;
    opcode = tie_alloc(TIE_OPCODE);
    tie_append_child(opcode, tie_create_identifier("RUR"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("op2"));
    tie_append_child(encode, tie_create_constant("4'b1110"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("RST3"));
    tie_program_add(program, opcode);
}
/****************************************************************************
 Generate the WUR opcode
 ****************************************************************************/
static void
tie_program_generate_wur_opcode(tie_t *program)
{
    tie_t *opcode, *encode;
    opcode = tie_alloc(TIE_OPCODE);
    tie_append_child(opcode, tie_create_identifier("WUR"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("op2"));
    tie_append_child(encode, tie_create_constant("4'b1111"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("RST3"));
    tie_program_add(program, opcode);
}
/****************************************************************************
 Generate the RUR iclass
 ****************************************************************************/
static void
tie_program_generate_rur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;
    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("RUR"));
    /* operand list: out arr, in st */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_OUT);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("arr"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("st"));
    /* state list: every user-defined state is an input */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_IN);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}
/****************************************************************************
 Generate the WUR iclass
 ****************************************************************************/
static void
tie_program_generate_wur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;
    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("WUR"));
    /* operand list: in art, in sr */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("art"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("sr"));
    /* state list: every user-defined state is an inout */
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_INOUT);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}
/****************************************************************************
 Generate a selection signal for each ureg
 ****************************************************************************/
static void
tie_program_generate_selection_signals(tie_t *prog, tie_t *stmt, char *fname)
{
    tie_t *ureg, *wire, *assign, *equal, *id;
    int index, max_index, width;
    char wname[80];
    /* find the largest ureg index to size the comparison */
    max_index = 0;
    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        max_index = MAX(max_index, index);
    } end_tie_program_foreach_ureg;
    width = (int) ceil(log(max_index + 1) / log(2));
    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        /* wire [0:0] ureg_sel_<index> */
        wire = tie_alloc(TIE_WIRE);
        sprintf(wname, "ureg_sel_%d", index);
        tie_append_child(wire, tie_create_integer(0));
        tie_append_child(wire, tie_create_integer(0));
        tie_append_child(wire, tie_create_identifier(wname));
        tie_append_child(stmt, wire);
        /* assign ureg_sel_<index> = fname[width-1:0] == <width>'d<index> */
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(assign, tie_create_identifier(wname));
        tie_append_child(stmt, assign);
        equal = tie_alloc(TIE_EQ);
        sprintf(wname, "%d'd%d", width, index);
        id = tie_create_identifier(fname);
        tie_append_child(id, tie_create_integer(width - 1));
        tie_append_child(id, tie_create_integer(0));
        tie_append_child(equal, id);
        tie_append_child(equal, tie_create_constant(wname));
        tie_append_child(assign, equal);
    } end_tie_program_foreach_ureg;
}
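The selection-signal width above is computed in floating point as ceil(log(max_index+1)/log(2)). The following is a minimal sketch of the same quantity computed with integers only, which sidesteps any rounding in log(). The helper name sel_width is hypothetical and is not part of the listing.

```c
#include <assert.h>

/* Hypothetical helper (not in the original listing): number of bits needed
 * to distinguish the indices 0..max_index, i.e. the smallest w such that
 * (1 << w) >= max_index + 1. */
int sel_width(int max_index)
{
    int width = 0;
    assert(max_index >= 0);
    while ((1ULL << width) < (unsigned long long)max_index + 1ULL) {
        width++;
    }
    return width;
}
```

With a single ureg at index 0 the result is 0, the same degenerate case the floating-point expression produces.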
/****************************************************************************
 Return the RUR select logic for "ureg" and all the uregs that follow it
 in the list
 ****************************************************************************/
static tie_t *
tie_program_rur_semantic_recur(ls_handle_t *ureg_handle)
{
    tie_t *and, *node, *or, *rep;
    tie_t *ureg = (tie_t *) ls_handle_get_data(ureg_handle);
    ls_handle_t *ureg_next;
    char sname[80];
    /* {32{ureg_sel_<index>}} & <ureg expression> */
    and = tie_alloc(TIE_BITWISE_AND);
    rep = tie_alloc(TIE_REPLICATION);
    tie_append_child(and, rep);
    tie_append_child(rep, tie_create_integer(32));
    sprintf(sname, "ureg_sel_%d", tie_ureg_get_index(ureg));
    tie_append_child(rep, tie_create_identifier(sname));
    tie_append_child(and, tie_dup(tie_ureg_get_expression(ureg)));
    ureg_next = ls_handle_get_next_handle(ureg_handle);
    if (ureg_next == 0) {
        return and;
    } else {
        node = tie_program_rur_semantic_recur(ureg_next);
        or = tie_alloc(TIE_BITWISE_OR);
        tie_append_child(or, and);
        tie_append_child(or, node);
        return or;
    }
}
/****************************************************************************
 Generate the RUR semantic block
 ****************************************************************************/
static void
tie_program_generate_rur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *node;
    ls_t *ureg_list;
    ls_handle_t *handle;
    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(semantic, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(ilist, tie_create_identifier("RUR"));
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_append_child(semantic, statement);
    tie_program_generate_selection_signals(program, statement, "st");
    assign = tie_alloc(TIE_ASSIGNMENT);
    tie_append_child(statement, assign);
    tie_append_child(assign, tie_create_identifier("arr"));
    ureg_list = ls_alloc();
    tie_program_foreach_ureg(program, ureg) {
        ls_append(ureg_list, ureg);
    } end_tie_program_foreach_ureg;
    handle = ls_get_first_handle(ureg_list);
    node = tie_program_rur_semantic_recur(handle);
    tie_append_child(assign, node);
    ls_free(ureg_list);
    tie_program_add(program, semantic);
}
/****************************************************************************
 Put all the components of the "ureg" expression into "list"
 ****************************************************************************/
static void
tie_ureg_exp_get_components(tie_t *exp, ls_t *list)
{
    tie_t *child;
    if (tie_get_type(exp) == TIE_ID) {
        ls_prepend(list, exp);
    }
    tie_foreach_child(exp, child) {
        tie_ureg_exp_get_components(child, list);
    } end_tie_foreach_child;
}
/****************************************************************************
 Insert "ur" into the state-to-ur mapping list
 ****************************************************************************/
static void
tie_state_list_insert(ls_t *list, ureg_t *ur)
{
    ureg_t *item;
    ls_handle_t *handle;
    handle = 0;
    ls_foreach_handle(list, handle) {
        item = (ureg_t *) ls_handle_get_data(handle);
        if (item->statef < ur->statet) {
            break;
        }
    } end_ls_foreach_handle;
    if (handle == 0) {
        ls_append(list, ur);
    } else {
        ls_insert_before(handle, ur);
    }
}
/****************************************************************************
 Build the state-to-ur mapping for a state
 ****************************************************************************/
static void
tie_state_get_ur_mapping(tie_t *prog, tie_t *state, tie_t *ureg, ls_t *list)
{
    tie_t *exp, *child, *s, *id;
    int num, uregf, uregt, statef, statet;
    ls_t *id_list;
    char *sname, *iname;
    ureg_t *ur;
    exp = tie_ureg_get_expression(ureg);
    num = tie_ureg_get_index(ureg);
    sname = tie_state_get_name(state);
    id_list = ls_alloc();
    tie_ureg_exp_get_components(exp, id_list);
    uregt = uregf = -1;
    ls_foreach_data(tie_t *, id_list, id) {
        iname = tie_get_identifier(id);
        child = tie_get_first_child(id);
        /* compute the next uregf and uregt */
        if (child == 0) {
            s = tie_program_get_state_by_name(prog, iname);
            ASSERT(s != 0);
            statet = 0;
            statef = tie_state_get_width(s) - 1;
        } else {
            statef = tie_get_integer(child);
            child = tie_get_next_sibling(child);
            if (child == 0) {
                statet = statef;
            } else {
                statet = tie_get_integer(child);
            }
        }
        uregt = uregf + 1;
        uregf = uregt + (statef - statet);
        if (strcmp(iname, sname) == 0) {
            ur = ALLOC(ureg_t, 1);
            ur->statef = statef;
            ur->statet = statet;
            ur->uregf = uregf;
            ur->uregt = uregt;
            ur->ureg = num;
            ur->name = "art";
            tie_state_list_insert(list, ur);
        }
    } end_ls_foreach_data;
}
/****************************************************************************
 Fill the gaps in the state-to-ur mapping table
 ****************************************************************************/
static void
tie_state_fill_gap(tie_t *state, ls_t *list)
{
    int width, statet, statef;
    ls_handle_t *handle;
    ureg_t *ur, *gap;
    char *name;
    width = tie_state_get_width(state);
    name = tie_state_get_name(state);
    statet = statef = width;
    ls_foreach_handle(list, handle) {
        ur = (ureg_t *) ls_handle_get_data(handle);
        if (ur->statef < (statet - 1)) {
            gap = ALLOC(ureg_t, 1);
            gap->statef = statet - 1;
            gap->statet = ur->statef + 1;
            gap->uregf = gap->uregt = gap->ureg = -1;
            gap->name = 0;
            ls_insert_before(handle, gap);
        }
        statet = ur->statet;
        statef = ur->statef;
    } end_ls_foreach_handle;
    handle = ls_get_last_handle(list);
    ur = (ureg_t *) ls_handle_get_data(handle);
    if (ur->statet > 0) {
        gap = ALLOC(ureg_t, 1);
        gap->statef = ur->statet - 1;
        gap->statet = 0;
        gap->uregf = gap->uregt = gap->ureg = -1;
        gap->name = 0;
        ls_insert_after(handle, gap);
    }
}
/****************************************************************************
 Generate the WUR semantic block
 ****************************************************************************/
static void
tie_program_generate_wur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *cond;
    tie_t *state, *concat, *id;
    ureg_t *ur;
    char *sname, selname[80];
    ls_t *list;
    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(program, semantic);
    tie_append_child(semantic, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(ilist, tie_create_identifier("WUR"));
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_append_child(semantic, statement);
    tie_program_generate_selection_signals(program, statement, "sr");
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        list = ls_alloc();
        tie_program_foreach_ureg(program, ureg) {
            tie_state_get_ur_mapping(program, state, ureg, list);
        } end_tie_program_foreach_ureg;
        tie_state_fill_gap(state, list);
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(statement, assign);
        tie_append_child(assign, tie_create_identifier(sname));
        concat = tie_alloc(TIE_CONCATENATION);
        tie_append_child(assign, concat);
        ls_foreach_data(ureg_t *, list, ur) {
            if (ur->name != 0) {
                /* mapped bits: ureg_sel ? art[uregf:uregt] : state[statef:statet] */
                cond = tie_alloc(TIE_CONDITIONAL);
                tie_append_child(concat, cond);
                sprintf(selname, "ureg_sel_%d", ur->ureg);
                id = tie_create_identifier(selname);
                tie_append_child(cond, id);
                id = tie_create_identifier(ur->name);
                tie_append_child(id, tie_create_integer(ur->uregf));
                tie_append_child(id, tie_create_integer(ur->uregt));
                tie_append_child(cond, id);
                id = tie_create_identifier(sname);
                tie_append_child(id, tie_create_integer(ur->statef));
                tie_append_child(id, tie_create_integer(ur->statet));
                tie_append_child(cond, id);
            } else {
                /* gap: the state bits keep their current value */
                id = tie_create_identifier(sname);
                tie_append_child(id, tie_create_integer(ur->statef));
                tie_append_child(id, tie_create_integer(ur->statet));
                tie_append_child(concat, id);
            }
        } end_ls_foreach_data;
        ls_free(list);
    } end_tie_program_foreach_state;
}
/****************************************************************************
 Generate the RUR and WUR instructions
 ****************************************************************************/
void
tie_program_generate_rurwur(tie_t *program)
{
    tie_t *ureg;
    int num = 0;
    tie_program_foreach_ureg(program, ureg) {
        num++;
    } end_tie_program_foreach_ureg;
    /* nothing to generate when no user register is declared */
    if (num == 0) {
        return;
    }
    tie_program_generate_st_field(program);
    tie_program_generate_rur_opcode(program);
    tie_program_generate_wur_opcode(program);
    tie_program_generate_rur_iclass(program);
    tie_program_generate_wur_iclass(program);
    tie_program_generate_rur_semantic(program);
    tie_program_generate_wur_semantic(program);
}
Appendix G
// define a new opcode for BYTESWAP based on
//   - a predefined instruction field op2
//   - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0
// declare a state ACC used to accumulate byte-swapped data
state ACC 32
// declare a mode bit SWAP to control the swap
state SWAP 1
// use "RUR ar, 0" and "WUR ar, 0" to move data between AR and ACC
user_register 0 ACC
// use "RUR ar, 1" and "WUR ar, 1" to move data between AR and SWAP
user_register 1 SWAP
// define a new instruction class that
//   - reads data from ars (predefined to be AR[s])
//   - uses and writes state ACC
//   - uses state SWAP
iclass bs {BYTESWAP} {in ars} {inout ACC, in SWAP}
// semantic definition of byteswap
//   Accumulates to ACC the byte-swapped ars (AR[s]) or
//   ars depending on the SWAP bit
semantic bs {BYTESWAP} {
    wire [31:0] ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
    assign ACC = ACC + (SWAP ? ars_swap : ars);
}
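The semantic block above can be modeled in plain C to make the data flow concrete. This is a minimal sketch, not generated output: the type bs_state and the functions byteswap32 and bs_execute are hypothetical names chosen for illustration, and the two user states are modeled as ordinary struct fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the two user states declared above. */
typedef struct {
    uint32_t acc;   /* models state ACC (32 bits) */
    uint32_t swap;  /* models state SWAP (only bit 0 is used) */
} bs_state;

/* Reverse the byte order of a 32-bit word, like ars_swap in the semantic block. */
uint32_t byteswap32(uint32_t x)
{
    return ((x & 0xffu) << 24) | (((x >> 8) & 0xffu) << 16) |
           (((x >> 16) & 0xffu) << 8) | ((x >> 24) & 0xffu);
}

/* Models: assign ACC = ACC + (SWAP ? ars_swap : ars); */
void bs_execute(bs_state *st, uint32_t ars)
{
    assert(st != 0);
    st->acc += (st->swap & 1u) ? byteswap32(ars) : ars;
}
```

Starting from ACC = 0 with SWAP set, executing the model on 0x11223344 leaves 0x44332211 in ACC; with SWAP clear, the operand is accumulated unchanged.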
Appendix H
#define PARAMS(_arg) _arg
typedef signed int int32_t;
typedef unsigned int u_int32_t;
typedef void *xtensa_isa;
typedef void *xtensa_operand;
typedef int xtensa_opcode;
#define XTENSA_UNDEFINED -1
typedef u_int32_t xtensa_insnbuf_word;
typedef xtensa_insnbuf_word *xtensa_insnbuf;
typedef enum {
    xtensa_encode_result_ok,
    xtensa_encode_result_align,
    xtensa_encode_result_not_in_table,
    xtensa_encode_result_too_low,
    xtensa_encode_result_too_high,
    xtensa_encode_result_not_ok
} xtensa_encode_result;
typedef u_int32_t (*xtensa_immed_decode_fn) PARAMS((u_int32_t val));
typedef xtensa_encode_result (*xtensa_immed_encode_fn)
    PARAMS((u_int32_t *valp));
typedef u_int32_t (*xtensa_get_field_fn) PARAMS((const xtensa_insnbuf insn));
typedef void (*xtensa_set_field_fn) PARAMS((xtensa_insnbuf insn,
    u_int32_t val));
typedef int (*xtensa_insn_decode_fn) PARAMS((const xtensa_insnbuf insn));
typedef struct xtensa_operand_internal_struct {
    char operand_kind;
    char inout;
    xtensa_get_field_fn get_field;
    xtensa_set_field_fn set_field;
    xtensa_immed_encode_fn encode;
    xtensa_immed_decode_fn decode;
} xtensa_operand_internal;
typedef struct xtensa_iclass_internal_struct {
    int num_operands;
    xtensa_operand_internal **operands;
} xtensa_iclass_internal;
typedef struct xtensa_opcode_internal_struct {
    const char *name;
    int length;
    xtensa_insnbuf encoding_template;
    xtensa_iclass_internal *iclass;
} xtensa_opcode_internal;
typedef struct opname_lookup_entry_struct {
    const char *key;
    xtensa_opcode opcode;
} opname_lookup_entry;
typedef struct xtensa_isa_internal_struct {
    int insn_size;
    int insnbuf_size;
    int num_opcodes;
    xtensa_opcode_internal **opcode_table;
    int num_modules;
    int *module_opcode_base;
    xtensa_insn_decode_fn *module_decode_fn;
    opname_lookup_entry *opname_lookup_table;
} xtensa_isa_internal;
extern u_int32_t get_r_field(const xtensa_insnbuf insn);
extern void set_r_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_s_field(const xtensa_insnbuf insn);
extern void set_s_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_sr_field(const xtensa_insnbuf insn);
extern void set_sr_field(xtensa_insnbuf insn, u_int32_t val);
extern u_int32_t get_t_field(const xtensa_insnbuf insn);
extern void set_t_field(xtensa_insnbuf insn, u_int32_t val);
extern xtensa_encode_result encode_r(u_int32_t *valp);
extern u_int32_t decode_r(u_int32_t val);
extern xtensa_encode_result encode_s(u_int32_t *valp);
extern u_int32_t decode_s(u_int32_t val);
extern xtensa_encode_result encode_sr(u_int32_t *valp);
extern u_int32_t decode_sr(u_int32_t val);
extern xtensa_encode_result encode_t(u_int32_t *valp);
extern u_int32_t decode_t(u_int32_t val);
static u_int32_t get_st_field(insn)
const xtensa_insnbuf insn;
{
    u_int32_t temp;
    temp = 0;
    temp |= ((insn[0] & 0xf00) >> 8) << 4;   /* st[7:4] = insn[11:8] */
    temp |= ((insn[0] & 0xf0) >> 4) << 0;    /* st[3:0] = insn[7:4] */
    return temp;
}
static void set_st_field(insn, val)
xtensa_insnbuf insn; u_int32_t val;
{
    /* shift of 4 (not 8) so that st[7:4] lands in the cleared bits 11:8,
       matching get_st_field */
    insn[0] = (insn[0] & 0xfffff0ff) | ((val & 0xf0) << 4);
    insn[0] = (insn[0] & 0xffffff0f) | ((val & 0xf) << 4);
}
static u_int32_t decode_st(u_int32_t val)
{
    return val;
}
static xtensa_encode_result encode_st(u_int32_t *valp)
{
    if ((*valp >> 8) != 0) {
        return xtensa_encode_result_too_high;
    } else {
        return xtensa_encode_result_ok;
    }
}
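The st operand is an 8-bit value split across two instruction nibbles: get_st_field reads its upper half from bits 11:8 and its lower half from bits 7:4. The sketch below shows a matched read/write pair over a single 32-bit word so the round trip can be checked. The names st_get and st_set are hypothetical, and operating on a plain uint32_t rather than an xtensa_insnbuf is a simplification for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Read st = {s, t}: s is instruction bits 11:8, t is bits 7:4. */
uint32_t st_get(uint32_t insn)
{
    return (((insn >> 8) & 0xfu) << 4) | ((insn >> 4) & 0xfu);
}

/* Write st back into the same two nibbles, leaving all other bits untouched. */
uint32_t st_set(uint32_t insn, uint32_t val)
{
    assert((val >> 8) == 0);   /* same range check that encode_st performs */
    insn = (insn & 0xfffff0ffu) | (((val >> 4) & 0xfu) << 8);
    insn = (insn & 0xffffff0fu) | ((val & 0xfu) << 4);
    return insn;
}
```

Writing 0xa5 into the RUR template 0xe30000 gives 0xe30a50, and reading the field back returns 0xa5.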
static xtensa_operand_internal aor_operand = {
    'a',
    '>',
    get_r_field,
    set_r_field,
    encode_r,
    decode_r
};
static xtensa_operand_internal ais_operand = {
    'a',
    '<',
    get_s_field,
    set_s_field,
    encode_s,
    decode_s
};
static xtensa_operand_internal ait_operand = {
    'a',
    '<',
    get_t_field,
    set_t_field,
    encode_t,
    decode_t
};
static xtensa_operand_internal iisr_operand = {
    'i',
    '<',
    get_sr_field,
    set_sr_field,
    encode_sr,
    decode_sr
};
static xtensa_operand_internal iist_operand = {
    'i',
    '<',
    get_st_field,
    set_st_field,
    encode_st,
    decode_st
};
static xtensa_operand_internal *bs_operand_list[] = {
    &ais_operand
};
static xtensa_iclass_internal bs_iclass = {
    1,
    &bs_operand_list[0]
};
static xtensa_operand_internal *rur_operand_list[] = {
    &aor_operand,
    &iist_operand
};
static xtensa_iclass_internal rur_iclass = {
    2,
    &rur_operand_list[0]
};
static xtensa_operand_internal *wur_operand_list[] = {
    &ait_operand,
    &iisr_operand
};
static xtensa_iclass_internal wur_iclass = {
    2,
    &wur_operand_list[0]
};
static xtensa_insnbuf_word BYTESWAP_template[] = { 0x60000 };
static xtensa_opcode_internal BYTESWAP_opcode = {
    "byteswap",
    3,
    &BYTESWAP_template[0],
    &bs_iclass
};
static xtensa_insnbuf_word RUR_template[] = { 0xe30000 };
static xtensa_opcode_internal RUR_opcode = {
    "rur",
    3,
    &RUR_template[0],
    &rur_iclass
};
static xtensa_insnbuf_word WUR_template[] = { 0xf30000 };
static xtensa_opcode_internal WUR_opcode = {
    "wur",
    3,
    &WUR_template[0],
    &wur_iclass
};
static xtensa_opcode_internal *opcodes[] = {
    &BYTESWAP_opcode,
    &RUR_opcode,
    &WUR_opcode
};
xtensa_opcode_internal **get_opcodes() { return &opcodes[0]; }
const int get_num_opcodes() { return 3; }
#define xtensa_BYTESWAP_op 0
#define xtensa_RUR_op 1
#define xtensa_WUR_op 2
int decode_insn(const xtensa_insnbuf insn)
{
    if ((insn[0] & 0xff000f) == 0x60000) return xtensa_BYTESWAP_op;
    if ((insn[0] & 0xff000f) == 0xe30000) return xtensa_RUR_op;
    if ((insn[0] & 0xff000f) == 0xf30000) return xtensa_WUR_op;
    return XTENSA_UNDEFINED;
}
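decode_insn above tests each opcode with the same mask-and-compare pattern. As a minimal sketch, the same decoding can be written as a table walk, which is one way such a routine might be extended with more opcodes. The names decode_entry, decode_table and decode_word are hypothetical; the mask and match values are the ones used in decode_insn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per opcode: the instruction matches when (insn & mask) == match. */
typedef struct {
    uint32_t mask;
    uint32_t match;
    int opcode;
} decode_entry;

static const decode_entry decode_table[] = {
    { 0xff000fu, 0x060000u, 0 },   /* BYTESWAP */
    { 0xff000fu, 0xe30000u, 1 },   /* RUR */
    { 0xff000fu, 0xf30000u, 2 },   /* WUR */
};

/* Return the opcode index for a 24-bit instruction word, or -1 if unknown. */
int decode_word(uint32_t insn)
{
    size_t i;
    for (i = 0; i < sizeof decode_table / sizeof decode_table[0]; i++) {
        if ((insn & decode_table[i].mask) == decode_table[i].match) {
            return decode_table[i].opcode;
        }
    }
    return -1;
}
```

Operand bits such as the s and t nibbles fall outside the mask, so an RUR word with operands filled in still decodes as RUR.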
Appendix I
typedef unsigned u32;
typedef struct u64str { unsigned int lo; unsigned int hi; } u64;
extern u32 state32(int i);
extern u64 state64(int i);
extern void set_state32(int i, u32 v);
extern void set_state64(int i, u64 v);
extern void set_ar(int i, u32 v);
extern u32 ar(int i);
extern void pc_incr(int i);
extern int aux32_fetchfirst(void);
extern void pipe_use_ifetch(int n);
extern void pipe_use_dcache(void);
extern void pipe_def_ifetch(int n);
extern int arcode(void);
extern void pipe_use(int n, int v, int i);
extern void pipe_def(int n, int v, int i);
struct state_tbl_entry {
    const char *name;
    int numbits;
};
#define STATE_ACC 0
#define STATE_SWAP 1
#define NUM_STATES 2
struct state_tbl_entry local_state_tbl[NUM_STATES+1] = {
    {"ACC", 32},
    {"SWAP", 1},
    {"", 0}
};
extern "C" struct state_tbl_entry *get_state_tbl(void);
struct state_tbl_entry *get_state_tbl(void)
{
    return &local_state_tbl[0];
}
/* constant table ai4const */
static const unsigned CONST_TBL_AI4CONST[] = {
    0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf
};
/* constant table b4const */
static const unsigned CONST_TBL_B4CONST[] = {
    0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100
};
/* constant table b4constu */
static const unsigned CONST_TBL_B4CONSTU[] = {
    0x8000, 0x10000, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,
    0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100
};
/* constant table d01tab */
static const unsigned CONST_TBL_D01TAB[] = {
    0, 0x1
};
/* constant table d23tab */
static const unsigned CONST_TBL_D23TAB[] = {
    0x2, 0x3
};
/* constant table i4p1const */
static const unsigned CONST_TBL_I4P1CONST[] = {
    0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
    0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10
};
/* constant table mip32const */
static const unsigned CONST_TBL_MIP32CONST[] = {
    0x20, 0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19,
    0x18, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11,
    0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9,
    0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1
};
void
BYTESWAP_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned ars = ar(_OPND0_);
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned ACC_ns;
    unsigned ars_swap;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) << 16) |
               ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));
    if (SWAP_ps) {
        _tmp0 = ars_swap;
    } else {
        _tmp0 = ars;
    }
    ACC_ns = ACC_ps + _tmp0;
    ACC = ACC_ns;
    set_state32(STATE_ACC, ACC);
    pc_incr(3);
}
void
RUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned arr;
    unsigned st = _OPND1_;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    if (st == 1) {
        _tmp0 = SWAP_ps;
    } else {
        _tmp0 = 0;
    }
    if (st == 0) {
        _tmp1 = ACC_ps;
    } else {
        _tmp1 = _tmp0;
    }
    arr = _tmp1;
    set_ar(_OPND0_, arr);
    pc_incr(3);
}
void
WUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    unsigned art = ar(_OPND0_);
    unsigned sr = _OPND1_;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned SWAP_ns;
    unsigned ACC_ns;
    unsigned ureg_sel_0;
    unsigned ureg_sel_1;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ureg_sel_0 = sr == 0;
    ureg_sel_1 = sr == 1;
    if (ureg_sel_0) {
        _tmp0 = art;
    } else {
        _tmp0 = ACC_ps;
    }
    ACC_ns = _tmp0;
    if (ureg_sel_1) {
        _tmp1 = (art & 0x1);
    } else {
        _tmp1 = (SWAP_ps & 0x1);
    }
    SWAP_ns = _tmp1;
    ACC = ACC_ns;
    SWAP = SWAP_ns;
    set_state32(STATE_ACC, ACC);
    set_state32(STATE_SWAP, SWAP);
    pc_incr(3);
}
void BYTESWAP_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    pipe_use(arcode(), op0, 1);
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def_ifetch(-1);
}
void RUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def(arcode(), op0, 2);
    pipe_def_ifetch(-1);
}
void WUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    int ff;
    int cond;
    ff = aux32_fetchfirst();
    if (ff) {
        pipe_use_ifetch(3);
    }
    pipe_use(arcode(), op0, 1);
    if (!ff) {
        pipe_use_ifetch(3);
    }
    pipe_use_dcache();
    pipe_def_ifetch(-1);
}
typedef void (SEMFUNC)(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_);
struct isafunc_tbl_entry {
    const char *opname;
    SEMFUNC *semfn;
    SEMFUNC *schedfn;
};
static struct isafunc_tbl_entry local_fptr_tbl[] = {
    {"byteswap", BYTESWAP_func, BYTESWAP_sched},
    {"rur", RUR_func, RUR_sched},
    {"wur", WUR_func, WUR_sched},
    {"", 0, 0}
};
extern "C" struct isafunc_tbl_entry *get_isafunc_tbl(void);
struct isafunc_tbl_entry *get_isafunc_tbl(void)
{
    return &local_fptr_tbl[0];
}
Appendix J
/* Do not modify. This is automatically generated. */
#define BYTESWAP(ars) \
    ({asm volatile("BYTESWAP %0" :: "a" (ars));})
#define RUR(st) \
    ({int arr; asm volatile("RUR %0, %1" : "=a" (arr) : "i" (st)); arr;})
#define WUR(art, sr) \
    ({asm volatile("WUR %0, %1" :: "a" (art), "i" (sr));})
Appendix K
#ifdef TIE_DEBUG
#define BYTESWAP TIE_BYTESWAP
#define RUR TIE_RUR
#define WUR TIE_WUR
#endif
typedef unsigned u32;
#define STATE32_ACC 0
#define STATE_ACC STATE32_ACC
#define STATE32_SWAP 1
#define STATE_SWAP STATE32_SWAP
#define NUM_STATE32 2
static u32 state32_table[NUM_STATE32];
static char *state32_name_table[NUM_STATE32] = {
    "ACC",
    "SWAP"
};
static u32 state32(int rn) { return state32_table[rn]; }
static void set_state32(int rn, u32 s) { state32_table[rn] = s; }
static int num_state32(void) { return NUM_STATE32; }
static char *state32_name(int rn) { return state32_name_table[rn]; }
void
BYTESWAP(unsigned ars)
{
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned ACC_ns;
    unsigned ars_swap;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) << 16) |
               ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));
    if (SWAP_ps) {
        _tmp0 = ars_swap;
    } else {
        _tmp0 = ars;
    }
    ACC_ns = ACC_ps + _tmp0;
    ACC = ACC_ns;
    set_state32(STATE_ACC, ACC);
}
unsigned
RUR(unsigned st)
{
    unsigned arr;
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    if (st == 1) {
        _tmp0 = SWAP_ps;
    } else {
        _tmp0 = 0;
    }
    if (st == 0) {
        _tmp1 = ACC_ps;
    } else {
        _tmp1 = _tmp0;
    }
    arr = _tmp1;
    return arr;
}
void
WUR(unsigned art, unsigned sr)
{
    u32 ACC = state32(STATE_ACC);
    u32 SWAP = state32(STATE_SWAP);
    unsigned _tmp1;
    unsigned _tmp0;
    unsigned SWAP_ps;
    unsigned ACC_ps;
    unsigned SWAP_ns;
    unsigned ACC_ns;
    unsigned ureg_sel_0;
    unsigned ureg_sel_1;
    SWAP_ps = SWAP;
    ACC_ps = ACC;
    ureg_sel_0 = sr == 0;
    ureg_sel_1 = sr == 1;
    if (ureg_sel_0) {
        _tmp0 = art;
    } else {
        _tmp0 = ACC_ps;
    }
    ACC_ns = _tmp0;
    if (ureg_sel_1) {
        _tmp1 = (art & 0x1);
    } else {
        _tmp1 = (SWAP_ps & 0x1);
    }
    SWAP_ns = _tmp1;
    ACC = ACC_ns;
    SWAP = SWAP_ns;
    set_state32(STATE_ACC, ACC);
    set_state32(STATE_SWAP, SWAP);
}
#ifdef TIE_DEBUG
#undef BYTESWAP
#undef RUR
#undef WUR
#endif
Appendix L
// Do not modify this automatically generated file.
module tie_enflop(tie_out, tie_in, en, clk);
parameter size = 32;
output [size-1:0] tie_out;
input [size-1:0] tie_in;
input en;
input clk;
reg [size-1:0] tmp;
assign tie_out = tmp;
always @(posedge clk) begin
    if (en)
        tmp <= #1 tie_in;
end
endmodule
module tie_flop(tie_out, tie_in, clk);
parameter size = 32;
output [size-1:0] tie_out;
input [size-1:0] tie_in;
input clk;
reg [size-1:0] tmp;
assign tie_out = tmp;
always @(posedge clk) begin
    tmp <= #1 tie_in;
end
endmodule
module tie_athens_state(ns, we, ke, kp, vw, clk, ps);
parameter size = 32;
input [size-1:0] ns;   // next state
input we;              // write enable
input ke;              // Kill E state
input kp;              // Kill Pipeline
input vw;              // Valid W state
input clk;             // clock
output [size-1:0] ps;  // present state
wire [size-1:0] se;    // state at E stage
wire [size-1:0] sm;    // state at M stage
wire [size-1:0] sw;    // state at W stage
wire [size-1:0] sx;    // state at X stage
wire ee;               // write enable for EM register
wire ew;               // write enable for WX register
assign se = kp ? sx : ns;
assign ee = kp | we & ~ke;
assign ew = vw & ~kp;
assign ps = sm;
tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee),
    .clk(clk));
tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));
tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew),
    .clk(clk));
endmodule
module bs(ars, ACC_ps, SWAP_ps, ACC_ns, ACC_we, BYTESWAP);
input [31:0] ars;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
output [31:0] ACC_ns;
output ACC_we;
input BYTESWAP;
wire [31:0] ars_swap;
assign ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
assign ACC_ns = (ACC_ps) + ((SWAP_ps) ? (ars_swap) : (ars));
assign ACC_we = 1'b1 & BYTESWAP;
endmodule
module rur(arr, st, ACC_ps, SWAP_ps, RUR);
output [31:0] arr;
input [31:0] st;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
input RUR;
assign arr = ((st) == (8'd0)) ? (ACC_ps) : (((st) == (8'd1)) ?
    (SWAP_ps) : (32'b0));
endmodule
module wur(art, sr, ACC_ps, SWAP_ps, ACC_ns, ACC_we, SWAP_ns, SWAP_we, WUR);
input [31:0] art;
input [31:0] sr;
input [31:0] ACC_ps;
input [0:0] SWAP_ps;
output [31:0] ACC_ns;
output ACC_we;
output [0:0] SWAP_ns;
output SWAP_we;
input WUR;
wire ureg_sel_0;
assign ureg_sel_0 = (sr) == (8'h0);
wire ureg_sel_1;
assign ureg_sel_1 = (sr) == (8'h1);
assign ACC_ns = {(ureg_sel_0) ? (art[31:0]) : (ACC_ps[31:0])};
assign SWAP_ns = {(ureg_sel_1) ? (art[0:0]) : (SWAP_ps[0:0])};
assign ACC_we = 1'b1 & WUR;
assign SWAP_we = 1'b1 & WUR;
endmodule
Module UserInstModule (clk, out_E, ars_E, art_E, inst_R, Kill_E,
KillPipe_W, valid_W, BYTESWAP_R, RUR_R, WUR_R, en_R);
input clk;
output [31∶0] out_E;
input [31∶0] ars_E;
input [31∶0] art_E;
input [23∶0] inst_R;
input en_R;
Input Kill_E, killPipe_W, valid_W;
input BYTESWAP_R;
input RUR_R;
input WUR_R;
wire BYTESWAP_E;
wire RUR_E;
wire WUR_E;
wire [31∶0]arr_E;
Wire [31: 0] sr_R, sr_E;
Wire [31: 0] st_R, st_E;
Wire [31: 0] ACC_ps, ACC_ns;
wire ACC_we;
Wire [0: 0] SWAP_ps, SWAP_ns;
wire SWAP_we;
wire [31∶0]bs_ACC_ns;
wire bs_ACC_we;
wire bs_select;
wire [31∶0]rur_arr;
wire rur_select;
wire [31∶0]wur_ACC_ns;
wire wur_ACC_we;
wire [0∶0]wur_SWAP_ns;
wire wur_SWAP_we;
wire wur_select;
// Pipeline the decoded instruction signals and register specifiers from stage R to stage E
tie_enflop #(1) fBYTESWAP (.tie_out(BYTESWAP_E), .tie_in(BYTESWAP_R), .en(en_R), .clk(clk));
tie_enflop #(1) fRUR (.tie_out(RUR_E), .tie_in(RUR_R), .en(en_R), .clk(clk));
tie_enflop #(1) fWUR (.tie_out(WUR_E), .tie_in(WUR_R), .en(en_R), .clk(clk));
assign sr_R = {{inst_R[11:8]}, {inst_R[15:12]}};
tie_enflop #(32) fsr (.tie_out(sr_E), .tie_in(sr_R), .en(en_R), .clk(clk));
assign st_R = {{inst_R[11:8]}, {inst_R[7:4]}};
tie_enflop #(32) fst (.tie_out(st_E), .tie_in(st_R), .en(en_R), .clk(clk));
bs ibs(
.ars (ars_E),
.ACC_ps (ACC_ps),
.SWAP_ps (SWAP_ps),
.ACC_ns (bs_ACC_ns),
.ACC_we (bs_ACC_we),
.BYTESWAP(BYTESWAP_E));
rur irur(
.arr (rur_arr),
.st (st_E),
.ACC_ps (ACC_ps),
.SWAP_ps (SWAP_ps),
.RUR(RUR_E));
wur iwur(
.art (art_E),
.sr (sr_E),
.ACC_ps (ACC_ps),
.SWAP_ps (SWAP_ps),
.ACC_ns (wur_ACC_ns),
.ACC_we (wur_ACC_we),
.SWAP_ns (wur_SWAP_ns),
.SWAP_we (wur_SWAP_we),
.WUR(WUR_E));
tie_athens_state#(32)iACC(
.ns (ACC_ns),
.we (ACC_we),
.ke (Kill_E),
.kp (killPipe_W),
.vw (valid_W),
.clk (clk),
.ps(ACC_ps));
tie_athens_state#(1)iSWAP(
.ns (SWAP_ns),
.we (SWAP_we),
.ke (Kill_E),
.kp (killPipe_W),
.vw (valid_W),
.clk (clk),
.ps(SWAP_ps));
// Select each result from the semantic block of the instruction executing in stage E
assign bs_select = BYTESWAP_E;
assign rur_select = RUR_E;
assign wur_select = WUR_E;
assign arr_E = {32{1'b0}} & {32{bs_select}}
| rur_arr & {32{rur_select}}
| {32{1'b0}} & {32{wur_select}};
assign out_E = arr_E;
assign ACC_ns = bs_ACC_ns & {32{bs_select}}
| {32{1'b0}} & {32{rur_select}}
| wur_ACC_ns & {32{wur_select}};
assign ACC_we = bs_ACC_we & bs_select
| 1'b0 & rur_select
| wur_ACC_we & wur_select;
assign SWAP_ns = {1{1'b0}} & {1{bs_select}}
| {1{1'b0}} & {1{rur_select}}
| wur_SWAP_ns & {1{wur_select}};
assign SWAP_we = 1'b0 & bs_select
| 1'b0 & rur_select
| wur_SWAP_we & wur_select;
endmodule
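The next-state logic of the bs, rur and wur modules above can be cross-checked without a Verilog simulator. The following is a minimal behavioral sketch in Python, not part of the patent's appendix; the function names are hypothetical, and the write-enable and pipeline timing of the hardware are deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def byteswap32(value):
    """Reverse the four bytes of a 32-bit word, like ars_swap in module bs."""
    value &= MASK32
    return int.from_bytes(value.to_bytes(4, "big"), "little")


def bs_next_acc(acc, swap, ars):
    """BYTESWAP: ACC_ns = ACC_ps + (SWAP_ps ? ars_swap : ars), truncated to 32 bits."""
    operand = byteswap32(ars) if swap & 1 else ars & MASK32
    return (acc + operand) & MASK32


def rur_read(st, acc, swap):
    """RUR: st == 0 selects ACC, st == 1 selects SWAP (zero-extended), otherwise 0."""
    if st == 0:
        return acc & MASK32
    if st == 1:
        return swap & 1
    return 0


def wur_write(sr, art, acc, swap):
    """WUR: sr == 0 replaces ACC with art, sr == 1 replaces SWAP with bit 0 of art."""
    new_acc = art & MASK32 if sr == 0 else acc
    new_swap = art & 1 if sr == 1 else swap
    return new_acc, new_swap
```

For example, `bs_next_acc(1, 1, 0x11223344)` adds the byte-reversed operand 0x44332211 to an accumulator holding 1.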
Appendix M
/**********************************************************************
 * You need to supply the required information in this section
 **********************************************************************/
/* Set the search path to include the library directories */
SYNOPSYS = get_unix_variable("SYNOPSYS")
search_path = SYNOPSYS + /libraries/syn
/* Set the path and name of the target library */
search_path = <...> + search_path
target_library = <name of the library>
/* Constraint information */
OPERATING_CONDITION = <name of the operating condition>
WIRE_LOAD = <name of the wire-load model>
BOUNDARY_LOAD = <library name>/<smallest inverter name>/<input pin name>
DRIVE_CELL = <a large FF name>
DRIVE_PIN = <Q pin name of the FF>
DRIVE_PIN_FROM = <clock pin name of the FF>
/* Target processor clock period */
CLOCK_PERIOD = <target clock period>
/**********************************************************************
 * You should not need to change anything below
 **********************************************************************/
link_library = {"*"} + target_library
symbol_library = generic.sdb
/* Prepare workdir for the HDL compiler */
hdlin_auto_save_templates = "TRUE"
define_design_lib WORK -path workdir
sh mkdir -p workdir
read -f verilog ./prim.v
read -f verilog ./ROOT.v
current_design UserInstModule
link
set_operating_conditions OPERATING_CONDITION
set_wire_load WIRE_LOAD
create_clock clk -period CLOCK_PERIOD
set_dont_touch_network clk
set_load {2 * load_of(BOUNDARY_LOAD)} all_outputs()
set_load {2 * load_of(BOUNDARY_LOAD)} all_inputs()
set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs()
set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to find(clock, clk)
set_max_delay 0.5 * CLOCK_PERIOD -from find(clock, clk) -to all_outputs()
set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to all_outputs()
set_drive -rise 0 clk
set_drive -fall 0 clk
compile -ungroup_all
report_timing
report_constraint -all_viol
report_area
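The script above derives two numeric constraints from its user-supplied inputs: every input and output path is budgeted half of the clock period, and every port is loaded with twice the input-pin load of the boundary cell. A minimal Python sketch of that arithmetic follows; the function name, the argument names and the example values are assumptions for illustration, and the units are whatever the target library uses.

```python
def derive_constraints(clock_period, boundary_pin_load):
    """Return the values the script derives from CLOCK_PERIOD and load_of(BOUNDARY_LOAD)."""
    return {
        "clock_period": clock_period,
        # set_max_delay 0.5 * CLOCK_PERIOD on input-to-clock, clock-to-output
        # and input-to-output paths
        "max_io_delay": 0.5 * clock_period,
        # set_load {2 * load_of(BOUNDARY_LOAD)} on all inputs and all outputs
        "port_load": 2 * boundary_pin_load,
    }


# Hypothetical example values, not taken from any real library.
constraints = derive_constraints(clock_period=4.0, boundary_pin_load=0.01)
```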
Claims (134)
1. A system for designing a configurable processor, the system comprising:
means for generating a description of a hardware implementation of the processor based on a configuration specification, wherein the configuration specification includes: a binary selection portion for determining whether certain features are included in the processor, and a parameter selection portion for specifying parameters of certain predetermined features of the processor; and
means for generating an SDK specific to the hardware implementation based on the configuration specification,
wherein the configuration specification includes at least one extension specification of an extensible characteristic of the processor, the extension specification specifying the inclusion of a user-defined instruction and an implementation for that instruction.
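As an illustration of the three parts of the configuration specification named in claim 1, the following Python sketch models a specification as a data structure. The class names, field names and feature names are hypothetical; only BYTESWAP and bs are taken from the Verilog appendix of this document.

```python
from dataclasses import dataclass, field


@dataclass
class Extension:
    """A user-defined instruction together with a reference to its implementation."""
    instruction: str
    implementation: str


@dataclass
class ConfigSpec:
    features: dict = field(default_factory=dict)    # binary selections: feature -> included?
    parameters: dict = field(default_factory=dict)  # parametric selections: name -> value
    extensions: list = field(default_factory=list)  # extension specifications

    def is_valid(self):
        """Every binary selection is a bool and at least one extension is present."""
        return (all(isinstance(v, bool) for v in self.features.values())
                and len(self.extensions) >= 1)


spec = ConfigSpec(
    features={"data_cache": True, "multiplier": False},
    parameters={"register_count": 32},
    extensions=[Extension("BYTESWAP", "bs")],
)
```

Both the hardware generator and the SDK generator would read the same `spec` object, which is what keeps the tools and the hardware consistent.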
2. The system according to claim 1, wherein the SDK is used to generate code that runs on the processor.
3. The system according to claim 1, wherein the SDK includes a compiler adapted to the configuration specification, for compiling an application into code executable by the processor.
4. The system according to claim 1, wherein the SDK includes an assembler adapted to the configuration specification, for assembling an application into code executable by the processor.
5. The system according to claim 1, wherein the SDK includes a linker adapted to the configuration specification, for linking code executable by the processor.
6. The system according to claim 1, wherein the SDK includes a disassembler adapted to the configuration specification, for disassembling code executable by the processor.
7. The system according to claim 1, wherein the SDK includes a debugger adapted to the configuration specification, for debugging code executable by the processor.
8. The system according to claim 7, wherein the debugger has a common interface and configuration for the instruction set simulator and the hardware implementation.
9. The system according to claim 1, wherein the SDK includes an instruction set simulator adapted to the configuration specification, for simulating code executable by the processor.
10. The system according to claim 9, wherein the instruction set simulator can model the execution of the simulated code in order to measure one or more performance criteria, including execution cycles.
11. The system according to claim 10, wherein the performance criteria are based on characteristics of a specific configurable microarchitecture.
12. The system according to claim 10, wherein the instruction set simulator can profile the execution of the simulated program to record standard profiling statistics, including the number of cycles executed in each simulated function.
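The per-function cycle statistics of claim 12 can be pictured with the short Python sketch below. It assumes, purely for illustration, that the simulator emits a trace of (function name, cycles) events; the trace format and the function names are hypothetical.

```python
from collections import Counter


def profile(trace):
    """Sum the executed cycles per function from (function_name, cycles) events."""
    cycles = Counter()
    for function_name, spent in trace:
        cycles[function_name] += spent
    return dict(cycles)


# Hypothetical trace of a simulated program.
trace = [("main", 3), ("swap_sum", 5), ("main", 2), ("swap_sum", 5)]
stats = profile(trace)
```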
13. The system according to claim 1, wherein the hardware implementation description includes at least one of the following: a detailed HDL hardware implementation description; a synthesis script; a place-and-route script; a programmable logic device script; a test bench; diagnostic tests for verification; a script for running the diagnostic tests on a simulator; and test tools.
14. The system according to claim 1, wherein the means for generating the hardware implementation description includes:
means for generating a hardware description language description of the hardware implementation from the configuration specification;
means for synthesizing logic for the hardware implementation based on the hardware description language description; and
means for placing and routing components based on the synthesized logic to form a circuit on a chip.
15. The system according to claim 14, wherein the means for generating the hardware implementation description further includes:
means for verifying the timing of the circuit; and
means for determining the area, cycle time and power consumption of the circuit.
16. The system according to claim 1, further comprising means for generating the configuration specification.
17. The system according to claim 16, wherein the means for generating the configuration specification responds to a user's selection of configuration parameters.
18. The system according to claim 16, wherein the means for generating the configuration specification generates the specification based on design goals for the processor.
19. The system according to claim 1, wherein the configuration specification includes at least one parameter specification of a modifiable characteristic of the processor.
20. The system according to claim 19, wherein the at least one parameter specification specifies the inclusion of a functional unit and of at least one processor instruction that operates the functional unit.
21. The system according to claim 19, wherein the at least one parameter specification specifies one of the inclusion, the exclusion and the characteristics of a structure that affects processor state.
22. The system according to claim 21, wherein the structure is a register file and the parameter specification specifies the number of registers in the register file.
23. The system according to claim 21, wherein the structure is an instruction cache.
24. The system according to claim 21, wherein the structure is a data cache.
25. The system according to claim 21, wherein the structure is a write buffer.
26. The system according to claim 21, wherein the structure is one of an on-chip ROM and an on-chip RAM.
27. The system according to claim 19, wherein the at least one parameter specification specifies a semantic characteristic that controls the interpretation of at least one of data and instructions in the processor.
28. The system according to claim 19, wherein the at least one parameter specification specifies an execution characteristic that controls the execution of instructions in the processor.
29. The system according to claim 19, wherein the at least one parameter specification specifies a debugging characteristic of the processor.
30. The system according to claim 19, wherein the configuration specification includes a parameter specification that specifies at least one of a selection from predetermined characteristics, a size or number of processor elements, and an assignment of a value.
31. The system according to claim 1, further comprising means for evaluating the suitability of the configuration specification.
32. The system according to claim 31, wherein the means for evaluating includes an interactive evaluation tool.
33. The system according to claim 31, wherein the means for evaluating evaluates hardware characteristics of the processor described by the configuration specification.
34. The system according to claim 31, wherein the means for evaluating evaluates the suitability of the configuration specification based on estimated performance characteristics of the processor.
35. The system according to claim 34, further comprising means for providing information for modifying the configuration specification based on the estimated performance characteristics.
36. The system according to claim 34, wherein the performance characteristics include at least one of the area required to implement the processor on a chip, the power consumed by the processor, and the clock speed of the processor.
37. The system according to claim 31, wherein the means for evaluating evaluates the suitability of the configuration specification based on estimated software characteristics of the processor.
38. The system according to claim 37, wherein the means for evaluating interactively provides the user with a suitability assessment by estimating at least one of the code size and the cycle count required to execute a set of benchmarks on the processor described by the configuration specification.
39. The system according to claim 31, wherein the means for evaluating evaluates both the hardware characteristics and the software characteristics of the processor described by the configuration specification.
40. The system according to claim 1, wherein the means for generating the description of the hardware implementation of the processor also provides performance and cost characteristics of the hardware, and the means for generating the SDK, together with the means for generating the processor hardware implementation description, is used to produce software application performance information, in order to facilitate modifying the configuration specification.
41. The system according to claim 1, wherein the means for generating the description of the hardware implementation of the processor also provides performance and cost characteristics of the hardware, and the means for generating the SDK, together with the means for generating the processor hardware implementation description, is used to produce software application performance information, in order to facilitate extending the configuration specification.
42. The system according to claim 1, wherein the means for generating the hardware description of the processor also provides performance and cost characteristics of the hardware, and the means for generating the SDK, together with the means for generating the processor hardware implementation description, is used to produce software application performance information, in order to facilitate describing the configuration specification; and the means for generating the hardware description of the processor provides performance and cost characteristics of the hardware, and the means for generating the SDK, together with the means for generating the processor hardware implementation description, is used to produce software application performance information, in order to facilitate describing an extension of the configuration specification.
43. The system according to claim 1, further comprising means for generating a configuration of the processor from a base configuration of an extensible processor.
44. The system according to claim 1, wherein the extension specification specifies an additional instruction.
45. The system according to claim 1, wherein the means for generating the SDK includes means for suggesting to the user possible user-defined instructions suitable for at least one application.
46. The system according to claim 1, wherein the SDK includes a compiler for generating user-defined instructions.
47. The system according to claim 46, wherein the compiler can optimize code containing user-defined instructions.
48. The system according to claim 1, wherein the SDK includes at least one of the following: an assembler capable of generating user-defined instructions; a simulator capable of simulating the execution of user code that uses user-defined instructions; and a tool capable of verifying the user's implementation of user-defined instructions.
49. The system according to claim 46, wherein the compiler can automatically generate additional instructions.
50. The system according to claim 1, wherein:
the extension specification specifies a new feature whose functionality is designed by the user in abstract form; and
the means for generating the hardware implementation description also refines the new feature and integrates it into the detailed hardware implementation description.
51. The system according to claim 50, wherein the extension specification is a statement in an instruction set architecture language that specifies an opcode assignment and an instruction semantics.
52. The system according to claim 51, wherein the means for generating the hardware implementation description includes means for generating instruction decoding logic from the instruction set architecture language definition.
53. The system according to claim 52, wherein the means for generating the hardware implementation description further includes means for generating, based on the instruction set architecture language definition, signals that specify register operand usage for the instruction interlock and stall logic.
54. The system according to claim 50, wherein the means for generating the SDK includes means for generating an instruction decoding process for use in an instruction set simulator adapted to the configuration specification.
55. The system according to claim 50, wherein the means for generating the SDK includes means for generating an encoding table for use in an assembler that is adapted to the configuration specification and produces processor object code.
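Claims 51 to 55 have a single opcode assignment feeding the hardware decoder, the simulator's decoder and the assembler's encoding table. The Python sketch below shows one table serving both the encoder and the decoder. The mask and match values are invented for illustration and are not the real encodings of BYTESWAP, RUR or WUR; the 24-bit width follows the inst_R port of the Verilog appendix.

```python
# Hypothetical opcode assignment: mnemonic -> (mask, match) over a 24-bit instruction word.
OPCODES = {
    "BYTESWAP": (0xFF000F, 0x100006),
    "RUR": (0xFF000F, 0x200006),
    "WUR": (0xFF000F, 0x300006),
}
WORD_MASK = 0xFFFFFF


def encode(mnemonic, operand_bits=0):
    """Assembler side: combine the fixed opcode bits with the operand fields."""
    mask, match = OPCODES[mnemonic]
    return match | (operand_bits & ~mask & WORD_MASK)


def decode(word):
    """Simulator side: return the mnemonic whose fixed bits match, or None."""
    for mnemonic, (mask, match) in OPCODES.items():
        if (word & mask) == match:
            return mnemonic
    return None
```

Because both functions read the same table, an instruction added to `OPCODES` becomes visible to the assembler and to the simulator at the same time.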
56. The system according to claim 50, wherein the means for generating the hardware implementation description also generates, for the new feature, a hardware description of a data path, the data path hardware being consistent with the specific pipelined architecture of the processor.
57. The system according to claim 44, wherein the additional instruction adds no new state to the processor.
58. The system according to claim 44, wherein the additional instruction adds state to the processor.
59. The system according to claim 1, wherein the configuration specification includes at least a portion specified in an instruction set architecture description language.
60. The system according to claim 59, wherein the means for generating the hardware implementation description includes means for automatically generating instruction decoding logic from the instruction set architecture language description.
61. The system according to claim 59, wherein the means for generating the SDK includes means for automatically generating an assembler core from the instruction set architecture language description.
62. The system according to claim 59, wherein the means for generating the SDK includes means for automatically generating a compiler from the instruction set architecture language description.
63. The system according to claim 59, wherein the means for generating the SDK includes means for automatically generating a disassembler from the instruction set architecture language description.
64. The system according to claim 59, wherein the means for generating the SDK includes means for automatically generating an instruction set simulator from the instruction set architecture language description.
65. The system according to claim 1, wherein the means for generating the hardware implementation description includes means for preprocessing a portion of at least one of the hardware implementation description and the SDK, so as to modify the hardware implementation description and the software tools, respectively, according to the configuration specification.
66. The system according to claim 65, wherein the means for preprocessing evaluates, according to the configuration specification, an expression in one of the hardware implementation description and the SDK, and replaces the expression with a value.
67. The system according to claim 66, wherein the expression includes at least one of an iteration construct, a conditional construct and a database query.
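The expression replacement of claims 66 and 67 can be sketched in a few lines of Python. The `$(...)` marker syntax, the function name and the configuration keys below are assumptions made for this example and are not the notation of the patent's own preprocessor; the sketch covers arithmetic and conditional expressions only, not iteration or database queries.

```python
import re


def preprocess(template, config):
    """Replace each $(expr) with the value of expr evaluated against the configuration."""
    def substitute(match):
        # Evaluation sees only the configuration names; builtins are disabled.
        return str(eval(match.group(1), {"__builtins__": {}}, dict(config)))
    # The pattern does not support nested parentheses inside an expression.
    return re.sub(r"\$\(([^()]*)\)", substitute, template)


line = preprocess("wire [$(acc_width - 1):0] ACC_ps;", {"acc_width": 32})
```

Here `line` becomes `wire [31:0] ACC_ps;`, so the same template yields a different hardware description for each configuration.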
68. The system according to claim 1, wherein the configuration specification includes at least one parameter specification that specifies a modifiable characteristic of the processor.
69. The system according to claim 68, wherein the modifiable characteristic is one of a modification to a core specification and an optional feature not specified in the core specification.
70. The system according to claim 1, wherein the configuration specification includes at least one parameter specification that specifies a binary-selectable characteristic of the processor, and at least one processor characteristic that can be specified by a parameter.
71. A method for designing a configurable processor, the method comprising:
generating, according to a configuration specification, a hardware implementation description of the processor, wherein the configuration specification includes: a binary selection portion for determining whether certain features are included in the processor, and a parameter selection portion for specifying parameters of certain predetermined features of the processor; and
generating, according to the configuration specification, an SDK specific to the hardware implementation,
wherein the configuration specification includes at least one extension specification of an extensible characteristic of the processor, the extension specification specifying the inclusion of a user-defined instruction and an implementation for that instruction.
72. A system for designing a configurable processor, the system comprising:
means for generating a configuration specification containing a user-definable portion, the user-definable portion of the configuration specification including:
a specification of user-defined processor state, and
at least one user-defined instruction and an associated user-defined function, the function including at least one of reading from the user-defined processor state and writing to the user-defined processor state; and
means for generating a hardware implementation description of the processor based on the configuration specification, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instruction.
73. The system according to claim 72, wherein the hardware implementation description of the processor includes a description of the control logic needed to execute the at least one user-defined instruction and to implement the user-defined processor state.
74. The system according to claim 73, wherein:
the hardware implementation of the processor describes an instruction execution pipeline; and
the control logic includes portions associated with each stage of the instruction execution pipeline.
75. The system according to claim 74, wherein:
the hardware implementation description includes a description of circuitry for aborting instruction execution; and
the control logic includes circuitry for preventing aborted instructions from modifying the user-defined state.
76. The system according to claim 75, wherein the control logic includes circuitry for performing, for the at least one user-defined instruction, at least one of instruction issue, operand bypass and operand write enable.
77. The system according to claim 74, wherein the hardware implementation description includes registers for implementing the user-defined state in multiple stages of the instruction execution pipeline.
78. The system according to claim 74, wherein:
the hardware implementation description includes state registers that are written in a pipeline stage different from the pipeline stage in which the output operands are produced;
the hardware implementation description specifies that such writes are bypassed to subsequent instructions that reference the user-defined processor state before the write to the state register is committed.
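The behavior described in claims 75 and 78 (writes produced early, committed later, bypassed to later readers, and discarded when the instruction is aborted) can be pictured with the toy Python model below. It is a sketch of the idea only: the class and method names are hypothetical, and it does not reproduce the cycle-level behavior of the tie_athens_state module in the Verilog appendix.

```python
class StateRegister:
    """Toy model of a user-defined state with delayed commit and bypass."""

    def __init__(self, value=0):
        self.committed = value
        self.pending = []  # uncommitted writes, oldest first

    def write(self, value):
        """An instruction produces a new value that is not yet committed."""
        self.pending.append(value)

    def read(self):
        """Bypass: a later instruction sees the newest uncommitted write, if any."""
        return self.pending[-1] if self.pending else self.committed

    def commit(self):
        """The oldest in-flight write reaches the commit stage."""
        if self.pending:
            self.committed = self.pending.pop(0)

    def kill(self):
        """Aborted instructions must not modify the state: drop in-flight writes."""
        self.pending.clear()
```

In this model a read issued right after `write(11)` returns 11 even though `committed` still holds the old value, and `kill()` restores the view of the last committed value.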
79. The system according to claim 72, wherein:
the configuration specification includes a predetermined portion in addition to the user-defined portion; and
the predetermined portion of the specification includes an instruction that facilitates saving the user-defined state to memory and an instruction that facilitates restoring the user-defined state from memory.
80. The system according to claim 79, further comprising means for generating software that context-switches the user-defined state using the instruction that facilitates saving the user-defined state to memory.
81. The system according to claim 72, further comprising means for generating at least one of the following:
an assembler for assembling the user-defined processor state and the at least one user-defined instruction;
a compiler for compiling the user-defined processor state and the at least one user-defined instruction;
a simulator for simulating the user-defined processor state and the at least one user-defined instruction; and
a debugger for debugging the user-defined processor state and the at least one user-defined instruction.
82. The system according to claim 72, further comprising means for generating: an assembler for assembling the user-defined processor state and the at least one user-defined instruction; a compiler for compiling the user-defined processor state and the at least one user-defined instruction; a simulator for simulating the user-defined processor state and the at least one user-defined instruction; and a debugger for debugging the user-defined processor state and the at least one user-defined instruction.
83. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement specifying the size and index of the user-defined state.
84. The system according to claim 83, wherein the user-defined portion of the specification includes at least one attribute that is associated with the user-defined state and specifies the packing of the user-defined state in a processor register.
85. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement specifying the mapping of the user-defined state to processor registers.
86. The system according to claim 72, wherein the means for generating the hardware implementation description includes means for automatically mapping the user-defined state to processor registers.
87. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement specifying a class of user-defined instructions and its effect on the user-defined state.
88. The system according to claim 72, wherein the user-defined portion of the specification includes at least one assignment statement for assigning a value to the user-defined state.
89. A system for designing a configurable processor, the system comprising:
core software tools for generating, according to an instruction set architecture specification, an SDK specific to that specification; and
a user-defined instruction module for generating, according to a user-defined instruction, at least one module for use by the core software tools in implementing the user-defined instruction, wherein the hardware implementation of the configurable processor includes a user-defined instruction execution unit for executing the user-defined instruction.
90. The system according to claim 89, wherein the core software tools include software tools that generate code to run on the processor.
91. The system according to claim 89, wherein the at least one module is implemented as a dynamically linked library.
92. The system according to claim 89, wherein the at least one module is implemented as a table.
93. The system according to claim 89, wherein the core software tools include a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instruction and is executable by the processor.
94. The system according to claim 93, wherein the at least one module includes a module used by the compiler to compile the user-defined instruction.
95. The system according to claim 89, wherein the core software tools include an assembler that uses the user-defined module to assemble an application into code that uses the user-defined instruction and is executable by the processor.
96. The system according to claim 95, wherein the at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instruction.
97. The system according to claim 96, wherein:
the system further includes a core instruction set specification that describes the non-user-defined instructions; and
the core instruction set specification is used by the assembler to assemble the application into code executable by the processor.
98. The system according to claim 89, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.
99. The system according to claim 98, wherein the at least one module includes a simulator module used by the simulator to simulate the execution of the user-defined instruction.
100. The system according to claim 99, wherein the module used by the simulator includes data for decoding the user-defined instruction.
101. The system according to claim 100, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses a module to decode the instruction using the simulator module.
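The decode order described in claim 101 (predefined instructions first, then the user-defined module) can be sketched as follows in Python. The opcode values, the core table and the `UserModule` class are hypothetical stand-ins; in the claimed system the user module would be a generated table or dynamically linked library.

```python
# Hypothetical predefined (core) opcodes.
CORE_DECODE = {0x0: "ADD", 0x1: "SUB"}


class UserModule:
    """Stand-in for a generated per-extension decode module."""

    def __init__(self, table):
        self.table = table

    def decode(self, opcode):
        return self.table.get(opcode)


def decode(opcode, user_module=None):
    """Try the core decoder first; consult the user module only if that fails."""
    name = CORE_DECODE.get(opcode)
    if name is None and user_module is not None:
        name = user_module.decode(opcode)
    if name is None:
        raise ValueError(f"illegal instruction: {opcode:#x}")
    return name
```

Keeping the user module behind the core decoder means the core simulator never has to be rebuilt when the set of user-defined instructions changes.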
102. The system according to claim 89, wherein the core software tools include a debugger that uses the user-defined module to debug code that uses the user-defined instruction and is executable by the processor.
103. The system according to claim 102, wherein the at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.
104. The system according to claim 102, wherein the at least one module includes a module used by the debugger to convert assembly instructions into character strings.
105. The system according to claim 102, wherein:
the core software tools include an instruction set simulator for simulating code executable by the processor; and
the debugger communicates with the simulator to obtain information about the user-defined state for debugging.
106. The system according to claim 89, wherein a single user-defined instruction can be used without modification by multiple core software tools based on different core instruction set specifications.
107. A system for designing a configurable processor, the system comprising:
core software tools for generating, based on an instruction set architecture specification, an SDK specific to that specification;
a user-defined instruction module for generating, based on user-defined instructions, a group of at least one module that is used by the core software tools to implement the user-defined instructions, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions; and
storage means for concurrently storing the groups generated by the user-defined instruction module, each group corresponding to a different set of user-defined instructions.
108. The system according to claim 107, wherein at least one module is implemented as a dynamic link library.
109. The system according to claim 107, wherein at least one module is implemented as a table.
110. The system according to claim 107, wherein the core software tools include a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.
111. The system according to claim 110, wherein at least one module includes a module used by the compiler to compile the user-defined instructions.
112. The system according to claim 107, wherein the core software tools include an assembler that uses the user-defined instruction module to assemble an application into code that uses the user-defined instructions and is executable by the processor.
113. The system according to claim 112, wherein at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instructions.
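As an informal illustration of claims 109 and 113 (not part of the claims), the sketch below shows a table-style module that an assembler could consult to map an assembly language instruction to a user-defined instruction encoding. The mnemonics, the opcode values, and the 20-bit instruction layout are assumptions invented for this example.

```python
# Hypothetical table module: assembly mnemonic -> 8-bit opcode.
USER_ASM_TABLE = {"mac": 0xE1, "sat": 0xE2}


def assemble(line: str, table: dict) -> int:
    """Encode 'mnemonic aD, aS, aT' as opcode[19:12] | D[11:8] | S[7:4] | T[3:0]."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic not in table:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    # Register operands are written a0..a15; strip the 'a' prefix and parse the index.
    regs = [int(tok.strip().lstrip("a")) for tok in operands.split(",")]
    if len(regs) != 3 or any(not 0 <= r <= 15 for r in regs):
        raise ValueError(f"expected three registers a0..a15: {operands!r}")
    rd, rs, rt = regs
    return (table[mnemonic] << 12) | (rd << 8) | (rs << 4) | rt


word = assemble("mac a2, a3, a4", USER_ASM_TABLE)
print(hex(word))
```

Because the assembler only reads the table, swapping in a different table would change the accepted user-defined instructions without changing the assembler itself.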
114. The system according to claim 107, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.
115. The system according to claim 114, wherein at least one module includes a module used by the simulator to simulate execution of the user-defined instructions.
116. The system according to claim 115, wherein the module used by the simulator includes data for decoding the user-defined instructions.
117. The system according to claim 116, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses the module to decode that instruction.
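As an informal illustration of claims 115 to 117 (not part of the claims), the sketch below shows a simulator that first tries to decode an instruction as a predefined instruction and, only if that fails, falls back to decode data supplied by a user module. The opcodes, the 20-bit instruction layout, and the instruction semantics are hypothetical.

```python
# Predefined (core) instructions: opcode -> (name, semantics).
CORE = {
    0x01: ("add", lambda d, s, t: s + t),
    0x02: ("sub", lambda d, s, t: s - t),
}

# Hypothetical user module: decode data plus semantics for user-defined instructions.
USER_MODULE = {
    0xE1: ("mac", lambda d, s, t: d + s * t),
}


def decode(word: int, user_module: dict):
    """Return (name, semantics, rd, rs, rt); try core decode first, then the user module."""
    opcode = (word >> 12) & 0xFF
    entry = CORE.get(opcode)
    if entry is None:
        # Not a predefined instruction: use the user module's decode data.
        entry = user_module.get(opcode)
    if entry is None:
        raise ValueError(f"illegal instruction: 0x{word:05x}")
    name, semantics = entry
    return name, semantics, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def step(regs: list, word: int, user_module: dict) -> str:
    """Simulate one instruction on a 16-entry register file; return its name."""
    name, semantics, rd, rs, rt = decode(word, user_module)
    regs[rd] = semantics(regs[rd], regs[rs], regs[rt]) & 0xFFFFFFFF
    return name


regs = [0] * 16
regs[2], regs[3], regs[4] = 10, 5, 6
print(step(regs, 0x01134, USER_MODULE), regs[1])  # core instruction: add a1, a3, a4
print(step(regs, 0xE1234, USER_MODULE), regs[2])  # user-defined: mac a2, a3, a4
```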
118. The system according to claim 107, wherein the core software tools include a debugger that uses the user-defined modules to debug code that uses the user-defined instructions and is executable by the processor.
119. The system according to claim 118, wherein at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.
120. The system according to claim 118, wherein at least one module includes a module used by the debugger to convert assembly instructions into character strings.
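As an informal illustration of claims 119 and 120 (not part of the claims), the sketch below separates the two debugger steps: decoding a machine instruction into a structured assembly instruction, then converting that instruction into a character string for display. The table contents, the 20-bit instruction layout, and the function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class AsmInstruction(NamedTuple):
    """Structured assembly instruction: a mnemonic plus three register indices."""
    mnemonic: str
    regs: Tuple[int, int, int]


# Hypothetical table module: 8-bit opcode -> mnemonic of a user-defined instruction.
USER_DISASM_TABLE = {0xE1: "mac", 0xE2: "sat"}


def decode_machine_instruction(word: int, table: dict) -> AsmInstruction:
    """Step 1: machine instruction -> assembly instruction."""
    opcode = (word >> 12) & 0xFF
    if opcode not in table:
        raise ValueError(f"no user-defined instruction with opcode 0x{opcode:02x}")
    regs = ((word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)
    return AsmInstruction(table[opcode], regs)


def format_assembly(insn: AsmInstruction) -> str:
    """Step 2: assembly instruction -> character string."""
    return f"{insn.mnemonic} " + ", ".join(f"a{r}" for r in insn.regs)


insn = decode_machine_instruction(0xE1234, USER_DISASM_TABLE)
print(format_assembly(insn))
```

Keeping the two steps as separate functions mirrors the way the claims list them as separate modules, although a real tool could equally combine them.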
121. A system for designing a configurable processor, the system comprising:
multiple sets of core software tools for generating, based on an instruction set architecture specification, software development tools specific to that specification; and
a user-defined instruction module for generating, based on a user-defined instruction set specification, at least one module for use by a set of the core software tools to implement the user-defined instructions, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions.
122. The system according to claim 121, wherein at least one module is implemented as a dynamic link library.
123. The system according to claim 121, wherein at least one module is implemented as a table.
124. The system according to claim 121, wherein at least one set of core software tools includes a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.
125. The system according to claim 124, wherein at least one module includes a module used by the compiler to compile the user-defined instructions.
126. The system according to claim 121, wherein at least one set of core software tools includes an assembler that uses the user-defined instruction module to assemble an application into code that uses the user-defined instructions and is executable by the processor.
127. The system according to claim 126, wherein at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instructions.
128. The system according to claim 121, wherein at least one set of core software tools includes an instruction set simulator for simulating code executable by the processor.
129. The system according to claim 128, wherein at least one module includes a module used by the simulator to simulate execution of the user-defined instructions.
130. The system according to claim 129, wherein the module used by the simulator includes data for decoding the user-defined instructions.
131. The system according to claim 130, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses the module to decode that instruction.
132. The system according to claim 121, wherein at least one set of core software tools includes a debugger that uses the user-defined modules to debug code that uses the user-defined instructions and is executable by the processor.
133. The system according to claim 132, wherein at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.
134. The system according to claim 132, wherein at least one module includes a module used by the debugger to convert assembly instructions into character strings.
Applications Claiming Priority (6)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
US09/246,047 | US6477683B1 (en) | 1999-02-05 | 1999-02-05 | Automated processor generation system for designing a configurable processor and method for the same |
US09/246,047 | | 1999-02-05 | | |
US09/323,161 | | 1999-05-27 | | |
US09/323,161 | US6701515B1 (en) | 1999-05-27 | 1999-05-27 | System and method for dynamically designing and evaluating configurable processor instructions |
US09/322,735 | | 1999-05-28 | | |
US09/322,735 | US6477697B1 (en) | 1999-02-05 | 1999-05-28 | Adding complex instruction extensions defined in a standardized language to a microprocessor design to produce a configurable definition of a target instruction set, and hdl description of circuitry necessary to implement the instruction set, and development and verification tools for the instruction set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1382280A (en) | 2002-11-27 |
CN1382280B (en) | 2016-11-30 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5896521A (en) * | 1996-03-15 | 1999-04-20 | Mitsubishi Denki Kabushiki Kaisha | Processor synthesis system and processor synthesis method |
Non-Patent Citations (2)
Title |
---|
Barry Shackleford et al. An integrated Processor Synthesis and Compiler Generation System. IEICE Transactions on Information and Systems. 1996, Vol. E79-D, No. 10, pp. 1373-1380. * |
Mark R. Hartoog et al. Generation of Software Tools from Processor Description for Hardware/Software Codesign. DAC97. 1997, pp. 303-306. * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | SE01 | Entry into force of request for substantive examination | |
| | PB01 | Publication | |
| | GR01 | Patent grant | |
| | CX01 | Expiry of patent term | Granted publication date: 20161130 |