CN1382280B

CN1382280B - Automated processor generation system for designing a configurable processor and method for the same

Info

Publication number: CN1382280B
Application number: CN00812731.XA
Authority: CN
Inventors: 厄尔利·A·基利安; 理查多·E·冈萨雷兹; 阿西什·B·迪克斯特; 蒙妮卡·莱姆; 沃尔特·D·里奇坦斯坦; 克里斯托弗·劳恩; 约翰·拉坦伯格; 罗伯特·P·威尔森; 阿伯特·R－R·王; 多尔·E·麦丹; 文·K·蒋; 理查德·鲁戴尔
Original assignee: Tensilica Inc
Current assignee: Tensilica Inc
Filing date: 2000-02-04
Publication date: 2016-11-30
Anticipated expiration: 2020-02-04

Abstract

A configurable RISC processor implements user-defined instruction sets with high-performance fixed-length and variable-length encoding. The process of defining new instructions is supported by tools that allow the user to add new instructions and evaluate them rapidly, to maintain multiple instruction sets, and to switch between them. A standardized language is used to develop a configurable definition of a target instruction set, an HDL description of the hardware needed to implement that instruction set, and development tools for verification and application development, thereby achieving a high degree of automation in the design process.

Description

Automated processor generation system for designing a configurable processor and method for the same

Background of the invention

1. Field of the invention

The present invention relates to microprocessor systems. More particularly, it relates to the design of an application solution containing one or more processors, where each processor in the system is configured and enhanced at design time to improve its suitability for a particular application. The invention is also directed to a system in which application developers can rapidly develop instruction extensions, such as new instructions, to an existing instruction set architecture, including new instructions that manipulate user-defined processor state, and can immediately measure the impact of such extensions on application run time and on processor cycle time.

2. Description of related art

Traditionally, processors have been difficult to design and to modify. For this reason, most systems that contain processors use ones that were designed and verified once for general-purpose use and were then used by many applications over time. As a result, their suitability for a particular application is not always ideal. It would often be appropriate to modify the processor so that it executes a particular application's code better (e.g., runs faster, consumes less power, or costs less). However, the difficulty of modifying even an existing processor design, and therefore its time, cost and risk, is high, so this is typically not done.

To better understand the difficulty of making a prior art processor configurable, consider its development process. First, the instruction set architecture (ISA) is developed. This step is essentially done once and its result is then used for decades by many systems. For example, the instruction set of the Intel Pentium(TM) processor can trace its legacy back to the 8008 and 8080 microprocessors introduced in the mid-1970s. In this process, based on predetermined ISA design criteria, the ISA instructions, syntax and so on are developed, along with software development tools for that ISA such as assemblers, debuggers and compilers. Then a simulator for that particular ISA is developed, various benchmarks are run to evaluate the effectiveness of the ISA, and the ISA is revised according to the results of the evaluation. At some point the ISA is considered satisfactory, and the ISA process ends with a fully developed ISA specification, an ISA simulator, an ISA verification suite and a development suite including, e.g., an assembler, debugger and compiler.

Then processor design begins. Because processors can have useful lives of many years, this process is also carried out fairly infrequently: typically, a processor is designed once and used for many years by many systems. Given the ISA, its verification suite and simulator, and the various processor development goals, the microarchitecture of the processor is designed, simulated and revised. Once the microarchitecture is finalized, it is implemented in a hardware description language (HDL), and a microarchitecture verification suite is developed and used to verify the HDL implementation (more on this later). Then, in contrast to the manual processes described up to this point, automated design tools can synthesize a circuit from the HDL description and place and route its components. The layout can then be revised to optimize chip area usage and timing. Alternatively, additional manual processes can be used to create a floorplan based on the HDL description, convert the HDL to circuitry, and then verify and lay out the circuits both manually and automatically. Finally, an automated tool is used to verify that the layout matches the circuits, and the circuits are verified against the layout parameters.

After processor development is complete, the overall system is designed. Unlike the design of the ISA and the processor, system design (which may include the design of chips that now contain the processor) is quite common, and systems are typically designed continuously. Each system is used for a relatively short period (one or two years) by a particular application. Based on predetermined system goals such as cost, performance, power and functionality, on the specifications of pre-existing processors, and on the specifications of chip foundries (which are usually closely tied to the processor vendors), the overall system architecture is designed, a processor is chosen to match the design goals, and a chip foundry is chosen (a choice closely tied to the processor selection).

Then, given the chosen processor, ISA and foundry, the previously developed simulation, verification and development tools, and a standard cell library for the chosen foundry, an HDL implementation of the system is designed, a verification suite is developed for the system HDL implementation, and the implementation is verified. Next, the system circuitry is synthesized, placed and routed on circuit boards, and the layout and timing are re-optimized. Finally, the boards are designed and laid out, the chips are fabricated, and the boards are assembled.

Another difficulty with prior art processor design is that it is not appropriate simply to design traditional processors with more features so as to cover all applications. Any given application requires only a particular set of features, and a processor with features the application does not need is overly costly, consumes more power, and is harder to fabricate. In addition, it is not possible to know all of the application targets when a processor is first designed. If the processor modification process could be automated and made reliable, the ability of a system designer to create application solutions would be significantly enhanced.

As an example, consider a device designed to transmit and receive data over a channel using a complex protocol. Because the protocol is complex, the processing cannot reasonably be accomplished entirely in hard-wired (e.g., combinatorial) logic; instead, a programmable processor is introduced into the system for protocol processing. Programmability also allows bugs to be fixed, and allows later protocol upgrades to be made by loading new software into memory. However, the traditional processor was probably not designed for this particular application (the application may not even have existed when the processor was designed), and it may need to perform operations that require many instructions but that could be done with one or a few instructions given additional processor logic.

Because the processor cannot easily be enhanced, many system designers do not attempt to do so and instead choose to execute an inefficient pure-software solution on an available processor. This inefficiency results in a solution that may be slower, require more power, or cost more (e.g., it may require a larger, more powerful processor to execute the program at sufficient speed). Other designers choose to provide some of the processing in special-purpose hardware that they design for the application, such as a coprocessor, and then have the programmer code up accesses to that hardware at various points in the program. However, the time needed to transfer data between the processor and such special-purpose hardware limits the usefulness of this approach to system optimization, because only fairly large units of work can be sped up enough that the time saved by using the special-purpose hardware is greater than the additional time required to transfer data to and from it.

In the communication channel example, the protocol might require encryption, error correction, or compression/decompression processing. Such processing often operates on individual bits rather than on a processor's larger words. The circuitry for a given computation may be rather modest, but the need for the processor to extract each bit, process it sequentially and then repack the bits adds considerable overhead.

As a very specific example, consider a Huffman decode using the rules shown in Table 1 (a similar encoding is used in the MPEG compression standard).

Table 1

Pattern	Value	Length
0 0 X X X X X X	0	2
0 1 X X X X X X	1	2
1 0 X X X X X X	2	2
1 1 0 X X X X X	3	3
1 1 1 0 X X X X	4	4
1 1 1 1 0 X X X	5	5
1 1 1 1 1 0 X X	6	6
1 1 1 1 1 1 0 X	7	7
1 1 1 1 1 1 1 0	8	8
1 1 1 1 1 1 1 1	9	8

Both the value and the length must be computed, so that "length" bits can be shifted off the stream to find the start of the next element to be decoded.
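The decode rule of Table 1 can be sketched in software as follows. This is an illustrative Python sketch only, not part of the original disclosure; the function names huff8 and decode_stream and the bit-string input format are assumptions made for the example.

```python
def huff8(window: int) -> tuple[int, int]:
    """Decode one symbol from an 8-bit window per Table 1; return (value, length)."""
    # Count the leading 1 bits of the window (0..8).
    ones = 0
    while ones < 8 and window & (0x80 >> ones):
        ones += 1
    if ones < 2:
        # Patterns 00, 01 and 10: the value is simply the top two bits.
        return window >> 6, 2
    if ones == 8:
        # Pattern 11111111 is the only code whose value differs from its length.
        return 9, 8
    # Patterns 110 ... 11111110: k leading ones encode value k+1 with length k+1.
    return ones + 1, ones + 1


def decode_stream(bits: str) -> list[int]:
    """Decode a string of '0'/'1' characters into a list of values."""
    values = []
    pos = 0
    while pos < len(bits):
        # Take the next 8 bits, zero-padded at the end of the stream.
        window = int(bits[pos:pos + 8].ljust(8, "0"), 2)
        value, length = huff8(window)
        if pos + length > len(bits):
            break  # Incomplete trailing code; stop decoding.
        values.append(value)
        pos += length  # Shift off "length" bits to reach the next element.
    return values
```

Even in this compact form, each symbol costs a loop of bit tests plus the shift, which is the per-symbol overhead that the following paragraphs discuss.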

There are several ways to code this for a conventional instruction set, but all of them require many instructions, because many tests must be performed and, in contrast to the single gate delay of combinatorial logic, each software test takes multiple processor cycles. For example, an efficient prior art implementation using the MIPS instruction set might require six logical operations, six conditional branches, one arithmetic operation, and the associated register loads. With an advantageously designed instruction set the coding is better, but it is still expensive in terms of time: one logical operation, six conditional branches, one arithmetic operation, and the associated register loads.

In terms of processor resources, this is so expensive that a 256-entry lookup table is typically used instead of coding the process as a sequence of bit-by-bit comparisons. However, a 256-entry lookup table takes up significant space, and accessing it may also take many cycles. For longer Huffman codes the table size becomes prohibitive, leading to more complex and slower code.
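The lookup-table alternative can be sketched as follows. This Python sketch is illustrative only; the RULES list restates Table 1, and the names RULES and build_table are assumptions. It shows why the table needs 256 entries for 8-bit codes: every possible byte needs its own entry, so the size doubles with each additional code bit.

```python
# Table 1 restated as (prefix, value) pairs; the code length is len(prefix).
RULES = [
    ("00", 0), ("01", 1), ("10", 2), ("110", 3), ("1110", 4),
    ("11110", 5), ("111110", 6), ("1111110", 7),
    ("11111110", 8), ("11111111", 9),
]


def build_table() -> list:
    """Build a 256-entry table mapping each byte to (value, length)."""
    table = [None] * 256
    for prefix, value in RULES:
        free_bits = 8 - len(prefix)            # number of don't-care (X) bits
        base = int(prefix, 2) << free_bits     # prefix placed in the high bits
        for low in range(1 << free_bits):      # enumerate every X combination
            table[base + low] = (value, len(prefix))
    return table
```

Decoding a symbol then reduces to a single indexed load, for example `value, length = table[window]`, at the cost of storing all 256 entries.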

One possible solution to the problem of accommodating specific application requirements in processors is to use configurable processors, whose instruction sets and architectures can easily be modified and extended to enhance and customize the functionality of the processor. Configurability allows the designer to specify whether, or how much, additional functionality is required for her product. The simplest form of configurability is a binary choice: a feature is either present or absent. For example, a processor might be offered with or without floating-point hardware.

Flexibility can be improved by configuration choices with finer gradation. For example, the processor might allow the system designer to specify the number of registers in the register file, the memory width, the cache size, the cache associativity, and so on. However, these options still do not reach the level of customizability that system designers want. For example, in the Huffman decoding example above, the system designer might like to include a specific instruction to perform the decode, although this is not known in the prior art, e.g.,

Huff8 t1, t0

Here, the most significant eight bits of the result hold the decoded value and the least significant eight bits hold the length. In contrast to the software implementations described above, a direct hardware implementation of the Huffman decode is quite simple: exclusive of instruction decode and the like, the combinatorial logic for the instruction amounts to roughly thirty gates, or less than 0.1% of a typical processor's gate count, and can be computed by a special-purpose processor instruction in a single cycle. This represents an improvement by a factor of 4-20 over using general-purpose instructions only.
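The result format described above (value in the upper eight bits, length in the lower eight bits) can be modeled with a small behavioral sketch. This Python model is an assumption-laden illustration, not the disclosed hardware: the name huff8_result is hypothetical, and the sketch assumes the bits to be decoded sit in the low eight bits of the source operand t0, which the text does not specify.

```python
def huff8_result(t0: int) -> int:
    """Behavioral model of the huff8 result: (value << 8) | length."""
    window = t0 & 0xFF  # assumed: the code bits are the low byte of t0
    # Leading-ones count without a loop, mirroring a priority encoder:
    # invert the byte and find the position of its highest set bit.
    leading_ones = 8 - ((~window) & 0xFF).bit_length()
    if leading_ones < 2:
        value, length = window >> 6, 2          # patterns 00, 01, 10
    elif leading_ones == 8:
        value, length = 9, 8                    # pattern 11111111
    else:
        value, length = leading_ones + 1, leading_ones + 1
    return (value << 8) | length
```

For instance, an input of 0b11011111 yields 0x0303: value 3 in the upper byte and length 3 in the lower byte.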

Prior art efforts in the area of configurable processor generation generally fall into two categories: logic synthesis used with parameterized hardware descriptions; and automatic retargeting of compilers and assemblers from abstract machine descriptions. The first category includes synthesizable processor hardware designs such as the Synopsys DW8051 processor, the ARM/Synopsys ARM7-S, the Lexra LX-4080 and the ARC configurable RISC core, and, to some degree, the Synopsys synthesizable/configurable PCI bus interface.

Of the above, the Synopsys DW8051 includes a binary-compatible implementation of an existing processor architecture and a small number of synthesis parameters: 128 or 256 bytes of internal RAM, a ROM address range determined by the parameter rom_addr_size, an optional interval timer, a variable number (0-2) of serial ports, and an interrupt unit that supports either six or thirteen sources. Although the DW8051 architecture can be varied somewhat, no changes to its instruction set architecture are possible.
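The kind of parameterization described for the DW8051 can be pictured as a small record of synthesis parameters with range checks. The Python sketch below is purely illustrative: only rom_addr_size is a name taken from the text; the class name, the other field names, the default values and the positivity check on rom_addr_size are assumptions and do not reflect the actual Synopsys deliverable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dw8051Params:
    """Hypothetical record of the synthesis parameters listed in the text."""
    internal_ram_bytes: int = 128   # text: 128 or 256 bytes
    rom_addr_size: int = 16         # text: determines ROM address range
    has_timer: bool = False         # text: optional interval timer
    serial_ports: int = 0           # text: 0-2 serial ports
    interrupt_sources: int = 6      # text: 6 or 13 sources

    def __post_init__(self):
        # Reject combinations outside the ranges stated in the text.
        if self.internal_ram_bytes not in (128, 256):
            raise ValueError("internal RAM must be 128 or 256 bytes")
        if self.rom_addr_size <= 0:  # assumed sanity check only
            raise ValueError("rom_addr_size must be positive")
        if not 0 <= self.serial_ports <= 2:
            raise ValueError("serial_ports must be 0, 1 or 2")
        if self.interrupt_sources not in (6, 13):
            raise ValueError("interrupt unit supports 6 or 13 sources")
```

Note that every field selects among pre-built hardware variants; none of them can express a new instruction, which is the limitation the text points out.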

The ARM/Synopsys ARM7-S processor includes a binary-compatible implementation of an existing architecture and microarchitecture. It has two configurable parameters: the choice of a high-performance or low-performance multiplier, and the inclusion of debug and in-circuit emulation logic. Although the instruction set architecture of the ARM7-S can be changed, the resulting variants are subsets of existing non-configurable processor implementations, so no new software is required.

The LX-4080 processor is a configurable variant of the standard MIPS architecture and provides no software support for instruction set extensions. Its options include a custom engine interface that allows the MIPS ALU opcodes to be extended with application-specific operations; an internal hardware interface that includes a register source, a register or 16-bit-wide immediate source, and destination and stall signals; a simple memory management unit option; three MIPS coprocessor interfaces; a flexible local memory interface to cache, scratchpad RAM or ROM; a bus controller that connects peripheral functions and memories to the processor's own local bus; and a write buffer of configurable depth.

The ARC configurable RISC core has a user interface with on-the-fly gate count estimation based on target technology and clock speed, instruction cache configuration, instruction set extensions, a timer option, a scratchpad memory option, and memory controller options; an instruction set with selectable options such as local scratchpad RAM with block moves to memory, special registers, up to sixteen additional condition code choices, a 32 × 32 bit scoreboarded multiply block, a single-cycle 32-bit barrel shifter/rotate block, a normalize (find first bit) instruction, writing of results directly to a command buffer (rather than to the register file), a 16-bit MUL/MAC block with a 36-bit accumulator, and sliding pointers for accessing local SRAM using linear arithmetic; and user instructions defined by manually editing VHDL source code. The ARC design has no facility for implementing an instruction set description language, nor does it generate software tools specific to the configured processor.

The Synopsys configurable PCI interface includes a GUI or command line interface for installation, configuration and synthesis activities; checks that the prerequisite user actions have been taken at each step; installation of selected design files based on the configuration (e.g., Verilog vs. VHDL); selective configuration, such as parameter setting and prompting the user for configuration values with checking of the validity of combinations, and HDL generation in which the HDL source code is updated for the user without the user editing the HDL source files; and synthesis functions, such as a user interface that analyzes a technology library to select I/O pads, technology-independent constraints and synthesis scripts, pad insertion and prompts for technology-specific pads, and translation of technology-independent formulas into technology-dependent scripts. The configurable PCI bus interface is notable because it implements consistency checking of parameters, configuration-based installation, and automatic modification of HDL files.

Additionally, prior art synthesis techniques choose different mappings based on user goal specifications, allowing the mapping to be optimized for speed, power, area, or target components. In the prior art, however, it is not possible to obtain feedback on the effect of reconfiguring the processor in these ways without taking the design through the entire mapping process. Such feedback could be used to direct further reconfiguration of the processor until the system design goals are achieved.

The second category of prior art in the area of configurable processor generation (i.e., automatic retargeting of compilers and assemblers) covers a rich area of academic research. See, e.g., Hanono et al., "Instruction Selection, Resource Allocation and Scheduling in the AVIV Retargetable Code Generator" (representation of machine instructions used for automatic creation of code generators); Fauth et al., "Describing Instruction Set Processors Using nML"; Ramsey et al., "Machine Descriptions to Build Tools for Embedded Systems"; Aho et al., "Code Generation Using Tree Matching and Dynamic Programming" (algorithms that match the transformations associated with each machine instruction, e.g., add, load, store, branch, etc., against a sequence of program operations represented in some machine-independent intermediate form, using methods such as pattern matching); and Cattell, "Formalization and Automatic Derivation of Code Generators" (abstract descriptions of machine architectures used for compiler research).

Once the processor has been designed, its operation must be verified. Processors generally execute instructions from a stored program using a pipeline in which each stage is suited to one phase of instruction execution. Therefore, changing or adding an instruction, or changing the configuration, may require widespread changes to the processor's logic so that each of the multiple pipeline stages performs the appropriate action for each such instruction. Configuring a processor requires that it be re-verified, and that the verification adapt to the changes and additions. This is not a simple task. Processors are complex logic devices with extensive internal data and control state, and the combinatorics of control, data and program make processor verification a demanding art. Adding to the difficulty of processor verification is the difficulty of developing appropriate verification tools. Because verification is not automated in prior art techniques, its flexibility, speed and reliability are less than optimal.

In addition, once the processor has been designed and verified, it is not particularly useful if it cannot be programmed easily. Processors are generally programmed with the aid of extensive software tools, including compilers, assemblers, linkers, debuggers, simulators and profilers. When the processor changes, the software tools must change as well. It does no good to add an instruction if that instruction cannot be compiled, assembled, simulated or debugged. In the prior art, the cost of the software changes associated with processor modifications and enhancements has been a major obstacle to flexible processor design.

Thus, it can be seen that prior art processor design is difficult enough that processors are generally not designed or modified for a specific application. It can also be seen that considerable improvements in system efficiency would be possible if processors could be configured and extended for specific applications. Further, the efficiency and effectiveness of the design process could be enhanced if feedback on implementation characteristics (such as power consumption and speed) could be used to refine a processor design. Moreover, in the prior art, once a processor is modified, a great deal of effort is required to verify the correct operation of the modified processor. Finally, although prior art techniques provide limited processor configurability, they fail to provide for the generation of software development tools tailored to the configured processor.

While a system meeting the above criteria would certainly be an improvement over the art, further improvements could be made. For example, there is a need for a processor system having instructions that access or modify information stored in special registers (i.e., processor state); the absence of such instructions significantly restricts the range of instructions obtainable and therefore limits the amount of performance improvement achievable.

Also, inventing new application-specific instructions involves complicated tradeoffs between cycle count reduction, additional hardware resources, and the impact on CPU cycle time. Another challenge is to obtain efficient hardware implementations of the new instructions without involving application developers in the often tricky details of high-performance microprocessor implementations.

The above system gives the user the flexibility to design a processor well suited to her application, but it is still cumbersome for interactive development of hardware and software. To understand this problem more fully, consider a typical approach used by many software developers to tune the performance of their software application. They typically think of a potential improvement, modify their software to use it, recompile their software source to produce a runnable application containing the potential improvement, and then evaluate it. Depending on the results of the evaluation, they keep or discard the potential improvement. Typically, the entire process can be completed in only a few minutes. This allows the user to experiment freely, quickly trying out ideas and keeping or discarding them. In some cases, just evaluating a potential idea is complicated, and the user may want to test the idea in a wide variety of situations. In such cases, the user often keeps multiple versions of the compiled application: one original version and another version containing the potential improvement. In some cases potential improvements may interact, and the user may keep more than two copies of the application, each using a different subset of the potential improvements. By keeping multiple versions, the user can easily test the different versions repeatedly under different circumstances.

Users of configurable processors would like to develop hardware and software jointly and interactively, in much the same way that software developers develop software on traditional processors. Consider the case of a user adding custom instructions to a configurable processor. Users would like to interactively add potential instructions to their processor and then test and evaluate those instructions on their particular application. With prior art systems this is difficult for three reasons.

First, after proposing a potential instruction, the user must wait an hour or more before obtaining a compiler and simulator that can take advantage of the instruction.

Second, when the user wishes to experiment with many potential instructions, the user must create and keep a software development system for each one. A software development system can be very large, and keeping many versions can become unmanageable.

Finally, the software development system is configured for the entire processor. This makes it difficult to divide the development process among different engineers. Consider an example in which two developers are working on a particular application at the same time. One developer might be responsible for deciding on the cache characteristics of the processor, and the other for adding customized instructions. While the work of the two developers is related, each piece is sufficiently separable that each developer can work on her task in isolation. The cache developer might initially propose a particular configuration. The other developer starts from that configuration and tries out several instructions, building a software development system for each potential instruction. Now the cache developer modifies the proposed cache configuration. The other developer must then rebuild every one of her configurations, since each of them assumed the original cache configuration. With many developers working on a project at once, organizing the different configurations can quickly become unmanageable.

Summary of the invention

The present invention overcomes these problems of the prior art. One object of the invention is to provide a system that can automatically configure a processor by generating, from the same configuration specification, both a description of a hardware implementation of the processor and a set of software development tools for programming the processor.

Another object of the present invention is to provide such a system that can optimize the hardware implementation and the software development tools for different performance criteria.

Still another object of the present invention is to provide such a system that permits various types of configurability for the processor, including extensibility, binary selection and parametric modification.

Yet another object of the present invention is to provide such a system that can describe the instruction set architecture of the processor in a language that can easily be implemented in hardware.

A further object of the present invention is to provide systems and methods for developing and implementing instruction set extensions that modify processor state.

A still further object of the present invention is to provide systems and methods for developing and implementing instruction set extensions that modify the registers of a configurable processor.

Yet a further object of the present invention is to allow the user to customize a processor configuration by adding new instructions and to evaluate that feature within a few minutes.

The above objects are achieved by providing an automated processor generation system that uses a description of customized processor instruction set options and extensions, written in a standardized language, to develop a configured definition of a target instruction set, a hardware description language description of the circuitry needed to implement that instruction set, and development tools such as a compiler, assembler, debugger and simulator that can be used to generate software for the processor and to verify the processor. The implementation of the processor circuitry can be optimized for different criteria such as area, power consumption and speed. Once a processor configuration has been developed, it can be tested, and the inputs to the system can be modified, so as to optimize the processor implementation iteratively.

To develop an automated processor generation system according to the present invention, an instruction set architecture description language is defined, and development tools such as assemblers, linkers, compilers and debuggers are developed. This is part of the development process because, although large portions of the tools are standard, they must be modified so that they can be configured automatically from the ISA description. This part of the design process is typically carried out by the designer or manufacturer of the automated processor design tool itself.

An automated processor generation system according to the present invention operates as follows. A user, e.g., a system designer, develops a configured instruction set architecture. That is, using the ISA definition and the previously developed tools, a configurable instruction set architecture that meets certain ISA design goals is developed. The development tools and the simulator are then configured for this instruction set architecture. Using the configured simulator, benchmarks are run to evaluate the effectiveness of the configurable instruction set architecture, and the core is revised based on the evaluation results. Once the configurable instruction set architecture is in a satisfactory state, a verification suite is developed for it.

Along with the software aspects of the process, the system attends to the hardware aspects by developing a configurable processor. Then, using system goals such as cost, performance, power and functionality, together with information on available processor fabs, the system designs an overall system architecture that takes into account the configurable ISA options, extensions and processor features. Using the overall system architecture, the development software, the simulator, the configurable instruction set architecture and the processor HDL implementation, the system configures the processor ISA, HDL implementation, software and simulator, and the system HDL is designed for a system-on-a-chip design. Likewise, based on the system architecture and the specifications of chip foundries, a foundry is chosen by evaluating foundry capabilities with respect to the system HDL (rather than being tied to processor selection, as in the prior art). Finally, using the foundry's standard cell library, the configuration system synthesizes the circuitry, places and routes it, and provides the ability to re-optimize the layout and timing. Then, if the design is not of the single-chip type, circuit board layouts are designed, the chips are fabricated, and the boards are assembled.

As can be seen above, several techniques are used to achieve extensive automation of the processor design process. The first technique used to address these problems is to design and implement specific mechanisms that are not as flexible as arbitrary modification or extension, but that nevertheless allow significant functional improvement. By constraining the arbitrariness of the changes, the problems associated with them are constrained as well.

The second technique is to provide a single description of each change and to generate the modifications or extensions to all affected components automatically. Processors designed with the prior art have not done this, because doing something once by hand is usually cheaper than writing a tool to do it automatically and then using the tool once. The advantage of automation appears when the task is repeated many times.

The third technique is to build a database that assists estimation and automatic configuration in subsequent user evaluations.

Finally, the fourth technique is to provide the hardware and software in a form that lends itself to configuration. In one embodiment of the invention, some of the hardware and software is not written directly in standard hardware and software languages, but in languages enhanced by the addition of a preprocessor that allows queries of the configuration database and the generation of standard hardware and software language code with substitutions, conditionals, replication and other modifications. The processor design is then completed with hooks to which the improvements can be linked.

To illustrate these techniques, consider the addition of application-specific instructions. By restricting the method to instructions that have register and constant operands and that produce a register result, the operation of an instruction can be specified with purely combinational (stateless, feedback-free) logic. This input specifies the opcode assignment, the instruction name, the assembler syntax and the combinational logic for the instruction, from which the tools generate:

instruction decode logic for the processor to recognize the new opcode;

the addition of a functional unit to perform the combinational logic function on register operands;

inputs to the instruction scheduling logic of the processor to ensure that the instruction issues only when its operands are valid;

assembler modifications to accept the new opcode and its operands and to generate the correct machine code;

compiler modifications to add new intrinsic functions for accessing the new instruction;

disassembler/debugger modifications to interpret the machine code as the new instruction;

simulator modifications to accept the new opcode and to perform the specified logic function; and

diagnostic generators which generate both directed and random code sequences that contain the added instructions and check their results.

All of the above techniques are used in adding application-specific instructions. The input is restricted to the input and output operands and the logic that evaluates them. Each change is described in one place, and all hardware and software modifications are derived from that description. This facility shows how a single input can be used to improve multiple components.
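The single-description principle can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tooling of the invention: the `InstructionSpec` record, the three generator functions, and the example instruction `addx` with opcode value 0x2A are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class InstructionSpec:
    """One description of a new instruction (all field names are hypothetical)."""
    name: str                          # instruction mnemonic
    opcode: int                        # assigned opcode value
    logic: Callable[[int, int], int]   # combinational function of two register operands


def make_decoder_entry(spec: InstructionSpec) -> Dict[int, str]:
    # Decode logic: recognize the new opcode.
    return {spec.opcode: spec.name}


def make_assembler_entry(spec: InstructionSpec) -> Dict[str, int]:
    # Assembler: map the mnemonic to its machine encoding.
    return {spec.name: spec.opcode}


def make_simulator_entry(spec: InstructionSpec) -> Dict[int, Callable[[int, int], int]]:
    # Simulator: execute the specified logic function for the opcode.
    return {spec.opcode: spec.logic}


def generate_all(spec: InstructionSpec) -> dict:
    """Derive every affected component from the single description."""
    return {
        "decoder": make_decoder_entry(spec),
        "assembler": make_assembler_entry(spec),
        "simulator": make_simulator_entry(spec),
    }


# Hypothetical example instruction: a 32-bit wrapping add.
spec = InstructionSpec("addx", 0x2A, lambda a, b: (a + b) & 0xFFFFFFFF)
tools = generate_all(spec)
```

Because every component is derived from the same record, changing the opcode or the logic in one place changes the decoder, assembler and simulator together.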

The result of this process is a system that is better than the prior art at meeting application needs, because tradeoffs between the processor and the rest of the system logic can be made much later in the design process. It is superior to many of the prior art approaches discussed above in that its configuration can be applied to more forms of representation. A single source can be used for all ISA encodings, software tools and high-level simulation can be included in a configurable package, and the flow can be designed for iteration to find the best combination of configuration values. Further, whereas the methods described earlier focus on hardware configuration or software configuration alone, with no single user interface for control and no measurement system for user-directed redefinition, the present invention provides a complete flow for the configuration of processor hardware and software, including feedback from hardware design results and software performance, to aid selection of the best configuration.

According to one aspect of the invention, these objects are achieved by providing an automated processor design tool. The tool uses a description of customized processor instruction set extensions, written in a standardized language, to develop a configurable definition of a target instruction set, a hardware description language description of the circuitry needed to implement that instruction set, and development tools such as a compiler, assembler, debugger and simulator which can be used to develop applications for the processor and to verify it. The standardized language can handle instruction set extensions that modify processor state or use configurable processors. By providing a constrained domain of extensions and optimizations, the process can be automated to a high degree, which facilitates fast and reliable development.

According to another aspect of the invention, the above objects are further achieved by providing a system in which the user can keep multiple sets of potential instructions or state (hereinafter, combinations of potential configurable instructions or state are referred to collectively as "processor improvements") and can easily switch between them when evaluating an application.

The user selects and builds a base processor configuration using the methods described here. The user creates a new set of user-defined processor improvements and places them in a file directory. The user then invokes a tool that processes the user improvements and converts them into a form usable by the base software development tools. This conversion is fast because it involves only the user-defined improvements and does not build a complete software system. The user then invokes the base software development tools and tells them to use, dynamically, the processor improvements created in the new directory. Preferably, the location of the directory is given to the tools via a command-line option or via an environment variable. To simplify the process further, the user can use standard software makefiles. These allow the user to modify the processor instructions and then, via a single make command, process the improvements and use the base software development tools to rebuild and evaluate the application in the context of the new processor improvements.
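The directory lookup described above can be sketched as follows. This is a minimal Python illustration; the option name `--improvements-dir` and the environment variable `PROC_IMPROVEMENTS_DIR` are hypothetical names chosen for the example, and the precedence of the command line over the environment is an assumption.

```python
import argparse
from typing import List, Mapping, Optional

ENV_VAR = "PROC_IMPROVEMENTS_DIR"  # hypothetical environment variable name


def resolve_improvements_dir(argv: List[str], environ: Mapping[str, str]) -> Optional[str]:
    """Return the improvements directory, preferring the command line (assumed precedence)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--improvements-dir", default=None)  # hypothetical option name
    # Ignore arguments that belong to the tool itself (source files, other flags).
    args, _unknown = parser.parse_known_args(argv)
    if args.improvements_dir is not None:
        return args.improvements_dir
    # Fall back to the environment variable, or None if neither is set.
    return environ.get(ENV_VAR)
```

A tool would typically call `resolve_improvements_dir(sys.argv[1:], os.environ)` at start-up; a makefile can then switch between sets of improvements by changing a single variable.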

The invention overcomes three limitations of the prior art. Given a new set of potential improvements, the user can evaluate them within minutes. The user can keep multiple versions of potential improvements by creating a new directory for each set. Because the directory contains only descriptions of the new improvements, rather than of the entire software system, the storage required is minimal. Finally, the new improvements are decoupled from the rest of the configuration. Once the user has created a directory with a potential set of new improvements, she can use that directory with any base configuration.

Brief description of the drawings

The above and other objects of the invention will become more apparent when the following detailed description is read in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram of a processor implementing an instruction set according to a preferred embodiment of the invention;

Fig. 2 is a block diagram of a pipeline used in the processor according to this embodiment;

Fig. 3 shows a configuration manager in the graphical user interface (GUI) according to this embodiment;

Fig. 4 shows a configuration editor in the GUI according to this embodiment;

Fig. 5 shows the different types of configurability according to this embodiment;

Fig. 6 is a block diagram showing the flow of processor configuration in this embodiment;

Fig. 7 is a block diagram of an instruction set according to this embodiment;

Fig. 8 is a block diagram of an emulation board for use with a processor configured according to the invention;

Fig. 9 is a block diagram showing the logical architecture of a configurable processor according to this embodiment;

Fig. 10 is a block diagram showing the addition of a multiplier to the architecture of Fig. 9;

Fig. 11 is a block diagram showing the addition of a multiply-accumulate unit to the architecture of Fig. 9;

Figs. 12 and 13 together show the configuration of memory in this embodiment;

Figs. 14 and 15 together show the addition of user-defined functional units to the architecture of Fig. 8;

Fig. 16 is a block diagram showing the flow of information between system components in another preferred embodiment;

Fig. 17 is a block diagram showing how custom code is generated for the software development tools in this embodiment;

Fig. 18 is a block diagram showing the generation of the software modules used in another preferred embodiment of the invention;

Fig. 19 is a block diagram showing the structure of a pipeline in a configurable processor according to this embodiment;

Fig. 20 shows an implementation of a state register according to this embodiment;

Fig. 21 is a diagram showing the additional logic needed to implement the state register in this embodiment;

Fig. 22 is a diagram showing the combination of the next-state outputs of a state from several semantic blocks, and the selection of one of them for input to a state register according to this embodiment;

Fig. 23 shows the logic corresponding to the semantic logic according to this embodiment; and

Fig. 24 shows the logic for one bit of a state when it is mapped to a bit of a user register in this embodiment.

Detailed description of the preferred embodiments

In general, the automatic processor generation process begins with a configurable processor definition and user-specified modifications to it, together with a user-specified application for which the processor is to be configured. This information is used to generate a configured processor that takes the user modifications into account, and to generate software development tools for it, e.g., a compiler, simulator, assembler and disassembler. The application is also recompiled using the new software development tools. The recompiled application is simulated with the simulator to produce a software profile describing the performance of the configured processor running the application, and the configured processor is evaluated with respect to silicon area usage, power consumption, speed and the like to produce a hardware profile characterizing the processor circuit implementation. The software and hardware profiles are fed back to the user to enable further iterative configuration, so that the processor can be optimized for the particular application.

An automatic processor generation system 10 according to a preferred embodiment of the invention has four major components, as shown in Fig. 1: a user configuration interface 20, through which a user wishing to design a processor enters configurability and extensibility options and other design constraints; a suite of software development tools 30, which can be customized for a processor designed to the criteria chosen by the user; a parameterized, extensible description of a hardware implementation of the processor 40; and a build system 50, which receives input data from the user interface, generates a customized, synthesizable hardware description of the requested processor, and modifies the software development tools to accommodate the chosen design. Preferably, the build system 50 additionally generates diagnostic tools to verify the hardware and software designs, and an estimator to estimate hardware and software characteristics.

As used herein and in the appended claims, a "hardware implementation description" means one or more descriptions which describe aspects of the physical implementation of a processor design and which, alone or in combination with one or more other descriptions, enable production of chips according to that design. Thus, components of the hardware implementation description may be at various levels of abstraction, from relatively high levels such as hardware description languages, through netlists and microcode, to mask descriptions. In this embodiment, the primary components of the hardware implementation description are written in HDL, netlists and scripts.

Further, as used herein and in the appended claims, "HDL" refers to the general class of hardware description languages used to describe microarchitectures and the like, and is not intended to denote any particular example of such a language.

In this embodiment, the basis of processor configuration is the architecture 60 shown in Fig. 2. Many elements of this architecture are basic features that cannot be changed directly by the user. These include the processor control section 62, the align and decode section 64 (although parts of this section are based on the user-specified configuration), the ALU and address generation section 66, the branch logic and instruction fetch section 68, and the processor interface 70. Other units are part of the basic processor but are user-configurable. These include the interrupt control section 72, the data address and instruction address watch sections 74 and 76, the window register file 78, the data and instruction cache and tag section 80, the write buffers 82 and the timers 84. The remaining sections shown in Fig. 2 are optionally included by the user.

A central component of the processor configuration system 10 is the user configuration interface 20. This is a module which preferably presents the user with a graphical user interface (GUI) through which the user can select processor functionality, including reconfiguration of the compiler and regeneration of the assembler, disassembler and instruction set simulator (ISS), and can prepare the input for full processor synthesis, placement and routing. It also lets the user benefit from rapid estimation of processor area, power consumption, cycle time, application performance and code size, for further iteration and refinement of the processor configuration. Preferably, the GUI also accesses a configuration database to obtain default values and to perform error checking on user input.

To design a processor 60 using the automatic processor generation system 10 according to this embodiment, the user enters design parameters into the user configuration interface 20. The automatic processor generation system 10 may be a stand-alone system running on a computer system under the control of the user; preferably, however, it runs primarily on a system under the control of the manufacturer of the automatic processor generation system 10. User access may then be provided over a communication network. For example, the GUI may be provided using a web browser with data entry screens written in HTML and Java. This has several advantages, such as maintaining the confidentiality of any proprietary back-end software and simplifying maintenance and updating of the back-end software. In this case, to access the GUI, the user first logs on to the system 10 to prove his or her identity.

Once the user has been granted access, the system displays a configuration manager screen 86, as shown in Fig. 3. The configuration manager screen 86 is a directory listing all of the configurations accessible to the user. The configuration manager screen 86 in Fig. 3 shows that the user has two configurations, "just intr" and "high prio"; the first has already been built, i.e., finalized for production, and the second has yet to be built. From this screen 86 the user can build a selected configuration, delete it, edit it, generate a report specifying which configuration and extension options were chosen for it, or create a new configuration. For configurations that have been built, such as "just intr", a suite of software development tools 30 customized for the configuration can be downloaded.

Creating a new configuration or editing an existing one brings up the configuration editor 88 shown in Fig. 4. The configuration editor 88 has an "Options" section menu on the left showing the various general aspects of the processor 60 that can be configured and extended. When an option section is selected, a screen with the configuration options for that section appears on the right, and these options can be set with pull-down menus, memo boxes, check boxes, radio buttons and the like, as known in the art. Although the user can select options and enter data in any order, data is preferably entered section by section because there are logical dependencies between the sections; for example, to display the options in the "Interrupts" section properly, the number of interrupts must already have been chosen in the "ISA Options" section.

In this embodiment, the following configuration options are available for each section:

Goals
    Technology for estimation
        Target ASIC technology: 0.18, 0.25, 0.35 micron
        Target operating conditions: typical, worst case
    Implementation goals
        Target speed: arbitrary
        Gate count: arbitrary
        Target function: arbitrary
        Goal priorities: speed, area, function; speed, function, area
ISA options
    Numeric options
        MAC16 with 40-bit accumulator: yes, no
        16-bit multiplier: yes, no
    Exception options
        Number of interrupts: 0-32
        High-priority interrupt levels: 0-14
        Enable debugging: yes, no
        Number of timers: 0-3
    Other
        Byte ordering: little endian, big endian
        Number of registers available for call windows: 32, 64
Processor cache and memory
    Processor interface read width (bits): 32, 64, 128
    Write buffer entries (address/value pairs): 4, 8, 16, 32
    Processor cache
        Instruction/data cache size (kB): 1, 2, 4, 8, 16
        Instruction/data cache line size (kB): 16, 32, 64
Peripheral components
    Timers
        Timer interrupt numbers
        Timer interrupt levels
Debugging support
    Number of instruction address breakpoint registers: 0-2
    Number of data address breakpoint registers: 0-2
    Debug interrupt level
    Trace port: yes, no
    On-chip debug module: yes, no
    Full scan: yes, no
Interrupts
    Source: external, software
    Priority level
System memory addresses
    Vector and address calculation method: XTOS, manual
    Configuration parameters
        RAM size, start address: arbitrary
        ROM size, start address: arbitrary
        XTOS: arbitrary
    Configuration-specific addresses
        User exception vector: arbitrary
        Kernel exception vector: arbitrary
        Register window overflow/underflow vector base address: arbitrary
        Reset vector: arbitrary
        XTOS start address: arbitrary
        Application start address: arbitrary
TIE instructions
    (define ISA extensions)
Target CAD environment
    Simulation
        Verilog™: yes, no
    Synthesis
        Design Compiler™: yes, no
    Place and route
        Apollo™: yes, no

In addition, the system 10 may provide options for adding other functional units, such as a 32-bit integer multiply/divide unit or a floating-point arithmetic unit; a memory management unit; on-chip RAM and ROM options; cache associativity; enhanced DSP and coprocessor instruction sets; a write-back cache; multiprocessor synchronization; compiler-directed speculation; and support for additional CAD packages. The configuration options available for a given configurable processor are preferably listed in a definition file (such as the one shown in Appendix A), which the system 10 uses for syntax checking and the like once the user has selected the appropriate options.
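Checking a configuration against such a definition file can be sketched as follows. This is a minimal Python illustration, not the format of Appendix A: the key names are hypothetical, and only the permitted value ranges are taken from the option list above.

```python
# Permitted values from the option list above; the key names are hypothetical.
OPTION_DEFS = {
    "mac16": {True, False},
    "num_interrupts": range(0, 33),          # 0-32
    "high_priority_levels": range(0, 15),    # 0-14
    "num_timers": range(0, 4),               # 0-3
    "write_buffer_entries": {4, 8, 16, 32},
    "cache_size_kb": {1, 2, 4, 8, 16},
}


def check_config(config: dict) -> list:
    """Return a list of error messages; an empty list means the configuration is accepted."""
    errors = []
    for key, value in config.items():
        if key not in OPTION_DEFS:
            errors.append(f"unknown option: {key}")
        elif value not in OPTION_DEFS[key]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

For example, `check_config({"num_timers": 4})` reports an invalid value, since at most three timers can be configured.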

As can be seen from the above, the automatic processor configuration system 10 provides the user with two broad types of configurability 300, as shown in Fig. 5: extensibility 302, which lets the user define arbitrary functions and structures from scratch, and modifiability 304, which lets the user choose from a predetermined, constrained set of options. Within modifiability, the system permits binary selection 306 of certain features, e.g., whether a MAC16 or DSP should be added to the processor 60, and parametric specification 308 of other processor features, e.g., the number of interrupts and the cache size.

Many of the above configuration options will be familiar to those skilled in the art; others, however, merit particular attention. For example, the RAM and ROM options allow the designer to include scratch-pad memory or firmware on the processor 10 itself. The processor 10 can fetch instructions or read and write data from these memories. The size and placement of the memories are configurable. In this embodiment, each of these memories is accessed as an additional set in a set-associative cache. A hit in the memory can be detected by comparison with a single tag entry.

Because each high-priority interrupt requires three special registers and is therefore more expensive, the system 10 provides separate configuration options for interrupts (which implement level 1 interrupts) and for the high-priority interrupt option (which implements interrupt levels 2-15 and non-maskable interrupts).

The MAC16 with 40-bit accumulator option (shown at 90 in Fig. 2) adds a 16-bit multiply/add function with a 40-bit accumulator, eight 16-bit operand registers, and a set of compound instructions that combine multiply, accumulate, operand load and address update instructions. The operand registers can be loaded with pairs of 16-bit values from memory in parallel with multiply/accumulate operations. This unit can sustain algorithms with two loads and one multiply/accumulate operation per cycle.
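The arithmetic of a 16-bit multiply with a 40-bit accumulator can be sketched as follows. This is a minimal Python illustration; it assumes signed operands and two's-complement wraparound of the accumulator, neither of which is stated in the text.

```python
ACC_BITS = 40  # accumulator width from the MAC16 option


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac16(acc: int, a: int, b: int) -> int:
    """Multiply two 16-bit operands (assumed signed) and add the product to the accumulator."""
    product = to_signed(a, 16) * to_signed(b, 16)
    # Assumption: the accumulator wraps around at 40 bits rather than saturating.
    return to_signed(acc + product, ACC_BITS)
```

The product of two 16-bit values fits in 32 bits, so the 40-bit accumulator leaves guard bits that let many products be summed before the accumulator can overflow.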

The on-chip debug module (shown at 92 in Fig. 2) is used to access the internal, software-visible state of the processor 60 through the JTAG port 94. The module 92 provides support for exception generation to put the processor 60 into debug mode; access to all program-visible registers or memory locations; execution of any instruction that the processor 60 is configured to execute; modification of the program counter (PC) to jump to a desired location in the code; and a utility that allows a return to normal operation mode, triggered from outside the processor 60 via the JTAG port 94.

Once the processor 10 enters debug mode, it waits for an indication from the outside world that a valid instruction has been scanned in via the JTAG port 94. Once the hardware implementation of the processor 10 has been manufactured, the module 92 can be used to debug the system. Execution of the processor 10 can be controlled via a debugger running on a remote host. The debugger interfaces with the processor via the JTAG port 94 and uses the capabilities of the on-chip debug module 92 to determine and control the state of the processor 10 and to control the execution of instructions.

Up to three 32-bit counter/timers 84 may be configured. This entails the use of a 32-bit register that increments every clock cycle, as well as (for each configured timer) a compare register and a comparator that compares the contents of the compare register with the current count of the clock register, for use with interrupts and similar features. The counter/timers can be configured as edge-triggered and can generate normal or high-priority internal interrupts.
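The counter/timer behavior can be sketched as follows. This is a minimal Python model under stated assumptions: the class and method names are hypothetical, a timer is taken to fire when its compare register equals the count, and the count is assumed to wrap at 32 bits.

```python
class CounterTimers:
    """One shared 32-bit clock count register plus one compare register per timer."""

    def __init__(self, compare_values):
        if len(compare_values) > 3:
            raise ValueError("at most three timers can be configured")
        self.count = 0                        # increments every clock cycle
        self.compare = list(compare_values)   # one compare register per configured timer

    def tick(self):
        """Advance one clock cycle; return the indices of timers whose compare value matches."""
        self.count = (self.count + 1) & 0xFFFFFFFF  # assumed 32-bit wraparound
        return [i for i, c in enumerate(self.compare) if c == self.count]
```

Each index returned by `tick()` corresponds to a timer that would raise its configured interrupt in that cycle.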

The speculation option gives the compiler greater scheduling flexibility by allowing loads to be moved speculatively into control flows where they would not always be executed. Because loads may cause exceptions, such load movement could introduce exceptions into a valid program that would not have occurred originally. Speculative loads prevent these exceptions from occurring when the load is executed, but provide an exception when the data is required. Instead of causing an exception for a load error, a speculative load resets the valid bit of the destination register (new processor state associated with this option).

Although the core processor 60 preferably has some basic pipeline synchronization capability, when multiple processors are used in a system, some form of communication and synchronization between the processors is required. In some cases, self-synchronizing communication techniques such as input and output queues are used. In other cases, a shared memory model is used for communication, and instruction set support for synchronization must be provided because shared memory does not provide the required semantics. For example, additional load and store instructions with acquire and release semantics can be added. These are useful for controlling the ordering of memory references in multiprocessor systems where different memory locations may be used for synchronization and for data, so that precise ordering between synchronization references must be maintained. Other instructions may be used to create semaphores, as known in the art.

In some cases, a shared memory model is used for communication, and instruction set support for synchronization must be provided because shared memory does not provide the required semantics. This is done by the multiprocessor synchronization option.

Perhaps the most notable of the configuration options is the definition of TIE instructions, from which the designer-defined instruction execution unit 96 is built. The TIE™ (Tensilica Instruction Set Extensions) language, developed by Tensilica, Inc. of Santa Clara, California, allows the user to describe custom functions for his or her applications in the form of extensions and new instructions that augment the base ISA. In addition, because of its flexibility, TIE can be used to describe the portions of the ISA that the user cannot change; in this way, the entire ISA can be used to generate the software development tools 30 and the hardware implementation description 40 uniformly. A TIE description uses a number of building blocks to describe the attributes of new instructions, as follows:

instruction fields        instruction classes

instruction opcodes       instruction semantics

instruction operands      constant tables

Instruction field statements (field) are used to improve the readability of the TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by a name. The complete set of bits in an instruction is the highest-level superset field inst, and this field can be divided into smaller fields. For example,

field x inst[11:8]

field y inst[15:12]

field xy {x, y}

defines two 4-bit fields, x and y, as sub-fields (bits 8-11 and 12-15, respectively) of the highest-level field inst, and an 8-bit field xy as the concatenation of the x and y fields.
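The field definitions above can be mirrored in a few lines of Python. This is a sketch for illustration only; the helper names are hypothetical, and it assumes Verilog-style concatenation, in which the leftmost element of {x, y} occupies the most significant bits.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract word[hi:lo], inclusive, as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(inst: int) -> dict:
    """Compute the fields x, y and xy of a 24-bit instruction word."""
    x = bits(inst, 11, 8)    # field x inst[11:8]
    y = bits(inst, 15, 12)   # field y inst[15:12]
    xy = (x << 4) | y        # field xy {x, y}: x in the upper four bits (assumed ordering)
    return {"x": x, "y": y, "xy": xy}
```

Under this ordering, xy is not simply inst[15:8]: the two 4-bit fields appear in the opposite order from their positions in the instruction word.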

The opcode statement defines opcodes for encoding specific fields. Instruction fields that are intended to specify the operands (e.g., registers or immediate constants) used by the opcodes so defined must first be defined with field statements and then defined with operand statements.

For example,

opcode acs op2=4'b0000 CUST0

opcode adse1 op2=4'b0001 CUST0

define two new opcodes, acs and adse1, based on the previously defined opcode CUST0 (4'b0000 denotes a four-bit binary constant 0000). The TIE description of the preferred core ISA has the statements

field op0 inst[3:0]

field op1 inst[19:16]

field op2 inst[23:20]

opcode QRST op0=4'b0000

opcode CUST0 op1=4'b0110 QRST

as part of its base definition. Thus, the definitions of acs and adse1 cause the TIE compiler to generate instruction decoding logic represented, respectively, by the following:

inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000

inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000
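The two decode patterns can be turned into mask-and-match tests, as in the following Python sketch. The function names are hypothetical; the patterns are the ones given above, with x marking bits that the decoder ignores.

```python
def compile_pattern(pattern: str):
    """Convert a bit pattern such as '0000 0110 xxxx ...' into a (mask, value) pair."""
    bit_string = pattern.replace(" ", "")
    # Mask has a 1 wherever the pattern fixes a bit, and a 0 for each don't-care bit.
    mask = int("".join("0" if b == "x" else "1" for b in bit_string), 2)
    value = int("".join("0" if b == "x" else b for b in bit_string), 2)
    return mask, value


DECODERS = {
    "acs":   compile_pattern("0000 0110 xxxx xxxx xxxx 0000"),
    "adse1": compile_pattern("0001 0110 xxxx xxxx xxxx 0000"),
}


def decode(inst: int):
    """Return the name of the opcode matching the 24-bit instruction word, or None."""
    for name, (mask, value) in DECODERS.items():
        if (inst & mask) == value:
            return name
    return None
```

For instance, `decode(0x16ABC0)` yields `"adse1"`, because only the upper eight bits and the lower four bits take part in the comparison.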

Instruction operand statements (operand) identify registers and immediate constants. Before a field is defined as an operand, however, it must already have been defined as a field, as above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table, as described below. For example, to encode an immediate operand, the TIE code

field offset inst[23:6]
operand offsets4 offset {
    assign offsets4 = {{14{offset[17]}}, offset} << 2;
} {
    wire [31:0] t;
    assign t = offsets4 >> 2;
    assign offset = t[17:0];
}

defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. The last part of the operand statement describes the circuitry used to perform the computations, in a subset of the Verilog™ HDL for describing combinational circuits, as will be appreciated by those skilled in the art.

Here, the wire statement defines a set of logical wires named t that is 32 bits wide. The first assign statement after the wire statement specifies that the logic signal driving these wires is the offsets4 constant shifted to the right, and the second assign statement specifies that the lower 18 bits of t are placed in the offset field. The very first assign statement directly specifies the value of the offsets4 operand as the concatenation of offset with 14 copies of its sign bit (bit 17), followed by a left shift of two bits.
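The decode and encode halves of the offsets4 operand can be checked with a small Python model. This is an illustrative sketch; the function names are hypothetical, and it assumes that offsets4 is 32 bits wide and that the right shift is logical.

```python
def decode_offsets4(offset: int) -> int:
    """offsets4 = {{14{offset[17]}}, offset} << 2, truncated to 32 bits (assumed width)."""
    offset &= 0x3FFFF                          # 18-bit field
    sign = (offset >> 17) & 1                  # offset[17]
    extended = ((0x3FFF if sign else 0) << 18) | offset  # 14 sign copies, then offset
    return (extended << 2) & 0xFFFFFFFF


def encode_offsets4(offsets4: int) -> int:
    """t = offsets4 >> 2; offset = t[17:0]."""
    t = (offsets4 & 0xFFFFFFFF) >> 2           # assumed logical shift on a 32-bit value
    return t & 0x3FFFF
```

Encoding a decoded value returns the original 18-bit field, which is the property the two halves of the operand statement are meant to guarantee.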

For a constant table operand, the TIE code

table prime 16 {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53
}
operand prime_s s {
    assign prime_s = prime[s];
} {
    assign s = prime_s == prime[0]  ? 4'b0000 :
               prime_s == prime[1]  ? 4'b0001 :
               prime_s == prime[2]  ? 4'b0010 :
               prime_s == prime[3]  ? 4'b0011 :
               prime_s == prime[4]  ? 4'b0100 :
               prime_s == prime[5]  ? 4'b0101 :
               prime_s == prime[6]  ? 4'b0110 :
               prime_s == prime[7]  ? 4'b0111 :
               prime_s == prime[8]  ? 4'b1000 :
               prime_s == prime[9]  ? 4'b1001 :
               prime_s == prime[10] ? 4'b1010 :
               prime_s == prime[11] ? 4'b1011 :
               prime_s == prime[12] ? 4'b1100 :
               prime_s == prime[13] ? 4'b1101 :
               prime_s == prime[14] ? 4'b1110 :
               4'b1111;
}

uses the table statement to define an array prime of constants (the number following the table name is the number of elements in the table) and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
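The table lookup and its inverse can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the table values are copied from the listing as printed and truncated to the declared 16 entries, and the names decode_prime_s and encode_prime_s are hypothetical.

```python
# Values copied from the listing above, truncated to the declared 16 entries.
PRIME = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def decode_prime_s(s: int) -> int:
    """Model of 'assign prime_s = prime[s]' for a 4-bit index."""
    return PRIME[s & 0xF]


def encode_prime_s(value: int) -> int:
    """Model of the comparison chain: the first matching index wins,
    and anything that matches none of entries 0..14 falls through to 15."""
    for index in range(15):
        if value == PRIME[index]:
            return index
    return 15
```

As in the TIE listing, a value that is not in the table silently encodes as index 15; a production assembler would presumably reject such a value instead.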

The instruction class statement iclass associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the code used in the preceding example that defined the opcodes acs and adse1, the additional statements

operand art t {assign art = AR[t];} {}

operand ars s {assign ars = AR[s];} {}

operand arr r {assign AR[r] = arr;} {}

use the operand statement to define three register operands art, ars and arr (note again the use of Verilog™ statements in the definitions). Then, the iclass statement iclass viterbi {adse1, acs} {out arr, in art, in ars} specifies that the opcodes adse1 and acs belong to a common instruction class viterbi, which takes the two register operands art and ars as inputs and writes its output to the register operand arr.

The instruction semantic statement semantic describes the behavior of one or more instructions using the same subset of Verilog™ that is used for encoding operands. By defining multiple instructions in a single semantic statement, common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes listed in the statement's opcode list, plus a single-bit variable for each opcode in that list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.

For example, the following TIE code defines a new instruction ADD8_4, which adds the four 8-bit operands in one 32-bit word to the four corresponding 8-bit operands in another 32-bit word, and another new instruction MIN16_2, which selects the minimum of each of the two 16-bit operands in one 32-bit word and the respective 16-bit operands in another 32-bit word:

opcode ADD8_4 op2=4'b0000 CUST0

opcode MIN16_2 op2=4'b0001 CUST0

iclass add_min {ADD8_4, MIN16_2} {out arr, in ars, in art}

semantic add_min {ADD8_4, MIN16_2} {

wire [31:0] add, min;

wire [7:0] add3, add2, add1, add0;

wire [15:0] min1, min0;

assign add3 = art[31:24] + ars[31:24];

assign add2 = art[23:16] + ars[23:16];

assign add1 = art[15:8] + ars[15:8];

assign add0 = art[7:0] + ars[7:0];

assign add = {add3, add2, add1, add0};

assign min1 = art[31:16] < ars[31:16] ? art[31:16] :

ars[31:16];

assign min0 = art[15:0] < ars[15:0] ? art[15:0] :

ars[15:0];

assign min = {min1, min0};

assign arr = (({32{{ADD8_4}}}) & (add)) |

(({32{{MIN16_2}}}) & (min));

}

Here, op2, CUST0, arr, art and ars are predefined operands as noted above, and the opcode and iclass statements function as described above.

The semantic statement specifies the computations performed by the new instructions. As will be apparent to those skilled in the art, the second line within the semantic statement specifies the computations performed by the new ADD8_4 instruction, the third and fourth lines specify the computations performed by the new MIN16_2 instruction, and the last line of the section specifies the result written to the arr register.
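The behavior described by this semantic block can be checked against a small software model. The sketch below is an illustration only, assuming unsigned lane comparisons (the default for Verilog wires); the function names add8_4, min16_2 and add_min are hypothetical stand-ins for the generated hardware.

```python
def add8_4(art: int, ars: int) -> int:
    """Four independent 8-bit additions; carries do not cross lanes."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = ((art >> shift) & 0xFF) + ((ars >> shift) & 0xFF)
        result |= (lane & 0xFF) << shift
    return result


def min16_2(art: int, ars: int) -> int:
    """Two independent unsigned 16-bit minimum selections."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & 0xFFFF
        b = (ars >> shift) & 0xFFFF
        result |= min(a, b) << shift
    return result


def add_min(opcode: str, art: int, ars: int) -> int:
    """Mirror of the final assign: each opcode bit masks its own result."""
    add_select = 0xFFFFFFFF if opcode == "ADD8_4" else 0
    min_select = 0xFFFFFFFF if opcode == "MIN16_2" else 0
    return (add_select & add8_4(art, ars)) | (min_select & min16_2(art, ars))
```

Masking both results and OR-ing them, rather than branching, reflects how the shared hardware computes both values and lets the one-hot opcode signals pick the result.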

Returning to the discussion of the user input interface 20, once the user has entered all of the configuration and extension options she desires, the build system 50 takes over. As shown in Fig. 5, the build system 50 receives a configuration specification made up of the parameters set by the user and the extensible features designed by the user, and combines them with additional parameters that define the core processor architecture (e.g., features the user cannot modify) to create a configuration specification 100 describing the entire processor. For example, in addition to the configuration settings 102 chosen by the user, the build system 50 might add parameters specifying the number of physical address bits for the processor's physical address space, the location of the first instruction to be executed by the processor 60, and so on.

The Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, by Tensilica, Inc. is incorporated herein by reference for the purpose of illustrating examples of instructions that can be implemented within the configurable processor as core instructions and of instructions that are made available through the selection of configuration options.

The configuration specification 100 also includes an ISA package containing TIE language statements specifying the base ISA, any additional packages that the user may have selected, such as a coprocessor package 98 (see Fig. 2) or a DSP package, and any TIE extensions supplied by the user. In addition, the configuration specification 100 may contain a number of statements setting flags that indicate whether certain architectural features are to be included in the processor 60. For example,

IsaUseDebug 1

IsaUseInterrupt 1

IsaUseHighPriorityInterrupt 0

IsaUseException 1

indicate that the processor will include the on-chip debugging module 92, the interrupt facilities 72 and exception handling, but not high-priority interrupt facilities.

Using the configuration specification 100, the following can be generated automatically:

the instruction decode logic of the processor 60;

the illegal instruction detection logic for the processor 60;

the ISA-specific portion of the assembler 110;

the ISA-specific support routines for the compiler 108;

the ISA-specific portion of the disassembler 110 (used by the debugger); and

the ISA-specific portion of the simulator 112.

Generating these items automatically is valuable because an important configuration capability is specifying the inclusion of packages of instructions. For some things, it would be possible to implement this with conditionalized code in each of the tools that handles the instruction if it has been configured, but this is awkward; more importantly, it does not allow the system designer to easily add instructions for his system.

In addition to taking a configuration specification 100 as input from the designer, it is also possible to accept goals and have the build system 50 determine the configuration automatically. The designer can specify goals for the processor 60. For example, clock rate, area, cost, typical power consumption and maximum power consumption might be goals. Because some of the goals conflict (e.g., performance can often be increased only by increasing area, power consumption, or both), the build system 50 then consults a search engine 106 to determine the set of configuration options available, and determines how to set each option using an algorithm that attempts to achieve the input goals simultaneously.

The search engine 106 includes a database whose entries describe the effect of options on the various metrics. An entry can specify that a particular configuration setting has an additive, multiplicative, or limiting effect on a metric. Entries can also be marked as requiring other configuration options as prerequisites, or as being incompatible with other options. For example, a simple branch prediction option can specify a multiplicative or additive effect on cycles per instruction (CPI, a determinant of performance), a limit on clock rate, an additive effect on area, and an additive effect on power. It can be marked as incompatible with a fancier branch predictor, and as dependent on setting the instruction fetch queue size to at least two entries. The value of these effects may be a function of a parameter, such as the branch prediction table size. In general, the database entries are represented by functions that can be evaluated.

Various algorithms can be used to find the configuration settings that come closest to achieving the input goals. For example, a simple knapsack packing algorithm considers each option in sorted order of value divided by cost and accepts any option that increases value while keeping cost below a specified limit. So, for example, to maximize performance while keeping power below a specified value, the options would be sorted by performance divided by power, and each option that increases performance without exceeding the power limit is accepted. More sophisticated knapsack algorithms provide some amount of backtracking.
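The simple greedy variant can be sketched as follows. This is a minimal Python illustration, not the search engine itself; the Option record, its field names and the requirement that every cost be positive are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    value: float  # benefit, e.g. estimated performance gain (hypothetical units)
    cost: float   # resource used, e.g. added power; assumed to be > 0


def greedy_select(options, cost_limit):
    """Accept options in decreasing value/cost order while the limit holds.
    No backtracking is attempted, so the result may be suboptimal."""
    chosen = []
    total_cost = 0.0
    ranked = sorted(options, key=lambda o: o.value / o.cost, reverse=True)
    for option in ranked:
        if option.value > 0 and total_cost + option.cost <= cost_limit:
            chosen.append(option.name)
            total_cost += option.cost
    return chosen, total_cost
```

For instance, with options ("mul", 30, 20), ("cache", 50, 60) and ("bp", 10, 5) and a limit of 70, the sketch accepts bp and mul and rejects cache, because adding cache would bring the total cost to 85.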

A very different kind of algorithm for determining the configuration from the goals and the design database is based on simulated annealing. A random initial set of parameters is used as the starting point, and changes to individual parameters are then accepted or rejected by evaluating a global utility function. Improvements in the utility function are always accepted, while negative changes are accepted probabilistically according to a threshold that declines as the optimization proceeds. In this system the utility function is constructed from the input goals. For example, given the goals Performance > 200, Power < 100, Area < 4, with a priority order of Power, Area, and Performance, the following utility function could be used:

max((1 - Power/100) * 0.5, 0) + (max((1 - Area/4) * 0.3, 0) * (if Power < 100 then 1 else (1 - Power/100)**2)) + (max(Performance/200 * 0.2, 0) * (if Power < 100 then 1 else (1 - Power/100)**2) * (if Area < 4 then 1 else (1 - Area/4)**2))

It rewards decreases in power consumption until power is below 100 and is neutral thereafter, rewards decreases in area until area is below 4 and is neutral thereafter, and rewards increases in performance until performance is above 200 and is neutral thereafter. There are also components that reduce the area reward when power is out of specification, and that reduce the performance reward when power or area is out of specification.
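The scoring function given above translates directly into code. The following Python sketch assumes the grouping in which the performance term is scaled by both the power and the area penalty, which is what the description of the penalties suggests; the function name utility is hypothetical.

```python
def utility(power: float, area: float, performance: float) -> float:
    """Score a candidate configuration against the example goals
    Performance > 200, Power < 100, Area < 4 (weights 0.2, 0.5, 0.3)."""
    # Penalty factors: 1 while the goal is met, quadratic once it is missed.
    power_penalty = 1.0 if power < 100 else (1 - power / 100) ** 2
    area_penalty = 1.0 if area < 4 else (1 - area / 4) ** 2

    power_term = max((1 - power / 100) * 0.5, 0)
    area_term = max((1 - area / 4) * 0.3, 0) * power_penalty
    performance_term = max(performance / 200 * 0.2, 0) * power_penalty * area_penalty
    return power_term + area_term + performance_term
```

A simulated annealing loop would call such a function after each trial change, always keeping changes that raise the score and keeping score-lowering changes only with a probability that shrinks as the search proceeds.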

Both of these algorithms, as well as others, can be used to search for configurations that satisfy the specified goals. What is important is that the configurable processor design has been described in a design database that records prerequisite and incompatibility specifications for the options, together with the impact of each configuration option on the various metrics.

The examples given so far use hardware goals that are general and do not depend on the particular algorithm being run on the processor 60. The algorithms described can also be used to select configurations well suited to specific user programs. For example, the user programs can be run on a cache-accurate simulator to measure the number of cache misses for different types of caches with different characteristics, such as different sizes, different line sizes and different set associativities. The results of these simulations can be added to the database used by the search algorithms 106 described above to help select the hardware implementation description 40.

It is likewise possible to for some appearance instructed to modify user's algorithm, above-mentioned instruction can be the most implanted Among hardware.Such as, if user's algorithm takes a significant amount of time carries out multiplying, then search engine 106 can be automatically A hardware multiplier is included in suggestion in.Such algorithm is not necessarily limited to consider a kind of user's algorithm.User can be by one group Algorithm sends into system, and search engine 106 can select such a to configure, and on average, such configuration is to user's journey The set of sequence is useful.

In addition to selecting preconfigured characteristics of the processor 60, the search algorithms can also be used to automatically select possible TIE extensions or suggest them to the user. Given the input goals and examples of user programs, written perhaps in the C programming language, these algorithms would suggest potential TIE extensions. For TIE extensions without state, pattern matchers are embedded in compiler-like tools. These pattern matchers walk expression nodes in a bottom-up fashion, searching for multiple-instruction patterns that could be replaced with a single instruction. For example, suppose the user's C program contains the following statements:

x = (y + z) << 2;

x2 = (y2 + z2) << 2;

The pattern matcher would discover that the user, in two different locations, adds two numbers and shifts the result two bits to the left. The system would add to a database the possibility of generating a TIE instruction that adds two numbers and shifts the result two bits to the left.
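A toy version of such a matcher is sketched below. It is only an illustration: it uses Python's own ast module as a stand-in for a C front end (the two example statements happen to parse as Python once the semicolons are dropped), and the function name count_add_shift2 is hypothetical. Note also that ast.walk does not guarantee a bottom-up order, which a real matcher would use.

```python
import ast


def count_add_shift2(source: str) -> int:
    """Count expressions of the shape (a + b) << 2 in the given source."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.LShift)
            and isinstance(node.left, ast.BinOp)
            and isinstance(node.left.op, ast.Add)
            and isinstance(node.right, ast.Constant)
            and node.right.value == 2
        ):
            count += 1
    return count
```

The occurrence count returned here corresponds to the static count that the build system records for each candidate instruction.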

The build system 50 keeps track of many possible TIE instructions, along with a count of how many times each appears. Using a profiling tool, the system 50 also keeps track of how often each instruction is executed over the total execution of the algorithm. Using a hardware estimator, the system 50 keeps track of how expensive it would be in hardware to implement each potential TIE instruction. These numbers are fed into the search heuristic algorithm to select a set of potential TIE instructions that maximizes the input goals, such as performance, code size and hardware complexity.

Similar but more powerful algorithms are used to discover potential TIE instructions with state. Several different algorithms are used to detect different types of opportunities. One algorithm uses a compiler-like tool to scan the user program and detect whether it requires more registers than the hardware provides. As is known to practitioners in the art, this can be detected by counting the register spills and restores in the compiled version of the user code. The compiler-like tool suggests to the search engine a coprocessor with additional hardware registers 98 that supports only the operations used in those portions of the user's code that have many spills and restores. The tool is responsible for supplying the database used by the search engine 106 with an estimate of the hardware cost of the coprocessor and an estimate of how much the performance of the user's algorithm would improve. As described above, the search engine 106 makes a global decision on whether the suggested coprocessor 98 leads to a better configuration.

Alternatively, or in conjunction with this, a compiler-like tool checks whether the user program uses bit-mask operations to ensure that certain variables never exceed certain limits. In this situation, the tool suggests to the search engine 106 a coprocessor 98 that uses data types conforming to the user's limits (for example, 12-bit or 20-bit integers, or integers of any other size). In a third algorithm, used in another embodiment for user programs written in C++, a compiler-like tool discovers that much of the time is spent operating on user-defined abstract data types. If all of the operations on the data type are suitable for TIE, the algorithm proposes to the search engine that all of the operations on the data type be implemented in a TIE coprocessor.

To generate the instruction decode logic of the processor 60, one signal is generated for each opcode defined in the configuration specification. The code is generated by simply rewriting the declaration

opcode NAME FIELD=VALUE

to the HDL statement

assign NAME = FIELD == VALUE;

and

opcode NAME FIELD=VALUE PARENTNAME [FIELD2=VALUE2]

to

assign NAME = PARENTNAME & (FIELD == VALUE);
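This rewriting step is mechanical enough to sketch in a few lines. The Python below is an illustration only, not the TIE compiler; the regular expression, the function name opcode_to_hdl and the decision to ignore the optional [FIELD2=VALUE2] part are assumptions made for the example.

```python
import re

# opcode NAME FIELD=VALUE [PARENTNAME]; the optional FIELD2=VALUE2 is not handled.
OPCODE_RE = re.compile(r"^\s*opcode\s+(\w+)\s+(\w+)\s*=\s*(\S+)(?:\s+(\w+))?\s*$")


def opcode_to_hdl(statement: str) -> str:
    """Rewrite one TIE opcode declaration as a Verilog assign statement."""
    match = OPCODE_RE.match(statement)
    if match is None:
        raise ValueError(f"not an opcode statement: {statement!r}")
    name, field, value, parent = match.groups()
    compare = f"({field} == {value})"
    if parent:
        return f"assign {name} = {parent} & {compare};"
    return f"assign {name} = {compare};"
```

Applied to the declaration opcode acs op2=4'b0000 CUST0, the sketch yields assign acs = CUST0 & (op2 == 4'b0000); so each opcode signal is the AND of its parent's signal and its own field comparison.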

The generation of register interlock and pipeline stall signals has also been automated. This logic, too, is generated from information in the configuration specification. Based on the register usage information contained in the iclass statement and on the latency of the instruction, the generated logic inserts a stall (or bubble) when a source operand of the current instruction depends on a destination operand of a previous instruction that has not yet completed. The mechanism that implements this stall functionality is part of the core hardware.
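The effect of such interlock logic can be modeled with a simple in-order issue schedule. The sketch below is a software illustration under simplifying assumptions (single issue, one destination register per instruction, latency given in cycles); the function name issue_schedule and the tuple layout are hypothetical.

```python
def issue_schedule(program):
    """Return the issue cycle of each instruction in an in-order pipeline.

    program is a list of (dest, sources, latency) tuples. An instruction
    stalls until every source register written by an earlier, unfinished
    instruction has become available.
    """
    ready_at = {}  # register name -> cycle at which its value is available
    cycle = 0
    schedule = []
    for dest, sources, latency in program:
        start = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        schedule.append(start)
        ready_at[dest] = start + latency
        cycle = start + 1
    return schedule
```

With a two-cycle instruction writing a3 followed immediately by a consumer of a3, the consumer issues one cycle late, which is the bubble that the generated stall signal would insert.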

The illegal instruction detection logic is generated by NOR-ing together the individual generated instruction signals, each ANDed with its field restrictions:

assign illegalinst = !(INST1 | INST2 … | INSTn);

The instruction decode signals and the illegal instruction signal are available as outputs of the decode module and as inputs to the hand-written processor logic.

To generate other processor features, the present embodiment uses a Verilog™ description of the configurable processor 60 that is enhanced with a Perl-based preprocessor language. Perl is a full-featured language that includes complex control structures, subroutines, and I/O facilities. The preprocessor, which in one embodiment of the present invention is called TPP (as shown in the source listing in Appendix B, TPP is itself a Perl program), scans its input, identifies certain lines as preprocessor code (for TPP, those prefixed by a semicolon) written in the preprocessor language (Perl for TPP), and constructs a program consisting of the extracted lines plus statements that generate the text of the other lines. The non-preprocessor lines may contain embedded expressions, which are replaced by the values produced when TPP processes them. The resulting program is then executed to produce the source code, i.e., the Verilog™ code describing the detailed processor logic 40 (as will be seen below, TPP is also used to configure the software development tools 30).

When used in this context, TPP is a powerful preprocessing language because it permits constructs such as configuration specification queries, conditional expressions and iterative structures to be included in the Verilog™ code, and, as noted above, allows expressions that depend on the configuration specification 100 to be embedded in the Verilog™ code. For example, a TPP assignment based on a database query might look like

; $endian = config_get_value("IsaMemoryOrder")

where config_get_value is the TPP function used to query the configuration specification 100, IsaMemoryOrder is a flag set in the configuration specification 100, and $endian is a TPP variable to be used later in generating the Verilog™ code.

A TPP conditional expression might be

; if (config_get_value("IsaMemoryOrder") eq "LittleEndian")

; {do Verilog™ code for little-endian ordering}

; else

; {do Verilog™ code for big-endian ordering}

Iterative loops can be implemented with TPP constructs such as

; for ($i = 0; $i < $ninterrupts; $i++)

; {do Verilog™ code for each of 1...N interrupts}

where $i is a TPP loop index variable and $ninterrupts is the number of interrupts specified for the processor 60 (obtained from the configuration specification 100 using config_get_value).

Finally, TPP code can be embedded in Verilog™ expressions such as

wire [`$ninterrupts-1`:0] srInterruptEn;

xtscenflop #(`$ninterrupts`) srintrenreg (srInterruptEn, srDataIn_W[`$ninterrupts-1`:0], srIntrEnWEn, !cReset, CLK);

where:

$ninterrupts defines the number of interrupts and determines the width (in bits) of the xtscenflop module (a flip-flop primitive module);

srInterruptEn is the output of the flip-flop, defined as a wire of the appropriate number of bits;

srDataIn_W is the input to the flip-flop, of which only the relevant bits are connected, according to the number of interrupts;

srIntrEnWEn is the write enable of the flip-flop;

cReset is the clear input to the flip-flop; and

CLK is the input clock to the flip-flop.

For example, given the following input to TPP:

; # Timer Interrupt

; if ($IsaUseTimer) {

wire [`$width-1`:0] srCCount;

wire ccountWEn;

//--------------------------------------------------------------

// CCOUNT Register

//--------------------------------------------------------------

assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);

xtflop #(`$width`) srccntreg (srCCount, (ccountWEn ? srDataIn_W :

srCCount+1), CLK);

; for ($i = 0; $i < $TimerNumber; $i++) {

//--------------------------------------------------------------

// CCOMPARE Register

//--------------------------------------------------------------

wire [`$width-1`:0] srCCompare`$i`;

wire ccompWEn`$i`;

assign ccompWEn`$i` = srWEn_W && (srWrAdr_W == `SRCCOMPARE`$i`);

xtenflop #(`$width`) srccmp`$i`reg

(srCCompare`$i`, srDataIn_W, ccompWEn`$i`, CLK);

assign setCCompIntr`$i` = (srCCompare`$i` == srCCount);

assign clrCCompIntr`$i` = ccompWEn`$i`;

; }

; } ## IsaUseTimer

and the declarations

$IsaUseTimer = 1

$TimerNumber = 2

$width = 32

TPP generates

wire [31:0] srCCount;

wire ccountWEn;

//--------------------------------------------------------------

// CCOUNT Register

//--------------------------------------------------------------

assign ccountWEn = srWEn_W && (srWrAdr_W == `SRCCOUNT);

xtflop #(32) srccntreg (srCCount, (ccountWEn ? srDataIn_W :

srCCount+1), CLK);

//--------------------------------------------------------------

// CCOMPARE Register

//--------------------------------------------------------------

wire [31:0] srCCompare0;

wire ccompWEn0;

assign ccompWEn0 = srWEn_W && (srWrAdr_W == `SRCCOMPARE0);

xtenflop #(32) srccmp0reg

(srCCompare0, srDataIn_W, ccompWEn0, CLK);

assign setCCompIntr0 = (srCCompare0 == srCCount);

assign clrCCompIntr0 = ccompWEn0;

//--------------------------------------------------------------

// CCOMPARE Register

//--------------------------------------------------------------

wire [31:0] srCCompare1;

wire ccompWEn1;

assign ccompWEn1 = srWEn_W && (srWrAdr_W == `SRCCOMPARE1);

xtenflop #(32) srccmp1reg

(srCCompare1, srDataIn_W, ccompWEn1, CLK);

assign setCCompIntr1 = (srCCompare1 == srCCount);

assign clrCCompIntr1 = ccompWEn1;
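The substitution of embedded expressions can be imitated with a very small amount of code. The Python sketch below is not TPP: it handles only backquoted expressions, writes variables without the Perl $ prefix, drives the loop from ordinary Python, and does not cope with lines that also contain Verilog macro backquotes. The names expand and generate_compare_registers are hypothetical.

```python
import re

EMBEDDED = re.compile(r"`([^`]+)`")  # a backquoted expression such as `width-1`


def expand(line: str, variables: dict) -> str:
    """Replace each backquoted expression with its value under `variables`."""
    def substitute(match):
        # Evaluate with no builtins; only the supplied variables are visible.
        return str(eval(match.group(1), {"__builtins__": {}}, dict(variables)))
    return EMBEDDED.sub(substitute, line)


def generate_compare_registers(timer_count: int, width: int) -> list:
    """Emit the declaration lines once per timer, as the TPP loop does."""
    template = [
        "wire [`width-1`:0] srCCompare`i`;",
        "wire ccompWEn`i`;",
    ]
    lines = []
    for i in range(timer_count):
        for line in template:
            lines.append(expand(line, {"width": width, "i": i}))
    return lines
```

Calling generate_compare_registers(2, 32) produces the srCCompare0 and srCCompare1 declarations with a [31:0] range, matching the corresponding lines of the generated output above.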

The HDL description 114 thus generated is used to synthesize hardware implementing the processor, for example using Design Compiler™ from Synopsys in block 122. The result is then placed and routed, for example using Silicon Ensemble™ from Cadence or Apollo™ from Avant! in block 128. Once the components have been routed, the result can be used for wiring back-annotation and timing verification in block 132, for example using PrimeTime™ from Synopsys. The product of this process is a hardware profile 134, which the user can use to provide further input to the configuration capture routine 20 for further configuration iterations.

As mentioned in connection with the logic synthesis section 122, one of the outcomes of configuring the processor 60 is a set of customized HDL files from which a specific gate-level implementation can be obtained using any of a number of commercial synthesis tools. One such tool is Design Compiler™ from Synopsys. To ensure a correct and high-performance gate-level implementation, the present embodiment provides the scripts needed to automate the synthesis process in the customer's environment. The challenge in providing such scripts is to support a wide variety of synthesis methodologies and the different implementation objectives of users. To address the first challenge, the present embodiment breaks the scripts into smaller, functionally complete scripts. One example is to provide a read script that reads all HDL files relevant to the particular processor configuration 60, a timing constraint script that sets the unique timing requirements of the processor 60, and a script that writes out the synthesis results in a form that can be used for placement and routing of the gate-level netlist. To address the second challenge, the present embodiment provides a script for each implementation objective. One example is to provide a script for achieving the fastest cycle time, a script for achieving minimum silicon area, and a script for achieving minimum power consumption.

Scripts are used in other phases of processor configuration as well. For example, once the HDL model of the processor 60 has been written, a simulator can be used to verify the correct operation of the processor 60, as described above in connection with block 132. This is often accomplished by running many test programs, or diagnostics, on the simulated processor 60. Running a test program on the simulated processor 60 can require many steps, such as generating an executable image of the test program, generating a representation of this executable image that can be read by the simulator 112, creating a temporary location where the simulation results can be gathered for later analysis, analyzing the simulation results, and so on. In the prior art this was done with a number of throw-away scripts. These scripts had built-in knowledge of the simulation environment, such as which HDL files should be included, where those files could be found in the directory structure, which files the test bench requires, and so on. In the current design the preferred mechanism is to write a script template that is configured by parameter substitution. The configuration mechanism also uses TPP to generate the list of files required for simulation.

Furthermore, in the verification process of block 132 it is often necessary to write other scripts that allow designers to run a series of test programs. These are often used to run regression suites, which give the designer confidence that a given change to the HDL model does not introduce new bugs. Such regression scripts were also often throw-away, because they had many built-in assumptions about file names, locations, and so on. As described above for the run script of a single test program, the regression script is written as a template. This template is configured at configuration time by replacing its parameters with actual values.
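The idea of a template configured by parameter substitution can be shown with Python's standard string.Template class. This is a sketch only: the simulate command line, its flags and the parameter names are all invented for the illustration and do not correspond to any particular simulator.

```python
from string import Template

# Hypothetical run-script template; every name after '$' is a parameter.
RUN_SCRIPT = Template(
    "simulate --top $top_module --filelist $file_list --out $result_dir/$test_name.log"
)


def configure_script(test_name: str, settings: dict) -> str:
    """Fill in the template for one test program.
    Raises KeyError if a required parameter is missing."""
    return RUN_SCRIPT.substitute(settings, test_name=test_name)
```

Because all environment-specific knowledge lives in the settings mapping, a regression script can call configure_script once per test program instead of hard-coding file names and locations.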

The final step in converting an RTL description into a hardware implementation is to use place and route (P&R) software to convert the abstract netlist into a geometrical representation. The P&R software analyzes the connectivity of the netlist and decides on the placement of the cells. It then tries to draw the connections between all the cells. The clock net usually deserves special attention and is routed as a last step. This process can be assisted by providing the tools with some information, such as which cells are expected to be close together (known as soft grouping), the relative placement of cells, which nets are expected to have small propagation delays, and so on.

To make this process easier and to ensure that the desired performance goals (cycle time, area, power dissipation) are met, the configuration mechanism produces a set of scripts or input files for the P&R software. These scripts also contain information such as how many power and ground connections are required, how these should be distributed along the boundary, and so on. The scripts are generated by querying a database that contains information on how many soft groups to create and which cells each should contain, which nets are timing critical, and so on. These parameters change according to which options have been selected. The scripts must be configurable for the particular tools that will be used for placement and routing.

Optionally, the configuration mechanism can request more information from the user and pass it to the P&R scripts. For example, the interface can ask the user for the desired aspect ratio of the final layout, how many levels of buffering should be inserted in the clock tree, which side the input and output pins should be located on, the relative or absolute placement of these pins, the width and location of the power and ground straps, and so on. These parameters would then be passed on to the P&R scripts to generate the desired layout.

Even more sophisticated scripts can be used that support, for example, a more elaborate clock tree. One common optimization for reducing power dissipation is to gate the clock signal. However, this makes clock tree synthesis a much harder problem, because it is more difficult to balance the delay of all the branches. The configuration interface could ask the user for the correct cells to use for the clock tree and perform part or all of the clock tree synthesis. It would do this by knowing where the gated clocks are located in the design and estimating the delay from the qualifying gate to the clock input of each flip-flop. It would then give the clock tree synthesis tool a constraint requiring the delay of the clock buffer to match the delay of the gating cells. In the current embodiment this is done by a general-purpose Perl script. The script reads gated clock information that the configuration agent produces according to which options are selected. The Perl script is run once the design has been placed and routed, but before final clock tree synthesis is performed.

The profiling process described above can be improved further. Specifically, we will describe a process by which the user can obtain similar hardware profile information almost instantaneously, without spending hours running the CAD tools. This process has several steps.

The first step in this process is to partition the set of all configuration options into groups of orthogonal options, such that the effect of an option in one group on the hardware profile is independent of the options in any other group. For example, the impact of the MAC16 unit on the hardware profile is independent of any other option, so an option group containing only the MAC16 option is formed. A more complicated example is an option group containing the interrupt options, the high-level interrupt options and the timer options, because the impact on the hardware profile depends on the particular combination of these options.

The second step is to characterize the hardware profile impact of each option group. The characterization is done by obtaining the hardware profile impact for various combinations of the options in the group. For each combination, the profile is obtained using the previously described process, in which an actual implementation is derived and its hardware profile is measured. This information is stored in an estimation database.

The last step is to derive specific formulae, using curve fitting and interpolation techniques, for computing the hardware profile impact of particular combinations of options within each option group. Different formulae are used depending on the nature of the options. For example, since each additional interrupt vector adds about the same amount of logic to the hardware, we use a linear function to model its hardware impact. In another example, a timer unit requires the high-level interrupt option, so the formula for the hardware impact of the timer option is a conditional formula involving several options.
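The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every gate count, the option names in the configuration dictionary, and the fixed costs assigned to the high-level interrupt and timer options are invented, and a plain least-squares line stands in for whatever curve-fitting method an actual estimator would use.

```python
# Hypothetical characterization data for the interrupt option group:
# (number of interrupt vectors, measured gate count). Values are invented.
INTERRUPT_SAMPLES = [(1, 1200.0), (2, 1500.0), (4, 2100.0), (8, 3300.0)]

# Assumed fixed costs for other options (also invented for illustration).
BASE_GATES = 25000.0
MAC16_GATES = 6000.0
HIGH_LEVEL_INT_GATES = 400.0
TIMER_GATES = 650.0


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SLOPE, INTERCEPT = fit_line(INTERRUPT_SAMPLES)


def interrupt_group_gates(num_interrupts, high_level, timer):
    """Conditional formula for the interrupt / high-level / timer group."""
    if timer and not high_level:
        raise ValueError("timer requires the high-level interrupt option")
    gates = SLOPE * num_interrupts + INTERCEPT  # linear in vector count
    if high_level:
        gates += HIGH_LEVEL_INT_GATES
    if timer:
        gates += TIMER_GATES
    return gates


def estimate_gates(config):
    """Orthogonal groups contribute independently, so their costs add."""
    total = BASE_GATES
    if config["mac16"]:
        total += MAC16_GATES  # single-option group
    total += interrupt_group_gates(
        config["interrupts"], config["high_level"], config["timer"]
    )
    return total
```

Because the groups are orthogonal by construction, the estimate is a sum of per-group formulae and needs no CAD run once the database has been characterized.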

It is also useful to provide quick feedback on how architectural choices affect the run-time performance and code size of applications. Several sets of benchmark programs from multiple application domains are chosen. For each domain, a database is built in advance that estimates how different architectural design decisions affect the run-time performance and code size of the applications in that domain. As the user varies the architectural design, the database is queried for the application domain that interests the user, or for multiple domains. The results of the evaluation are presented to the user so that she can estimate the tradeoff between software benefits and hardware costs.

The quick evaluation system can easily be extended to give the user suggestions on how to modify a configuration to further optimize the processor. One such example is to associate each configuration option with a set of numbers representing the incremental impact of the option on various cost metrics such as area, delay and power. Computing the incremental cost impact of a given option is easy with the quick evaluation system. It simply involves two calls to the evaluation system, one with the option and one without. The difference in cost between the two evaluations represents the incremental impact of the option. For example, the incremental area impact of the MAC16 option is computed by evaluating the area cost of two configurations, with and without the MAC16 option. The difference is then displayed alongside the MAC16 option in the interactive configuration system. Such a system can guide the user toward an optimal solution through a series of single-step improvements.
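The two-call scheme can be illustrated as follows. The sketch assumes an estimator that maps a configuration to a dictionary of cost metrics; `toy_estimate`, its metric names and all of its numbers are hypothetical stand-ins for the quick evaluation system.

```python
def toy_estimate(config):
    """Stand-in for the quick evaluation system; all numbers are invented."""
    cost = {"area_gates": 25000, "delay_ps": 4000, "power_uw": 900}
    if config.get("mac16"):
        cost["area_gates"] += 6000
        cost["power_uw"] += 150
    if config.get("dcache_kb", 0) >= 16:
        cost["delay_ps"] += 200
    return cost


def incremental_cost(estimate, config, option):
    """Evaluate twice, with and without `option`, and return the difference."""
    with_option = estimate({**config, option: True})
    without_option = estimate({**config, option: False})
    return {metric: with_option[metric] - without_option[metric]
            for metric in with_option}
```

An interactive front end could call `incremental_cost` once per option and show the resulting deltas next to each checkbox.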

Turning now to the software side of the automated processor configuration process, this embodiment of the invention configures the software development tools 30 so that they are specific to the processor. The configuration process begins with software tools 30 that can be ported to a variety of different systems and instruction set architectures. Such retargetable tools have been widely studied and are well known in the art. This embodiment uses the GNU family of tools, which is free software, including for example the GNU C compiler, GNU assembler, GNU debugger, GNU linker, GNU profiler, and various utility programs. These tools 30 are then configured automatically by generating portions of the software directly from the ISA description and by using TPP to modify the portions of the software that are written by hand.

The GNU C compiler is configured in several different ways. Given the core ISA description, much of the machine-dependent logic in the compiler can be written by hand. This portion of the compiler is common to all configurations of the configurable processor instruction set, and retargeting it by hand allows fine tuning for best results. However, even for this hand-coded portion of the compiler, some code is generated automatically from the ISA description. Specifically, the ISA description defines the sets of constant values that can be used in the immediate fields of various instructions. For each immediate field, a predicate function is generated to test whether a particular constant value can be encoded in that field. The compiler uses these predicate functions when generating code for the processor 60. Automating this aspect of the compiler configuration eliminates an opportunity for inconsistency between the ISA description and the compiler, and it allows the constants in the ISA to be changed with minimal effort.
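As a rough illustration of generated predicate functions, the sketch below builds one predicate per immediate field from a table of encodable constants. The field names, the value sets, and the mnemonics emitted by `select_add` are hypothetical; in the embodiment the predicates are generated for the C compiler from the ISA description, whereas Python is used here only for brevity.

```python
def make_immediate_predicate(encodable_values):
    """Return a predicate telling whether a constant fits the field."""
    def can_encode(value):
        return value in encodable_values
    return can_encode


# Hypothetical immediate fields, as an ISA description might list them.
IMMEDIATE_FIELDS = {
    "simm8": range(-128, 128),          # contiguous signed range
    "shift_amount": range(0, 32),
    "table_const": frozenset(           # sparse, table-encoded constants
        [-1, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 16, 32, 64, 128, 256]),
}

PREDICATES = {name: make_immediate_predicate(values)
              for name, values in IMMEDIATE_FIELDS.items()}


def select_add(constant):
    """Use the immediate form when the predicate allows it, else load first."""
    if PREDICATES["simm8"](constant):
        return [f"addi a2, a2, {constant}"]
    return [f"load_const a3, {constant}", "add a2, a2, a3"]
```

Changing a field's value set in the table changes the compiler's behavior without touching `select_add`, which is the consistency benefit the text describes.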

Several portions of the compiler are configured by preprocessing with TPP. For the configuration options controlled by parameter selection, the corresponding parameters in the compiler are set via TPP. For example, the compiler has a flag variable indicating whether the target processor 60 uses big-endian or little-endian byte ordering, and this variable is set automatically by a TPP command that reads the endianness parameter from the configuration specification 100. TPP is also used to conditionally enable or disable hand-coded portions of the compiler that generate code for optional ISA packages, based on whether the corresponding packages are enabled in the configuration specification 100. For example, the code that generates multiply/accumulate instructions is included in the compiler only if the configuration specification includes the MAC16 option 90.

The compiler is also configured to support designer-defined instructions specified via the TIE language. There are two levels of this support. At the lowest level, the designer-defined instructions are available as macros, intrinsic functions, or inline (extrinsic) functions in the code being compiled. This embodiment of the invention generates a C header file that defines inline functions as "inline assembly" code (a standard feature of the GNU C compiler). Given the TIE specification of the designer-defined opcodes and their corresponding operands, generating this header file is a straightforward process of translating into the inline assembly syntax of the GNU C compiler. An alternative embodiment creates a header file containing C preprocessor macros that specify the inline assembly instructions. Yet another alternative uses TPP to add intrinsic functions directly into the compiler.

The second level of support for designer-defined instructions is provided by having the compiler automatically recognize opportunities to use the instructions. These TIE instructions may be defined directly by the user or created automatically during the configuration process. Before the user application is compiled, the TIE code is automatically examined and converted into equivalent C functions. This is the same step that is used to allow fast simulation of TIE instructions. The equivalent C functions are partially compiled into the tree-based intermediate representation used by the compiler. The representation of each TIE instruction is stored in a database. When the user application is compiled, part of the compilation process is a pattern matcher. The user application is compiled into the tree-based intermediate representation. The pattern matcher walks every tree in the user program from the bottom up. At each step of the walk, the pattern matcher checks whether the intermediate representation rooted at the current point matches any of the TIE instructions in the database. If there is a match, the match is noted. After the walk of each tree is finished, the set of maximally sized matches is selected. Each maximal match in the tree is replaced with the equivalent TIE instruction.
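A minimal sketch of such a matcher is shown below. It assumes expression trees are nested tuples of the form `(operator, child, ...)` with string leaves, and that each pattern uses `$n` placeholders for operands; the pattern database, the instruction names `MULADD` and `MUL16`, and the tie-breaking rule (largest pattern by operator count) are illustrative assumptions rather than the compiler's actual data structures.

```python
# Hypothetical database: TIE instruction name -> tree pattern.
TIE_PATTERNS = {
    "MULADD": ("add", ("mul", "$0", "$1"), "$2"),
    "MUL16": ("mul", "$0", "$1"),
}


def match(pattern, tree, binds):
    """Structural match; '$n' placeholders bind to arbitrary subtrees."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in binds:
            return binds[pattern] == tree
        binds[pattern] = tree
        return True
    if isinstance(pattern, tuple):
        return (isinstance(tree, tuple) and len(tree) == len(pattern)
                and tree[0] == pattern[0]
                and all(match(p, t, binds)
                        for p, t in zip(pattern[1:], tree[1:])))
    return pattern == tree


def size(pattern):
    """Number of operator nodes a pattern covers."""
    if isinstance(pattern, tuple):
        return 1 + sum(size(p) for p in pattern[1:])
    return 0


def collect(tree, found):
    """Bottom-up walk: visit children first, then note matches at this node."""
    if not isinstance(tree, tuple):
        return
    for child in tree[1:]:
        collect(child, found)
    for name, pattern in TIE_PATTERNS.items():
        binds = {}
        if match(pattern, tree, binds):
            found.setdefault(tree, []).append((size(pattern), name, binds))


def replace(tree, found):
    """Keep the largest match at each node; smaller covered matches are dropped."""
    if not isinstance(tree, tuple):
        return tree
    if tree in found:
        _, name, binds = max(found[tree], key=lambda m: m[0])
        return (name,) + tuple(replace(binds[v], found) for v in sorted(binds))
    return (tree[0],) + tuple(replace(c, found) for c in tree[1:])


def rewrite(tree):
    found = {}
    collect(tree, found)
    return replace(tree, found)
```

For the tree `("add", ("mul", "a", "b"), ("mul", "c", "d"))` the inner `mul` on the left is noted as a `MUL16` match during the walk, but the larger `MULADD` match at the root covers it, so only the right-hand `mul` ends up as a separate `MUL16`.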

The algorithm described above automatically recognizes opportunities to use stateless TIE instructions. Additional approaches can also be used to automatically recognize opportunities to use TIE instructions with state. A previous section described algorithms for automatically selecting potential TIE instructions with state. The same algorithms are used to automatically use the TIE instructions in C or C++ applications. When a TIE coprocessor has been defined to have more registers but a limited set of operations, regions of code are scanned to see whether they suffer from register spilling and whether those regions use only the available set of operations. If such regions are found, the code in those regions is automatically changed to use the instructions and registers of the coprocessor 98. Conversion operations are generated at the boundaries of the region to move data into and out of the coprocessor 98. Similarly, if a TIE coprocessor has been defined to operate on integers of a different size, regions of code are examined to see whether all data in the region is accessed as if it were of that different size. For matching regions, the code is changed and glue code is added at the boundaries. Similarly, if a TIE coprocessor has been defined to implement a C++ abstract data type, all operations on that data type are replaced with the instructions of the TIE coprocessor.

Note that suggesting TIE instructions automatically and using TIE instructions automatically are each useful independently. Suggested TIE instructions can also be used manually by the user via the intrinsic mechanism, and the algorithms for using instructions can be applied to TIE instructions or coprocessors 98 that were designed manually.

Regardless of how the designer-defined instructions are generated, whether via inline functions or by automatic recognition, the compiler needs to know the potential side effects of the designer-defined instructions so that it can optimize and schedule them. To improve performance, traditional compilers optimize user code so as to maximize desired characteristics such as run-time performance, code size or power consumption. As is well known to one versed in the art, such optimizations include rearranging instructions or replacing certain instructions with other, semantically equivalent instructions. To perform optimizations well, the compiler must know how every instruction affects the different portions of the machine. Two instructions that read and write different portions of the machine state can be freely reordered. Two instructions that access the same portion of the machine state cannot always be reordered. For traditional processors, the state read and/or written by the different instructions is hardwired, sometimes by table, into the compiler. In one embodiment of this invention, TIE instructions are conservatively assumed to read and write all of the state of the processor 60. This allows the compiler to generate correct code, but limits its ability to optimize code in the presence of TIE instructions. In another embodiment of this invention, a tool automatically reads the TIE definition and discovers, for each TIE instruction, which state is read or written by that instruction. The tool then modifies the tables used by the compiler's optimizer so that they accurately model the effect of each TIE instruction.
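The difference between the conservative and the table-driven embodiments can be sketched with a small dependence check. The state names, the instruction names and the effects table below are hypothetical; an instruction missing from the table is treated as reading and writing all state, which mirrors the conservative assumption.

```python
# Hypothetical pieces of machine state.
ALL_STATE = frozenset({"ACC", "SAR", "MYSTATE", "MEM"})

# Hypothetical effects table: instruction -> (state read, state written).
EFFECTS = {
    "mac_step": (frozenset({"ACC"}), frozenset({"ACC"})),
    "set_shift": (frozenset(), frozenset({"SAR"})),
    "read_acc": (frozenset({"ACC"}), frozenset()),
}


def effects(name):
    """Unknown instructions are assumed to read and write everything."""
    return EFFECTS.get(name, (ALL_STATE, ALL_STATE))


def can_reorder(first, second):
    """True when swapping the two creates no read/write or write/write hazard."""
    reads1, writes1 = effects(first)
    reads2, writes2 = effects(second)
    return not (writes1 & (reads2 | writes2)) and not (writes2 & reads1)
```

With the table filled in, `mac_step` and `set_shift` may be swapped because they touch disjoint state, whereas an instruction absent from the table blocks every reordering around it.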

Like the compiler, the machine-dependent portion of the assembler 110 includes both automatically generated parts and hand-coded parts configured with TPP. Some of the features common to all configurations are supported by hand-written code. However, the primary task of the assembler 110 is to encode machine instructions, and the instruction encoding and decoding software can be generated automatically from the ISA description.

Because instruction encoding and decoding are useful in several different software tools, this embodiment of the invention groups the software that performs those tasks into a separate software library. This library is generated automatically from the information in the ISA description. The library defines an enumeration of the opcodes, a function that efficiently maps opcode mnemonic strings onto members of the enumeration (stringToOpcode), and tables that specify for each opcode the instruction length (instructionLength), number of operands (numberOfOperands), operand fields, operand types (i.e., register or immediate) (operandType), binary encoding (encodeOpcode), and mnemonic string (opcodeName). For each operand field, the library provides accessor functions to encode (fieldSetFunction) and decode (fieldGetFunction) the corresponding bits in the instruction word. All of this information is readily available in the ISA description; generating the library software is merely a matter of translating that information into executable C code. For example, the instruction encodings are recorded in a C array variable in which each entry is the encoding of a particular instruction, produced by setting each opcode field to the value specified for that instruction in the ISA description; the encodeOpcode function simply returns the array value for a given opcode.

The library also provides a function to decode the opcode in a binary instruction (decodeInstruction). This function is generated as a sequence of nested switch statements, in which the outermost switch tests the sub-opcode field at the top of the opcode hierarchy and the nested switch statements test the sub-opcode fields progressively lower in the opcode hierarchy. The code generated for this function therefore has the same structure as the opcode hierarchy itself.
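To illustrate how a decoder with this shape could be generated, the following sketch walks an opcode hierarchy and emits C-like source text with one switch per level. The hierarchy, the field accessor names (`get_op0`, `get_op1`), the `OPCODE_` prefix and the C type names are all hypothetical; the real generator works from the ISA description.

```python
# Hypothetical opcode hierarchy: (field name, {field value: child}).
# A child is either a leaf opcode name or another (field, cases) node.
OPCODE_TREE = ("op0", {
    0: ("op1", {0: "ADD", 1: "SUB"}),
    1: "L32I",
    2: "S32I",
})


def emit_switch(node, depth):
    """Emit one switch statement per level of the hierarchy."""
    pad = "    " * depth
    field, cases = node
    lines = [f"{pad}switch (get_{field}(insn)) {{"]
    for value, child in sorted(cases.items()):
        lines.append(f"{pad}case {value}:")
        if isinstance(child, str):
            lines.append(f"{pad}    return OPCODE_{child};")
        else:
            lines.extend(emit_switch(child, depth + 1))
            # Avoid falling through into the next case when no
            # inner case matched.
            lines.append(f"{pad}    break;")
    lines.append(f"{pad}}}")
    return lines


def emit_decoder(tree):
    lines = ["opcode_t decodeInstruction(insn_t insn)", "{"]
    lines.extend(emit_switch(tree, 1))
    lines.append("    return OPCODE_UNDEFINED;")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The nesting depth of the emitted switches equals the depth of `OPCODE_TREE`, which is the structural correspondence the paragraph above describes.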

Given this library for encoding and decoding instructions, the assembler 110 is easily implemented. For example, the instruction encoding logic in the assembler is quite simple:

    AssembleInstruction(String mnemonic, int arguments[])
    begin
        opcode = stringToOpcode(mnemonic);
        if (opcode == UNDEFINED)
            Error("Unknown opcode");
        instruction = encodeOpcode(opcode);
        numArgs = numberOfOperands(opcode);
        for i = 0, numArgs-1 do
        begin
            setFun = fieldSetFunction(opcode, i);
            setFun(instruction, arguments[i]);
        end
        return instruction;
    end

Implementing a disassembler 110, which converts binary instructions into a readable form closely resembling assembly code, is equally straightforward:

    DisassembleInstruction(BinaryInstruction instruction)
    begin
        opcode = decodeInstruction(instruction);
        instructionAddress += instructionLength(opcode);
        print opcodeName(opcode);
        // Loop through the operands, disassembling each
        numArgs = numberOfOperands(opcode);
        for i = 0, numArgs-1 do
        begin
            type = operandType(opcode, i);
            getFun = fieldGetFunction(opcode, i);
            value = getFun(opcode, i, instruction);
            if (i != 0) print ",";  // Comma separate operands
            // Print based on the type of the operand
            switch (type)
                case register:
                    print registerPrefix(type), value;
                case immediate:
                    print value;
                case pc_relative_label:
                    print instructionAddress + value;
                // etc. for more different operand types
        end
    end

This disassembler algorithm is used in a standalone disassembler tool and also in the debugger 130, to support debugging of machine code.
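For readers who want to run the two routines, here is a minimal Python rendering built on a toy instruction set. Everything about the toy ISA is an assumption made for the example: 16-bit words, a 4-bit opcode in bits 15..12, up to three 4-bit operand fields, the mnemonics, and the `a` register prefix. Only the division of labor (library accessors versus assembler and disassembler loops) follows the pseudocode.

```python
# Toy ISA (hypothetical): opcode in bits 15..12, operand fields in
# bits 11..8, 7..4 and 3..0 of a 16-bit instruction word.
OPCODES = {
    "add": {"code": 0x1, "operands": ["register", "register", "register"]},
    "addi": {"code": 0x2, "operands": ["register", "register", "immediate"]},
}
FIELD_SHIFTS = [8, 4, 0]


def encode_opcode(name):
    return OPCODES[name]["code"] << 12


def number_of_operands(name):
    return len(OPCODES[name]["operands"])


def field_set_function(name, index):
    shift = FIELD_SHIFTS[index]

    def set_field(word, value):
        if not 0 <= value < 16:
            raise ValueError(f"operand {index} of {name} out of range")
        return (word & ~(0xF << shift)) | (value << shift)
    return set_field


def field_get_function(name, index):
    shift = FIELD_SHIFTS[index]
    return lambda word: (word >> shift) & 0xF


def decode_instruction(word):
    code = (word >> 12) & 0xF
    for name, info in OPCODES.items():
        if info["code"] == code:
            return name
    return None


def assemble(mnemonic, arguments):
    if mnemonic not in OPCODES:
        raise ValueError("Unknown opcode")
    word = encode_opcode(mnemonic)
    for i in range(number_of_operands(mnemonic)):
        word = field_set_function(mnemonic, i)(word, arguments[i])
    return word


def disassemble(word):
    name = decode_instruction(word)
    if name is None:
        raise ValueError("Unknown encoding")
    parts = []
    for i, kind in enumerate(OPCODES[name]["operands"]):
        value = field_get_function(name, i)(word)
        # Print based on the type of the operand.
        parts.append(f"a{value}" if kind == "register" else str(value))
    return f"{name} " + ", ".join(parts)
```

Unlike the pseudocode, the setter here returns the updated word instead of modifying it in place, since Python integers are immutable.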

The linker is less sensitive to the configuration than the compiler and the assembler 110. Much of the linker is standard, and even the machine-dependent portions depend primarily on the core ISA description and can be hand-coded for a particular core ISA. Parameters such as endianness are set from the configuration specification 100 using TPP. The memory map of the target processor 60 is one other aspect of the configuration that the linker needs. As before, the parameters that specify the memory map are inserted into the linker using TPP. In this embodiment of the invention, the GNU linker is driven by a set of linker scripts, and it is these linker scripts that contain the memory map information. An advantage of this approach is that, if the memory map of the target system differs from the memory map specified when the processor 60 was configured, additional linker scripts can be generated later without reconfiguring the processor 60 and without rebuilding the linker. Thus, this embodiment includes a tool to configure new linker scripts with different memory map parameters.
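A tool of this kind might look like the sketch below, which renders a `MEMORY` block in GNU linker script syntax from a list of regions. The region names, attributes, addresses and sizes are invented, and a complete linker script would also need a `SECTIONS` command, which is omitted here.

```python
def linker_memory_block(regions):
    """Render a GNU ld MEMORY command from (name, attrs, origin, length)."""
    lines = ["MEMORY", "{"]
    for name, attrs, origin, length in regions:
        lines.append(
            f"  {name} ({attrs}) : ORIGIN = {origin:#010x}, LENGTH = {length:#x}"
        )
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical memory map for a target board.
BOARD_MEMORY_MAP = [
    ("iram", "rx", 0x40000000, 0x20000),
    ("dram", "rw", 0x3FFE0000, 0x20000),
    ("rom", "rx", 0x50000000, 0x80000),
]
```

Regenerating the script for a different board only requires a different region list; neither the processor configuration nor the linker binary changes.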

The debugger 130 provides mechanisms to observe the state of a program as it runs, to single-step execution one instruction at a time, to introduce breakpoints, and to perform other standard debugging tasks. The program being debugged can run either on a hardware implementation of the configured processor or on the ISS 126. The debugger presents the same interface to the user in either case. When the program runs on a hardware implementation, a small monitor program is included on the target system to control the execution of the user's program and to communicate with the debugger via a serial port. When the program runs on the simulator 126, the simulator 126 itself performs those functions. The debugger 130 depends on the configuration in several ways. It is linked with the instruction encoding/decoding library described above, to support disassembling machine code from within the debugger 130. The part of the debugger 130 that displays the processor's register state, and the parts of the monitor program and the ISS 126 that provide that information to the debugger 130, are generated by scanning the ISA description to find which registers exist in the processor 60.

The other software development tools 30 are standard and do not need to be changed for each processor configuration. The profile viewer and various utility programs fall into this category. These tools may need to be retargeted once to operate on files in the binary format shared by all configurations of the processor 60, but they depend neither on the ISA description nor on the other parameters in the configuration specification 100.

The configuration specification is also used to configure a simulator called the ISS 126, shown in FIG. 13. The ISS 126 is a software application that models the functional behavior of the configurable processor instruction set. Unlike its counterpart processor hardware model simulators, such as the Synopsys VCS and the Cadence Verilog XL and NC simulators, the ISS is an abstraction of the CPU as it executes instructions. The ISS 126 can run much faster than a hardware simulation because it does not need to model every signal transition of every gate and register in the complete processor design.

The ISS 126 allows programs generated for the configured processor 60 to be executed on a host computer. It accurately reproduces the reset and interrupt behavior of the processor, which allows low-level programs such as device drivers and initialization code to be developed. This is particularly useful when porting native code to an embedded application.

The ISS 126 can be used to identify potential problems, such as architectural assumptions, memory ordering considerations and the like, without needing to download the code to the actual embedded target.

In this embodiment, the ISS semantics are expressed textually in a C-like language, to build C operator building blocks that turn instructions into functions. For example, the rudimentary functionality of an interrupt, such as the interrupt register, bit setting, interrupt level and vectors, is modeled using this language.

The configurable ISS 126 is used for the following four purposes or goals as part of the system design and verification process:

debugging software applications before hardware becomes available;

debugging system software (e.g., compilers and operating system components);

comparing with HDL simulation for hardware design verification. The ISS serves as a reference implementation of the ISA: during processor design verification, both the ISS and the processor HDL are run on diagnostics and applications, and the traces from the two are compared; and

analyzing software application performance (this may be part of the configuration process, or it may be used for further application tuning after a processor configuration has been selected).

All of these goals require that the ISS 126 be able to load and decode programs produced with the configurable assembler 110 and linker. They also require that the ISS execution of instructions be semantically equivalent to the corresponding hardware execution and to the compiler's expectations. For these reasons, the ISS 126 derives its decode and execution behavior from the same ISA files that are used to define the hardware and the system software.

For the first and last goals listed above, it is important that the ISS 126 be as fast as possible for the required accuracy. The ISS 126 therefore permits dynamic control of the level of detail of the simulation. For example, cache details are not modeled unless requested, and cache modeling can be turned off and on dynamically. In addition, parts of the ISS 126 (e.g., the cache and pipeline models) are configured before the ISS 126 is compiled, so that the ISS 126 makes very few configuration-dependent choices of behavior at run time. In this way, all configurable ISS behavior is derived from well-defined sources related to other parts of the system.

For the first and third goals listed above, it is important that the ISS 126 provide operating system services to applications when these services are not yet available from the OS of the system under design (the target). It is equally important that these services be provided by the target OS when that is a relevant part of the debugging process. The system therefore provides a design for flexibly moving these services between the ISS host and the simulation target. The current design relies on a combination of ISS dynamic control (trapping of SYSCALL instructions may be turned on and off) and the use of a special SIMCALL instruction to request host OS services.

The last goal requires the ISS 126 to model some aspects of processor and system behavior that lie below the level specified by the ISA. In particular, the ISS cache models are constructed by generating C code for the models from Perl scripts that extract parameters from the configuration database 100. In addition, details of the pipeline behavior of instructions (e.g., interlocks based on register use and functional-unit availability requirements) are also derived from the configuration database 100. In the current embodiment, a special pipeline description file specifies this information in a LISP-like syntax.

The third goal requires precise control of interrupt behavior. For this purpose, a special non-architectural register in the ISS 126 is used to suppress interrupt enables.

The ISS 126 provides several interfaces to support the different goals for its use:

a batch or command-line mode (generally used in connection with the first and last goals);

a command loop mode, which provides non-symbolic debugging capabilities such as breakpoints, watchpoints and single-stepping, and is frequently used for all four goals;

a socket interface, which allows the ISS 126 to be used by a software debugger as an execution back end (this must be configured to read and write the register state of the particular configuration selected); and

a scriptable interface, which allows very detailed debugging and performance analysis. In particular, this interface can be used to compare application behavior on different configurations. For example, at any breakpoint, the state from a run on one configuration can be compared with, or transferred to, the state from a run on another configuration.

The simulator 126 also has both hand-coded and automatically generated portions. The hand-coded portions are conventional, except for instruction decode and execution, which are created from tables generated from the ISA description language. The tables decode an instruction by starting from the primary opcode found in the instruction word to be executed, indexing into a table with the value of that field, and continuing until a leaf opcode (i.e., an opcode that is not defined in terms of other opcodes) is found. The tables then give a pointer to the code translated from the TIE code specified in the semantics declaration for that instruction. This code is executed to simulate the instruction.
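The table walk can be pictured with the following sketch. The instruction layout (a 24-bit word with a primary opcode in bits 23..20, a sub-opcode in bits 19..16 and three 4-bit register fields), the two semantic functions and the register-file name `ar` are assumptions for illustration; in the embodiment the tables and semantic code are generated from the ISA and TIE descriptions.

```python
MASK32 = 0xFFFFFFFF


def field(low_bit, width):
    """Build an extractor for a bit field of the instruction word."""
    return lambda word: (word >> low_bit) & ((1 << width) - 1)


R, S, T = field(8, 4), field(4, 4), field(0, 4)


def sem_add(state, word):
    ar = state["ar"]
    ar[R(word)] = (ar[S(word)] + ar[T(word)]) & MASK32


def sem_sub(state, word):
    ar = state["ar"]
    ar[R(word)] = (ar[S(word)] - ar[T(word)]) & MASK32


# Each table is (field extractor, {field value: sub-table or leaf semantics}).
ALU_TABLE = (field(16, 4), {0: sem_add, 1: sem_sub})
ROOT_TABLE = (field(20, 4), {0: ALU_TABLE})


def simulate(state, word):
    """Index table after table until a leaf opcode is reached, then run it."""
    table = ROOT_TABLE
    while True:
        extract, entries = table
        entry = entries.get(extract(word))
        if entry is None:
            raise ValueError(f"illegal instruction {word:#08x}")
        if callable(entry):       # leaf opcode: execute its semantics
            entry(state, word)
            return
        table = entry             # descend one level in the hierarchy
```

Adding a designer-defined instruction then amounts to adding one table entry that points at its translated semantic function.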

The ISS 126 can optionally profile the execution of the program being simulated. This profiling uses a program counter (PC) sampling technique known in the art. At regular intervals, the simulator 126 samples the program counter of the processor being simulated. It builds a histogram of the number of samples in each region of code. The simulator 126 also counts the number of times each edge in the call graph is executed, by incrementing a counter whenever a call instruction is simulated. When the simulation is complete, the simulator 126 writes an output file containing both the histogram and the call graph edge counts, in a format that can be read by a standard profile viewer. Because the program 118 being simulated need not be modified with instrumentation code (as in standard profiling techniques), the profiling overhead does not affect the simulation results, and the profiling is entirely non-invasive.
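A compact sketch of this bookkeeping follows. It assumes the sampling interval is counted in simulated instructions and that code regions are fixed-size address buckets; the class name, its parameters and the in-memory result format are hypothetical, and writing a file for a real profile viewer is left out.

```python
from collections import Counter


class Profiler:
    """PC-sampling histogram plus call-graph edge counts for a simulator."""

    def __init__(self, sample_interval, bucket_size):
        self.sample_interval = sample_interval  # instructions per sample
        self.bucket_size = bucket_size          # bytes per code region
        self.executed = 0
        self.histogram = Counter()              # region start -> samples
        self.call_edges = Counter()             # (call site, target) -> count

    def on_instruction(self, pc, call_target=None):
        """Called by the simulator once per simulated instruction."""
        self.executed += 1
        if self.executed % self.sample_interval == 0:
            region = pc // self.bucket_size * self.bucket_size
            self.histogram[region] += 1
        if call_target is not None:
            self.call_edges[(pc, call_target)] += 1
```

All counting happens inside the simulator, so the simulated program sees no extra instructions or timing changes.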

Preferably, the system makes hardware processor emulation available as well as software processor emulation. For this purpose, this embodiment provides an emulation board. As shown in FIG. 6, the emulation board 200 uses a complex programmable logic device 202, such as the Altera Flex 10K200E, to emulate a processor configuration 60 in hardware. Once programmed with the processor netlist generated by the system, the CPLD device 202 is functionally equivalent to the final ASIC product. This provides the advantage that a physical implementation of the processor 60 is available that runs much faster than other simulation methods (such as the ISS or HDL) and is cycle-accurate. However, it cannot reach the high frequency targets that the final ASIC device can attain.

This board enables the designer to evaluate various processor configuration options and to start software development and debugging early in the design cycle. It can also be used for functional verification of the processor configuration.

The emulation board 200 has several resources available on it to allow easy software development, debugging and verification. These include the CPLD device 202 itself, EPROM 204, SRAM 206, synchronous SRAM 208, flash memory 210 and two RS232 serial channels 212. The serial channels 212 provide a communication link to UNIX or PC hosts for downloading and debugging user programs. The configuration of a processor 60, in the form of a CPLD netlist, is downloaded into the CPLD 202 either through a dedicated serial link to the device's configuration port 214 or through dedicated configuration ROMs 216.

The resources available on the board 200 are configurable to a degree as well. The memory map of the various memory elements on the board can easily be changed, because the mapping is done through a programmable logic device (PLD) 217 that can itself easily be changed. Also, the caches 218 and 228 used by the processor core are expandable, by using larger memory devices and appropriately sizing the tag buses 222 and 224 that connect to the caches 218 and 228.

Using the board to emulate a particular processor configuration involves several steps. The first step is to obtain a set of RTL files that describe the particular configuration of the processor. The next step is to synthesize a gate-level netlist from the RTL description using any of a number of commercial synthesis tools. One such example is FPGA Express from Synopsys. The gate-level netlist is then used to obtain a CPLD implementation, using tools typically provided by the device vendors. One such tool is Maxplus2 from Altera Corporation. The final step is to download the implementation onto the CPLD chip on the emulation board, using programmers that are again provided by the CPLD vendors.

Since one of the purposes of the emulation board is to support quick prototype implementation for debugging purposes, it is important that the CPLD implementation process outlined in the previous paragraph be automatic. To achieve this objective, the files delivered to users are customized by grouping all relevant files into a single directory. A fully customized synthesis script is then provided that can synthesize the particular processor configuration to the particular FPGA device selected by the customer. A fully customized implementation script to be used by the vendor tools is also generated. These synthesis and implementation scripts guarantee a functionally correct implementation with optimal performance. Functional correctness is achieved by including appropriate commands in the script to read in all RTL files relevant to the specific processor configuration, by including appropriate commands to assign chip pin locations based on the I/O signals in the processor configuration, and by including commands to obtain a specific logic implementation for certain critical portions of the processor logic, such as gated clocks. The script also improves the performance of the implementation by assigning detailed timing constraints to all processor I/O signals and by special processing of certain critical signals. An example of a timing constraint is assigning a specific input delay to a signal by taking into account the delay of that signal on the board. An example of critical signal treatment is assigning the clock signal to a dedicated global wire in order to achieve low clock skew on the CPLD chip.

Preferably, the system also configures a verification suite for the configured processor 60. Verification of most complex designs, such as microprocessors, includes the following flow:

Build a testbench to simulate the design and compare the output, either within the testbench or against an external model such as the ISS 126;

Write diagnostics to generate the stimulus;

Measure the coverage of the verification using schemes such as line coverage and finite state machine coverage of the HDL, the decline in bug rate, the number of vectors run on the design, and so on; and

If the coverage is insufficient, write more diagnostics, and use tools to generate diagnostics that exercise the design further.

The present invention uses a flow that is somewhat similar, but all components of the flow are modified to account for the configurability of the design. This methodology includes the following steps:

Build a testbench for a particular configuration. Configuration of the testbench uses an approach similar to that described for the HDL and supports all of the options and extensions supported there, i.e., cache size, bus interface, clocking, interrupt generation, and so on;

Run self-checking diagnostics on a particular configuration of the HDL. The diagnostics themselves are configurable so that they can be tailored to a particular piece of hardware. The selection of which diagnostics to run also depends on the configuration;

Run pseudo-randomly generated diagnostics and compare the processor state against the ISS 126 after each instruction is executed; and

Measure the coverage of the verification using coverage tools that measure functional coverage as well as line coverage. In addition, monitors and checkers are run along with the diagnostics to look for illegal states and conditions. All of these are configurable for a particular configuration specification.

All of the verification components are configurable. Configurability is implemented using TPP.

The testbench is a Verilog™ model of a system containing the configured processor 60. In the case of the present invention, the testbench includes:

Caches, bus interface, and external memory;

External interrupt and bus error generation; and

Clock generation.

Because almost all of the above features are configurable, the testbench itself needs to support configurability. Thus, for example, the cache size and width and the number of external interrupts are adjusted automatically based on the configuration.

The testbench provides stimulus to the device under test, the processor 60. It does so by supplying assembly-level instructions (from the diagnostics) that are preloaded into memory. It also generates signals that control the behavior of the processor 60, for example, interrupts. The frequency and timing of these signals are likewise controllable and are generated automatically by the testbench.

There are two kinds of configurability for the diagnostics. First, the diagnostics use TPP to determine what to test. For example, a diagnostic has been written to test software interrupts. This diagnostic needs to know how many software interrupts exist in order to generate the correct assembly code.

Second, the processor configuration system 10 must determine which diagnostics are suitable for the configuration. For example, a diagnostic written to test the MAC unit is not applicable to a processor 60 that does not include this unit. In this embodiment, this is accomplished using a database containing information about each diagnostic. The database may contain the following information for each diagnostic:

Use the diagnostic only if a certain option has been selected;

Whether the diagnostic cannot be run with interrupts;

Whether the diagnostic requires special libraries or handlers to run; and

Whether the diagnostic cannot be run with ISS 126 co-simulation.
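The database lookup described above can be sketched in C. This is only an illustration under stated assumptions: the record layout, the option bit names (`OPT_MAC` and so on), and the `run_setup` structure are hypothetical and are not taken from the actual system; only the four kinds of per-diagnostic information come from the list above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bits; the names are illustrative only. */
enum {
    OPT_MAC        = 1u << 0,  /* multiply-accumulate unit present */
    OPT_INTERRUPTS = 1u << 1,  /* external interrupts configured   */
    OPT_DATA_BREAK = 1u << 2   /* data breakpoint feature present  */
};

/* One database record per diagnostic, mirroring the four listed items. */
struct diag_entry {
    const char *name;
    unsigned    required_options;   /* run only if all of these are selected   */
    bool        no_interrupts;      /* cannot run while interrupts are injected */
    bool        needs_special_libs; /* needs special libraries or handlers      */
    bool        no_iss_cosim;       /* cannot run under ISS co-simulation       */
};

/* How a regression run is set up for one configuration (assumed shape). */
struct run_setup {
    unsigned config_options;        /* options selected in this configuration */
    bool     inject_interrupts;
    bool     special_libs_available;
    bool     iss_cosim;
};

/* Returns true if the diagnostic is applicable to this run. */
bool diag_applies(const struct diag_entry *d, const struct run_setup *r)
{
    if ((d->required_options & r->config_options) != d->required_options)
        return false;
    if (d->no_interrupts && r->inject_interrupts)
        return false;
    if (d->needs_special_libs && !r->special_libs_available)
        return false;
    if (d->no_iss_cosim && r->iss_cosim)
        return false;
    return true;
}

/* Counts the applicable diagnostics in a database of n entries. */
size_t count_applicable(const struct diag_entry *db, size_t n,
                        const struct run_setup *r)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (diag_applies(&db[i], r))
            count++;
    return count;
}
```

With a record such as `{"mac_test", OPT_MAC, false, false, false}`, the diagnostic is skipped for any configuration whose option mask lacks `OPT_MAC`, which corresponds to the MAC example in the text.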

Preferably, the processor hardware description includes three types of testing tools: test generator tools, monitors and coverage tools (or checkers), and a co-simulation mechanism. Test generator tools are tools that create a series of processor instructions in an intelligent fashion. They are sequences of pseudo-random test generators. This embodiment uses two types internally: one developed specifically, called RTPG, and another based on an external tool, called VERA (VSG). Both have configurability built around them. They generate a series of instructions based on the valid instructions for a configuration. These tools are also able to deal with newly defined instructions from TIE, so that these newly defined instructions are randomly generated for testing. This embodiment includes monitors and checkers that measure the coverage of the design verification.

Monitors and coverage tools are run along with a regression run. Coverage tools monitor what the diagnostics are doing and which functions and logic of the HDL are being exercised. All of this information is collected throughout the regression run and analyzed afterward to obtain hints as to which parts of the logic need further testing. This embodiment uses several functional coverage tools that are configurable. For example, for a particular finite state machine, not all states are included, depending on the configuration. For that configuration, the functional coverage tool therefore must not try to check for those states or transitions. This is accomplished by making the tool configurable through TPP.

Similarly, there are monitors that check for illegal conditions occurring within the HDL simulation. These illegal conditions can show up as bugs. For example, on a three-state bus, two drivers should not be on simultaneously. These monitors are configurable: checks are added or removed based on whether a particular piece of logic is included in the configuration.
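As an illustration of such a configurable monitor, the three-state bus check can be modeled in C as follows. This is a minimal sketch, assuming the monitor samples one enable bit per driver each cycle; the function name and the `present` mask (which stands in for the configuration-dependent enabling of checks) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the sampled driver-enable bits indicate a bus conflict.
 * enables: one bit per three-state driver, 1 = driving in this cycle.
 * present: mask of the drivers that exist in this configuration, so that
 *          checks for logic that is not configured are disabled.
 */
bool bus_conflict(uint32_t enables, uint32_t present)
{
    uint32_t active = enables & present;
    /* active & (active - 1) clears the lowest set bit; a nonzero
       result means at least two drivers are enabled at once. */
    return (active & (active - 1u)) != 0;
}
```

For example, `bus_conflict(0x5, 0xF)` reports a conflict, while `bus_conflict(0x5, 0x1)` does not, because the second driver is not part of that configuration.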

The co-simulation mechanism connects the HDL to the ISS 126. It is used to check that the state of the processor at the end of each instruction is identical in the HDL and in the ISS 126. It too is configurable, to the extent that it knows which features are included for each configuration and which state needs to be compared. So, for example, the data breakpoint feature adds a special register. This mechanism needs to know to compare this new special register.
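The per-instruction comparison can be pictured with the following C sketch. It is an assumption-laden illustration, not the actual mechanism: the `proc_state` layout, the register counts, and the idea of passing the number of configured special registers as `n_special` are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_SPECIAL_REGS 8  /* illustrative bound, not from the source */

/* Architectural state sampled at the end of an instruction. */
struct proc_state {
    uint32_t pc;
    uint32_t ar[16];                     /* general registers */
    uint32_t special[MAX_SPECIAL_REGS];  /* configuration-dependent registers */
};

/* Returns true if both models agree on all state that exists in this
   configuration; n_special is how many special registers are configured. */
bool states_match(const struct proc_state *hdl, const struct proc_state *iss,
                  size_t n_special)
{
    if (hdl->pc != iss->pc)
        return false;
    for (size_t i = 0; i < 16; i++)
        if (hdl->ar[i] != iss->ar[i])
            return false;
    for (size_t i = 0; i < n_special && i < MAX_SPECIAL_REGS; i++)
        if (hdl->special[i] != iss->special[i])
            return false;
    return true;
}
```

Adding a feature such as the data breakpoint then amounts to raising `n_special` so that the extra register is included in the comparison.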

Instruction semantics specified via TIE can be translated into functionally equivalent C functions for use in the ISS 126 and for system designers to use for testing and verification. The semantics of an instruction in the configuration database 106 are translated to a C function by tools that build a parse tree using standard parser tools and then walk the tree, checking its syntax and outputting the corresponding expressions in the C language. The translation requires a prepass to assign bit widths to all expressions and to rewrite the parse tree so as to simplify some of the translations. These translators are relatively simple compared with other translators, such as HDL-to-C translators or C-to-assembly-language compilers, and can be written by one skilled in the art starting from the TIE and C language specifications.

Using the compiler and assembler/disassembler 100 configured by the configuration file 100, benchmark application source code 118 is compiled and assembled and, using a sample data set 124, simulated to obtain a software profile 130, which is also provided to the user configuration capture routine for feedback to the user.

The ability to obtain the hardware and software cost/benefit characteristics for any choice of configuration parameters opens up opportunities for further optimization of the system by the designer. In particular, it enables the designer to select the optimal configuration parameters, which optimize the overall system according to some figure of merit. One possible process is based on a greedy strategy, in which a configuration parameter is repeatedly selected or deselected. At each step, the parameter that has the best impact on overall system performance and cost is chosen. This step is repeated until no single parameter can be changed to improve system performance and cost. Other extensions include looking at a group of configuration parameters at a time or employing more sophisticated search algorithms.
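The greedy strategy can be sketched in C as follows. This is a minimal illustration under assumptions: configuration parameters are modeled as on/off flags, the figure of merit is a caller-supplied cost function where lower is better, and `example_cost` uses made-up coefficients purely for demonstration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Figure of merit: lower is better (for example, a weighted combination
   of run time and gate count). Supplied by the caller. */
typedef double (*cost_fn)(const bool *params, size_t n);

/* Greedy search: repeatedly flip the single parameter whose change lowers
   the cost the most, and stop when no single flip improves it.
   params is updated in place; the final cost is returned. */
double greedy_configure(bool *params, size_t n, cost_fn cost)
{
    double best = cost(params, n);
    for (;;) {
        size_t best_i = n;            /* n means "no improving flip found" */
        double best_cost = best;
        for (size_t i = 0; i < n; i++) {
            params[i] = !params[i];   /* try selecting or deselecting i */
            double c = cost(params, n);
            params[i] = !params[i];   /* undo the trial */
            if (c < best_cost) {
                best_cost = c;
                best_i = i;
            }
        }
        if (best_i == n)
            return best;
        params[best_i] = !params[best_i];
        best = best_cost;
    }
}

/* Hypothetical cost model with invented numbers: options 0 and 2 pay for
   themselves, option 1 costs more area than the time it saves. */
double example_cost(const bool *p, size_t n)
{
    (void)n;
    return 100.0 - 40.0 * p[0] + 15.0 * p[1] - 20.0 * p[2];
}
```

Starting from `{false, true, false}`, the search first selects option 0, then option 2, then deselects option 1, and stops because no further single change lowers the cost. Like any greedy method, it can stop at a local optimum, which is why the text mentions examining groups of parameters or using more sophisticated searches.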

In addition to obtaining the optimal choice of configuration parameters, this process can be used to construct optimal processor extensions. Because of the large number of possibilities for processor extensions, it is important to restrict the number of extension candidates. One technique is to analyze the application software and consider only those instruction extensions that can improve system performance or cost.

Having described the operation of the automatic processor configuration system according to this embodiment, examples of its application to processor microarchitecture configuration will now be given. The first example shows the advantages of applying the invention to image compression.

Motion estimation is an important component of many image compression algorithms, including MPEG video and H.263 conferencing applications. Video image compression attempts to use the similarity from one frame to the next to reduce the amount of storage required for each frame. In the simplest case, each block of the image to be compressed can be compared with the corresponding block (at the same X, Y location) of a reference image (one that closely precedes or follows the image being compressed). Compressing the image differences between frames is generally more bit-efficient than compressing the individual images. In video sequences, distinctive image features often move from frame to frame, so the closest correspondence between blocks in different frames is often not at exactly the same X, Y location but at some offset. If significant parts of the image move between frames, it is necessary to identify and compensate for the movement before computing the difference. This means that the densest representation is obtained by encoding the differences between successive images, including, for distinctive features, the X, Y offset in the sub-image used to compute the difference. The offset in location used to compute the image difference is called the motion vector.

The most computationally intensive task in this kind of image compression is determining the most appropriate motion vector for each block. A common measure for selecting a motion vector is to find the vector with the lowest average pixel-by-pixel difference between each block of the image being compressed and a set of candidate blocks of the previous image. The candidate blocks are the set of all blocks in a neighborhood around the location of the block being compressed. The size of the image, the size of the block, and the size of the neighborhood all affect the running time of the motion estimation algorithm.

Simple block-based motion estimation compares each sub-image of the image to be compressed against a reference image. The reference image may precede or follow the subject image in the video sequence. In every case, the reference image is known to be available to the decompression system before the subject image is decompressed. The comparison of one block of the image to be compressed with candidate blocks of the reference image is described below.

For each block in the subject image, a search is performed around the corresponding location in the reference image. Normally, each color component of the image (e.g., YUV) is analyzed separately. Sometimes only one component, such as luminance, is analyzed. The average pixel-by-pixel difference is computed between the subject block and every possible block in the search region of the reference image. The difference is the absolute value of the difference in the magnitudes of the pixel values. The average is proportional to the sum over the N² pixels in the pair of blocks (where N is the dimension of the block). The block of the reference image that produces the smallest average pixel difference defines the motion vector for that block of the subject image.

The following example shows a simple form of the motion estimation algorithm and then optimizes the algorithm using TIE for a small application-specific functional unit. This optimization yields a speedup of more than a factor of 10, making processor-based compression feasible for many video applications. It illustrates the power of a processor that combines the ease of programming in a high-level language with the efficiency of special-purpose hardware.

This example uses two matrices, OldB and NewB, to represent the old and new images, respectively. The size of the image is determined by NX and NY. The block size is determined by BLOCKX and BLOCKY. The image is therefore composed of NX/BLOCKX by NY/BLOCKY blocks. The search region around a block is determined by SEARCHX and SEARCHY. The best motion vectors and values are stored in VectX, VectY, and VectB. The best motion vectors and values computed by the base (reference) implementation are stored in BaseX, BaseY, and BaseB. These values are used to check the vectors computed by the implementation that uses instruction extensions. These basic definitions are captured in the following C code segment:

#include <limits.h>   /* UINT_MAX */
#include <stdio.h>    /* printf */

#define NX      64    /* Image width */
#define NY      32    /* Image height */
#define BLOCKX  16    /* Block width */
#define BLOCKY  16    /* Block height */
#define SEARCHX 4     /* Search region width */
#define SEARCHY 4     /* Search region height */

unsigned char OldB[NX][NY];                     /* Old image */
unsigned char NewB[NX][NY];                     /* New image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];     /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];     /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY];     /* Absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY];     /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY];     /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY];     /* Base absolute difference */

#define ABS(x)     (((x) < 0) ? (-(x)) : (x))
#define MIN(x, y)  (((x) < (y)) ? (x) : (y))
#define MAX(x, y)  (((x) > (y)) ? (x) : (y))
#define ABSD(x, y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

The motion estimation algorithm consists of three nested loops:

1. For each source block in the old image.

2. For each destination block of the new image in the region surrounding the source block.

3. Compute the absolute difference between each pair of pixels.

The complete code for the algorithm is listed below.

/*
 * Reference software implementation
 */
void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    /* Loop 1: every block of the image */
    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            /* Loop 2: every candidate block in the search region */
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    diff = 0;
                    /* Loop 3: sum of absolute pixel differences */
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}

The base implementation is simple, but it fails to exploit much of the inherent parallelism of the block-to-block comparison. The configurable processor architecture provides two key tools that allow significant speedup of this application.

First, the instruction set architecture includes powerful funnel shifting primitives that permit rapid extraction of unaligned fields from memory. This allows the inner loop of the pixel comparison to fetch groups of adjacent pixels from memory efficiently. The loop can then be rewritten to operate on four pixels (bytes) simultaneously. In particular, for the purposes of this example, it is desirable to define a new instruction that computes the absolute differences of four pixel pairs at a time. Before defining this new instruction, however, it is necessary to re-implement the algorithm to make use of such an instruction.

The presence of this instruction improves the inner-loop pixel difference computation so much that loop unrolling becomes attractive as well. The C code for the inner loop is rewritten to take advantage of the new sum-of-absolute-differences instruction and the efficient shifting. Parts of four overlapping blocks of the reference image can then be compared in the same loop. SAD(x, y) is the new intrinsic function corresponding to the added instruction. SRC(x, y) performs a right shift of the concatenation of x and y by the shift amount stored in the SAR register.
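To make the funnel shift concrete, the following C sketch models SRC and the instruction that sets the shift amount. It is a software stand-in written for illustration, assuming that SRC returns the low 32 bits of the shifted 64-bit concatenation, that SSAI sets SAR to an immediate value, and that pixels are stored little-endian within a word; the global variable `sar` is a hypothetical substitute for the SAR register.

```c
#include <stdint.h>

unsigned sar;  /* software stand-in for the shift-amount register SAR */

/* SSAI: set the shift amount to an immediate value (0..31). */
void SSAI(unsigned amount)
{
    sar = amount & 31u;
}

/* SRC: shift the 64-bit concatenation hi:lo right by SAR and
   return the low 32 bits of the result. */
uint32_t SRC(uint32_t hi, uint32_t lo)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> sar);
}
```

If two consecutive words hold pixel bytes 0..3 and 4..7 (`A = 0x03020100`, `B = 0x07060504`), then after `SSAI(8)` the call `SRC(B, A)` yields `0x04030201`, that is, pixels 1..4. Shift amounts of 16 and 24 yield pixels 2..5 and 3..6, which is how four overlapping candidate positions can be compared from the same loaded words.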

/*
 * Fast version of motion estimation that uses the SAD instruction.
 */
void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            /* Four candidate positions (cy .. cy+3) are handled per pass */
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        /* Load one 16-pixel row of the new block as 4 words */
                        N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0];
                        N2 = N[1];
                        N3 = N[2];
                        N4 = N[3];
                        /* Load 5 words of the old image to cover all offsets */
                        O = (unsigned *) &(OldB[cx+x][cy]);
                        A = O[0];
                        B = O[1];
                        C = O[2];
                        D = O[3];
                        E = O[4];
                        diff0 += SAD(A, N1) + SAD(B, N2) +
                                 SAD(C, N3) + SAD(D, N4);
                        SSAI(8);
                        diff1 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(16);
                        diff2 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(24);
                        diff3 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}

This implementation uses the following SAD function to emulate the eventual new instruction:

/*
 * Sum of absolute differences of four bytes
 */
static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}

To debug this new implementation, the following test program is used to compare the motion vectors and values computed by the new implementation with those computed by the base implementation:

/*
 * Main test program
 */
int
main(int argc, char **argv)
{
    int passwd;

#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    init();
    motion_estimate_base();
    motion_estimate_tie();
    passwd = check();
#ifndef NOPRINTF
    printf(passwd ? "TIE version passed\n" : "TIE version failed\n");
#endif
    return passwd;
}

This simple test program is used throughout the development process. The convention to be followed here is that the main program returns 0 when an error is detected and 1 otherwise.

The use of TIE permits rapid specification of new instructions. The configurable processor generator can fully implement these instructions in both the hardware implementation and the software development tools. Hardware synthesis creates an optimal integration of the new function into the hardware datapath. The configurable processor software environment fully supports the new instructions in the C and C++ compilers, the assembler, the symbolic debugger, the profiler, and the cycle-accurate instruction set simulator. The rapid regeneration of hardware and software makes application-specific instructions a fast and reliable tool for application acceleration.

This example uses TIE to implement a simple instruction that performs pixel differencing, absolute value, and accumulation on four pixels in parallel. This single instruction performs 11 basic operations (which in a conventional processor might require separate instructions) as one atomic operation. The complete description follows:

// define a new opcode for Sum of Absolute Difference (SAD)
// from which instruction decoding logic is derived
opcode SAD op2=4'b0000 CUST0

// define a new instruction class
// from which compiler, assembler, disassembler
// routines are derived
iclass sad {SAD} {out arr, in ars, in art}

// semantic definition from which instruction-set
// simulation and RTL descriptions are derived
semantic sad_logic {SAD} {
    wire [8:0] diff01, diff11, diff21, diff31;
    wire [7:0] diff0r, diff1r, diff2r, diff3r;
    assign diff01 = art[7:0] - ars[7:0];
    assign diff11 = art[15:8] - ars[15:8];
    assign diff21 = art[23:16] - ars[23:16];
    assign diff31 = art[31:24] - ars[31:24];
    assign diff0r = ars[7:0] - art[7:0];
    assign diff1r = ars[15:8] - art[15:8];
    assign diff2r = ars[23:16] - art[23:16];
    assign diff3r = ars[31:24] - art[31:24];
    assign arr =
        (diff01[8] ? diff0r : diff01) +
        (diff11[8] ? diff1r : diff11) +
        (diff21[8] ? diff2r : diff21) +
        (diff31[8] ? diff3r : diff31);
}

This description represents the minimum steps needed to define a new instruction. First, it is necessary to define a new opcode for the instruction. In this case, the new opcode SAD is defined as a sub-opcode of CUST0. As noted above, CUST0 is predefined as:

opcode QRST op0=4'b0000

opcode CUST0 op1=4'b0100 QRST

It is easy to see that QRST is the top-level opcode. CUST0 is a sub-opcode of QRST, and SAD is in turn a sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the opcode space. One important thing to remember is that CUST0 (and CUST1) are defined as reserved opcode space in which users can add new instructions. It is preferable that users stay within this allocated opcode space to ensure future reusability of TIE descriptions.
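The opcode hierarchy translates directly into nested field comparisons in a decoder. The C sketch below illustrates this under an explicit assumption: the bit positions of the op0, op1, and op2 fields within a 24-bit instruction word (bits 3..0, 19..16, and 23..20) are not stated in this section and are assumed here for illustration, as are the helper function names. The field values are the ones given in the opcode definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions within a 24-bit instruction word. */
unsigned field_op0(uint32_t insn) { return insn & 0xFu; }
unsigned field_op1(uint32_t insn) { return (insn >> 16) & 0xFu; }
unsigned field_op2(uint32_t insn) { return (insn >> 20) & 0xFu; }

/* QRST: top-level opcode, op0 = 4'b0000. */
bool is_qrst(uint32_t insn)
{
    return field_op0(insn) == 0x0u;
}

/* CUST0: sub-opcode of QRST, op1 = 4'b0100. */
bool is_cust0(uint32_t insn)
{
    return is_qrst(insn) && field_op1(insn) == 0x4u;
}

/* SAD: sub-opcode of CUST0, op2 = 4'b0000. */
bool is_sad(uint32_t insn)
{
    return is_cust0(insn) && field_op2(insn) == 0x0u;
}
```

Each level of the hierarchy narrows the match by one field, so a new instruction added under CUST0 only needs a distinct op2 value and cannot collide with opcodes outside the reserved space.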

The second step in the TIE description is to define a new instruction class containing the new instruction SAD. This is where the operands of the SAD instruction are defined. In this case, SAD takes three register operands: destination register arr and source registers ars and art. As noted previously, arr is defined as the register indexed by the r field of the instruction, and ars and art are defined as the registers indexed by the s and t fields of the instruction.

The last block in the description gives the formal semantic definition of the SAD instruction. The description uses a subset of the Verilog HDL language for describing combinational logic. It is this block that defines precisely how the ISS will simulate the SAD instruction and how additional circuitry is synthesized and added to the configurable processor hardware to support the new instruction.
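The semantic block can be mirrored by a bit-accurate C function, in the spirit of the TIE-to-C translation mentioned earlier. The sketch below is hand-written for illustration and is not the output of the actual translator; the names `lane_abs_diff` and `sad_model` are hypothetical. It reproduces the 9-bit difference and the use of bit 8 as the selector between the two subtraction results.

```c
#include <stdint.h>

/* One byte lane of the semantic block: form the 9-bit difference t - s,
   and use its bit 8 (the sign) to choose between t - s and the 8-bit s - t. */
uint32_t lane_abs_diff(uint32_t s, uint32_t t)
{
    uint32_t diff_l = (t - s) & 0x1FFu;  /* corresponds to a wire [8:0] */
    uint32_t diff_r = (s - t) & 0xFFu;   /* corresponds to a wire [7:0] */
    return (diff_l & 0x100u) ? diff_r : diff_l;
}

/* Sum of the four lane results, as assigned to arr. */
uint32_t sad_model(uint32_t ars, uint32_t art)
{
    uint32_t sum = 0;
    for (int shift = 0; shift < 32; shift += 8)
        sum += lane_abs_diff((ars >> shift) & 0xFFu, (art >> shift) & 0xFFu);
    return sum;
}
```

For instance, `sad_model(0x01020304, 0x04030201)` evaluates to 8 (3 + 1 + 1 + 3). Comparing such a model against the plain C emulation of SAD over many inputs is one way to gain confidence that the hardware semantics and the software stand-in agree.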

Next, the TIE description is debugged and verified using the tools described above. After verifying the correctness of the TIE description, the next step is to estimate the impact of the new instruction on hardware size and performance. As noted above, this can be done using, e.g., Design Compiler™. When Design Compiler finishes, the user can examine its output for detailed area and speed reports.

After verifying that the TIE description is correct and efficient, it is time to configure and build a configurable processor that also supports the new SAD instruction. As described above, this is done using the GUI.

Next, the motion estimation code is compiled into code for the configurable processor, which uses the instruction set simulator to verify the correctness of the program and, more importantly, to measure its performance. This is done in three steps: run the test program using the simulator; run only the base implementation to obtain an instruction count; and run only the new implementation to obtain an instruction count.

The simulation output from the second step is as follows:

Block=(16,16), Search=(4,4), size=(32,32)
TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.98 seconds

Events                          Number    Number per 100 instrs
Instructions                    226005    (100.00)
Unconditional taken branches       454    (0.20)
Conditional branches             37149    (16.44)
  Taken                          26947    (11.92)
  Not taken                      10202    (4.51)
Window Overflows                    20    (0.01)
Window Underflows                   19    (0.01)

The simulation output from the last step is as follows:

Block=(16,16), Search=(4,4), size=(32,32)
TIE version passed
Simulation Completed Successfully
Time for Simulation = 0.36 seconds

Events                          Number    Number per 100 instrs
Instructions                     51743    (100.00)
Unconditional taken branches       706    (1.36)
Conditional branches              3541    (6.84)
  Taken                           2759    (5.33)
  Not taken                        782    (1.51)
Window Overflows                    20    (0.04)
Window Underflows                   19    (0.04)

From these two reports it can be seen that a speedup of about 4 times has been achieved. Note that the configurable processor instruction set simulator can provide much other useful information as well.
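The figure can be checked directly from the instruction counts in the two reports. The ratios below use only numbers printed in those reports; treating the instruction count as a proxy for execution time is an assumption, since cycle counts are not shown.

```latex
\[
\frac{226005\ \text{instructions}}{51743\ \text{instructions}} \approx 4.37,
\qquad
\frac{37149\ \text{conditional branches}}{3541\ \text{conditional branches}} \approx 10.5
\]
```

The reduction in conditional branches is larger than the overall reduction, which is consistent with the rewritten inner loop processing four pixels per operation and four candidate positions per pass.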

After verifying the correctness and performance of the program, the next step is to run the test program using a Verilog simulator as described above. Those skilled in the art can glean the details of this process from the makefile of Appendix C (the associated files are also shown in Appendix C). The purpose of this simulation is to further verify the correctness of the new implementation and, more importantly, to make this test program part of the regression test for this configured processor.

Finally, the processor logic can be synthesized using, e.g., Design Compiler™ and placed and routed using, e.g., Apollo™.

For clarity and simplicity of explanation, this example has taken a simplified view of video compression and motion estimation. In practice, there are many additional subtleties in the standard compression algorithms. For example, MPEG-2 typically performs motion estimation and compensation at sub-pixel resolution. Two adjacent rows or columns of pixels can be averaged to create a set of pixels interpolated to an imaginary position midway between the two rows or columns. Here the user-defined instructions of the configurable processor are again useful, since a parallel pixel averaging instruction is easily implemented in just three or four lines of TIE code. Averaging between pixels in a row again uses the efficient alignment operations of the processor's standard instruction set.

Thus, the inclusion of a simple sum-of-absolute-differences instruction adds just a few hundred gates, yet improves motion estimation performance by more than a factor of ten. This acceleration represents a significant improvement in the cost and power efficiency of the final system.

Moreover, the seamless extension of the software development tools to include the new motion estimation instruction allows rapid prototyping, performance analysis, and delivery of the complete software application solution. The solution of the present invention makes application-specific processor configuration simple, reliable, and complete, and offers dramatic improvements in the cost, performance, functionality, and power efficiency of the final system product.

As an example focusing on the addition of a functional hardware unit, consider the base configuration shown in Fig. 6, which includes the processor control function, program counter (PC), branch selection, instruction memory or cache and instruction decoder, and the basic integer datapath, including the main register file, bypass multiplexers, pipeline registers, ALU, address generator, and data memory for the cache.

The HDL is written with the multiplier logic conditional on the "multiplier" parameter being set, and a multiplier unit is added as a new pipeline stage as shown in Fig. 7 (changes to exception handling may be required if precise exceptions are to be supported). Of course, instructions that make use of the multiplier are preferably added along with the new unit.

As a second example, a full coprocessor may be added to the base configuration as shown in Fig. 8, for example a digital signal processor such as a multiply/accumulate unit. This entails changes to the processor control, such as adding decode control signals for multiply-accumulate operations, including decoding of the source and destination registers from extended instructions; adding appropriate pipeline delays for the control signals; extending the register destination logic; adding control for a register bypass multiplexer for moves from the accumulate registers; and including a multiply-accumulate unit as a possible source of an instruction result. It also requires the addition of a multiply-accumulate unit, which entails additional accumulator registers, a multiply-accumulate array, and source-select multiplexers for the main register sources. Adding the coprocessor also entails extending the register bypass multiplexer to take a source from the accumulate registers, and extending the load/alignment multiplexer to take a source from the multiplier result. Again, the system preferably adds instructions for using the new functional unit along with the actual hardware.

Another option that is particularly useful in conjunction with a digital signal processor is a floating-point unit. Such a functional unit, implementing for example the IEEE 754 single-precision floating-point standard, may be added along with the instructions for accessing it. The floating-point unit may be used, for example, in digital signal processing applications such as audio compression and decompression.

As yet another example of the system's flexibility, consider the 4 KB memory interface shown in Fig. 9. Using the configurability of the present invention, the coprocessor registers and datapaths may be wider or narrower than the main integer register file and datapath, and the local memory width may be varied so that the memory width equals the widest processor or coprocessor width (with memory addressing on reads and writes adjusted accordingly). For example, Fig. 10 shows a local memory system for a processor/coprocessor combination in which the processor supports 32-bit loads and stores and the coprocessor supports 128-bit loads and stores, both addressing the same array. This can be implemented using the following TPP code:

function memory(Select, A1, A2, DI1, DI2, W1, W2, DO1, DO2)

; $B1 = config_get_value("width_of_port_1");

; $B2 = config_get_value("width_of_port_2");

; $Bytes = config_get_value("size_of_memory");

; $Max = max($B1, $B2);

; $Min = min($B1, $B2);

; $Banks = $Max/$Min;

; $Wide1 = ($Max == $B1);

; $Wide2 = ($Max == $B2);

; $Depth = $Bytes/(log2($Banks)*log2($Max));

wire [`$Max`*8-1:0] Data1 = `$Wide1` ? DI1 : {`$Banks`{DI1}};

wire [`$Max`*8-1:0] Data2 = `$Wide2` ? DI2 : {`$Banks`{DI2}};

wire [`$Max`*8-1:0] D = Select ? Data1 : Data2;

wire Wide = Select ? Wide1 : Wide2;

wire [log2(`$Bytes`)-1:0] A = Select ? A1 : A2;

wire [log2(`$Bytes`)-1:0] Address = A[log2(`$Bytes`)-1:log2(`$Banks`)];

wire [log2(`$Banks`)-1:0] Lane = A[log2(`$Banks`)-1:0];

; for ($i=0; $i<$Banks; $i++) {

wire WrEnable`$i` = Wide | (Lane == `$i`);

wire [log2(`$Min`)-1:0] WrData`$i` = D[(`$i`+1)*`$Min`*8-1:`$i`*`$Min`*8];

ram(RdData`$i`, Depth, Address, WrData`$i`, WrEnable`$i`);

; }

wire [`$Max`*8-1:0] RdData = {

; for ($i=0; $i<$Banks; $i++) {

RdData`$i`,

; }

}

wire [`$B1`*8-1:0] DO1 = Wide1 ? RdData : RdData[(Lane+1)*B1*8-1:Lane*B1*8];

wire [`$B2`*8-1:0] DO2 = Wide2 ? RdData : RdData[(Lane+1)*B2*8-1:Lane*B2*8];

Here, $Bytes is the total memory size, which is accessed either with width B1 at byte address A1 on data bus D1 under control of write signal W1, or with the corresponding parameters B2, A2, D2 and W2. In a given cycle, only one set of signals, chosen by Select, is active. The TPP code implements the memory as a collection of memory banks. The width of each bank is given by the minimum access width, and the number of banks by the ratio of the maximum to the minimum access width. A for loop is used to instantiate each memory bank and its associated write signals, i.e., write enable and write data. A second for loop is used to gather the data read from all the banks into a single bus.
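The banking scheme described above can be illustrated with a small behavioral model. The following is only a sketch in Python, not the generated hardware: the class name BankedMemory, the use of byte addresses, and the row count (total size divided by the maximum access width) are assumptions made for illustration and do not reproduce the $Depth expression of the TPP code.

```python
class BankedMemory:
    """Toy byte-addressed model of a memory built from max/min banks."""

    def __init__(self, size_bytes, width1, width2):
        self.max_w = max(width1, width2)        # widest access, in bytes
        self.min_w = min(width1, width2)        # narrowest access, in bytes
        self.banks = self.max_w // self.min_w   # number of banks per row
        depth = size_bytes // self.max_w        # rows (assumed formula)
        self.rows = [[bytes(self.min_w)] * self.banks for _ in range(depth)]

    def _locate(self, addr):
        row = addr // self.max_w
        lane = (addr // self.min_w) % self.banks
        return row, lane

    def write(self, addr, data):
        row, lane = self._locate(addr)
        wide = len(data) == self.max_w
        if not wide and len(data) != self.min_w:
            raise ValueError("unsupported access width")
        for i in range(self.banks):
            if wide:
                # Wide access: every bank is enabled and takes its own slice.
                self.rows[row][i] = bytes(data[i * self.min_w:(i + 1) * self.min_w])
            elif lane == i:
                # Narrow access: only the addressed lane is write-enabled.
                self.rows[row][i] = bytes(data)

    def read(self, addr, width):
        row, lane = self._locate(addr)
        if width == self.max_w:
            return b"".join(self.rows[row])     # gather all banks into one bus
        if width != self.min_w:
            raise ValueError("unsupported access width")
        return self.rows[row][lane]
```

With a 4-byte and a 16-byte port, a 16-byte write fills all four banks of a row, and a later 4-byte write replaces only the bank selected by the lane bits of the address.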

Figure 11 shows an example of incorporating user-defined instructions into the base configuration. As shown, simple instructions can be added to the processor pipeline with timing and an interface similar to those of the ALU. Instructions added in this way must not generate stalls or exceptions, must contain no state, may use only the two normal source register values and the instruction word as inputs, and must produce a single output value. If, however, the TIE language has provisions for specifying processor state, such constraints are unnecessary.

Figure 12 shows another example of the implementation of a user-defined unit in this system. The functional unit shown in the figure, an 8/16-bit parallel data unit extension of the ALU, is generated from the following ISA code:

Instruction {

Opcode ADD8_4 CUSTOM op2=0000

Opcode MIN16_2 CUSTOM op2=0001

Opcode SHIFT16_2 CUSTOM op2=0002

Iclass MY 4ADD8, 2MIN16, SHIFT16_2

a<t, a<s, a>t

}

Implementation {

input [31:0] art, ars;

input [23:0] inst;

input ADD8_4, MIN16_2, SHIFT16_2;

output [31:0] arr;

wire [31:0] add, min, shift;

assign add = {art[31:24]+ars[31:24], art[23:16]+ars[23:16], art[15:8]+ars[15:8], art[7:0]+ars[7:0]};

assign min[31:16] = art[31:16] < ars[31:16] ? art[31:16] : ars[31:16];

assign min[15:0] = art[15:0] < ars[15:0] ? art[15:0] : ars[15:0];

assign shift[31:16] = art[31:16] << ars[31:16];

assign shift[15:0] = art[15:0] << ars[15:0];

assign arr = {32{ADD8_4}} & add | {32{MIN16_2}} & min | {32{SHIFT16_2}} & shift;

}
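The data-parallel behavior of the three operations can be checked with a short software model. The following Python sketch is illustrative only; the function names mirror the opcodes, and it assumes that every lane combines the corresponding lanes of art and ars, that comparisons are unsigned, and that each lane result is truncated to its lane width.

```python
def add8_4(art, ars):
    """Four independent 8-bit additions; carries do not cross lanes."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = ((art >> shift) + (ars >> shift)) & 0xFF
        result |= lane << shift
    return result


def min16_2(art, ars):
    """Two independent unsigned 16-bit minimum operations."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & 0xFFFF
        b = (ars >> shift) & 0xFFFF
        result |= min(a, b) << shift
    return result


def shift16_2(art, ars):
    """Each 16-bit lane of art shifted left by the matching lane of ars."""
    result = 0
    for shift in (0, 16):
        a = (art >> shift) & 0xFFFF
        n = (ars >> shift) & 0xFFFF
        result |= ((a << n) & 0xFFFF) << shift
    return result


def execute(opcode, art, ars):
    """Select one result, as the final AND-OR multiplexer does for arr."""
    ops = {"ADD8_4": add8_4, "MIN16_2": min16_2, "SHIFT16_2": shift16_2}
    return ops[opcode](art, ars)
```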

In another aspect of the present invention, of particular interest is the designer-defined instruction execution unit 96, in which TIE-defined instructions, including those that modify processor state, are decoded and executed. In this aspect of the invention, several building blocks are added to the language so that additional processor state, which can be read and written by new instructions, can be declared. The "state" statement is used to declare additional processor state. The declaration begins with the keyword state. The next part of the state statement describes the size and number of bits of the state and how the bits of the state are indexed. The part after that is the state name, which identifies the state in other declarations. The last part of the state statement is a list of attributes associated with the state. For example,

state [63:0] DATA cpn=0 autopack

state [27:0] KEYC cpn=1 nopack

state [27:0] KEYD cpn=1

define three new processor states, DATA, KEYC and KEYD. State DATA is 64 bits wide, and its bits are indexed from 63 to 0. KEYC and KEYD are both 28-bit states. DATA has a coprocessor-number attribute cpn, which indicates the coprocessor to which DATA belongs.

The attribute "autopack" indicates that state DATA will be automatically mapped to some registers in the user register file, so that the value of DATA can be read and written by software tools.

The user_register section is defined to indicate the mapping of states to registers in the user register file. A user_register section begins with the keyword user_register, followed by a number giving the register number, and ends with an expression indicating the state bits to be mapped onto the register. For example,

user_register 0 DATA[31:0]

user_register 1 DATA[63:32]

user_register 2 KEYC

user_register 3 KEYD

user_register 4 {X, Y, Z}

specifies that the low-order word of DATA is mapped to the first user register file entry and the high-order word to the second. The next two user register file entries hold the values of KEYC and KEYD. Clearly, the state information used in this section must be consistent with that used in the state section. This consistency can be checked automatically by a computer program.
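As an illustration of how such a consistency check might work, the following Python sketch validates user_register mappings against declared state widths. It is a hypothetical checker, not the tool used by the system: the names STATES, USER_REGISTERS and check_mapping are invented here, and concatenated expressions such as {X, Y, Z} are deliberately not handled.

```python
import re

# Declared states (name -> width in bits) and register mappings, taken
# from the DATA/KEYC/KEYD example in the text.
STATES = {"DATA": 64, "KEYC": 28, "KEYD": 28}
USER_REGISTERS = {0: "DATA[31:0]", 1: "DATA[63:32]", 2: "KEYC", 3: "KEYD"}


def check_mapping(states, user_registers, reg_bits=32):
    """Return a list of human-readable inconsistencies (empty if none)."""
    errors = []
    covered = {name: set() for name in states}
    for reg, expr in sorted(user_registers.items()):
        m = re.fullmatch(r"(\w+)(?:\[(\d+):(\d+)\])?", expr)
        if m is None or m.group(1) not in states:
            errors.append(f"user_register {reg}: unknown state in '{expr}'")
            continue
        name = m.group(1)
        # A bare state name means the whole state.
        hi = int(m.group(2)) if m.group(2) is not None else states[name] - 1
        lo = int(m.group(3)) if m.group(3) is not None else 0
        if not 0 <= lo <= hi < states[name]:
            errors.append(f"user_register {reg}: bits [{hi}:{lo}] out of range for {name}")
        elif hi - lo + 1 > reg_bits:
            errors.append(f"user_register {reg}: {expr} is wider than {reg_bits} bits")
        else:
            covered[name].update(range(lo, hi + 1))
    for name, width in states.items():
        if len(covered[name]) != width:
            errors.append(f"state {name}: not all bits are mapped")
    return errors
```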

In another embodiment of the present invention, a bin-packing algorithm is used to assign the state bits to user register file entries automatically. In yet another embodiment, a combination of manual and automatic assignment can be used, for example to ensure upward compatibility.
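The text does not say which bin-packing algorithm is used. As one plausible illustration, the Python sketch below applies the common first-fit-decreasing heuristic; the function name pack_states, the 32-bit register width, and the decision to split wide states into register-sized pieces are all assumptions.

```python
def pack_states(states, reg_bits=32):
    """Assign state bits to user registers with first-fit decreasing.

    states maps a state name to its width in bits. The result is a list
    of registers, each a list of (name, hi, lo) pieces.
    """
    # Split every state into pieces no wider than one register.
    pieces = []
    for name, width in states.items():
        lo = 0
        while lo < width:
            hi = min(lo + reg_bits, width) - 1
            pieces.append((name, hi, lo))
            lo = hi + 1
    # Largest pieces first; the sort is stable, so ties keep their order.
    pieces.sort(key=lambda p: p[1] - p[2] + 1, reverse=True)

    registers = []  # each entry: [free_bits, [pieces]]
    for piece in pieces:
        size = piece[1] - piece[2] + 1
        for reg in registers:
            if reg[0] >= size:          # first register with enough room
                reg[0] -= size
                reg[1].append(piece)
                break
        else:
            registers.append([reg_bits - size, [piece]])
    return [reg[1] for reg in registers]
```

For the states DATA (64 bits), KEYC and KEYD (28 bits each) plus two small hypothetical states of 1 and 3 bits, this packing needs four registers, because the small states fit into the unused bits beside KEYC.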

The instruction field statement field is used to improve the readability of TIE code. Fields are subsets or concatenations of other fields that are grouped together and referenced by a name. The complete set of bits in an instruction is the highest-level superset field inst, and this field can be divided into smaller fields. For example,

field x inst[11:8]

field y inst[15:12]

field xy {x, y}

defines two 4-bit fields, x and y, as sub-fields (bits 8-11 and 12-15, respectively) of the highest-level field inst, and an 8-bit field xy as the concatenation of the x and y fields.
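To make the bit positions concrete, the following Python sketch extracts the same fields from a 24-bit instruction word. The helper names bits and decode_fields are illustrative; the sketch assumes that in the concatenation {x, y} the first-named field x occupies the high-order bits, as in Verilog concatenation.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(inst):
    """Extract the fields x, y and xy from a 24-bit instruction word."""
    x = bits(inst, 11, 8)
    y = bits(inst, 15, 12)
    xy = (x << 4) | y   # {x, y}: x forms the high nibble
    return {"x": x, "y": y, "xy": xy}
```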

The opcode statement defines opcodes as encodings of specific fields. Instruction fields intended to specify operands, e.g., the registers or immediate constants to be used by the opcodes so defined, must first be defined with field statements and then defined with operand statements.

For example,

opcode acs op2=4'b0000 CUST0

opcode adse1 op2=4'b0001 CUST0

define two new opcodes, acs and adse1, based on the previously defined opcode CUST0 (4'b0000 denotes the 4-bit binary constant 0000). The TIE description of the preferred core ISA contains the statements

field op0 inst[3:0]

field op1 inst[19:16]

field op2 inst[23:20]

opcode QRST op0=4'b0000

opcode CUST0 op1=4'b0100 QRST

so that acs and adse1 are decoded from the following instruction encodings, respectively:

inst[23:0] = 0000 0110 xxxx xxxx xxxx 0000

inst[23:0] = 0001 0110 xxxx xxxx xxxx 0000

The instruction operand statement operand identifies registers and immediate constants. Before a field can be defined as an operand, however, it must first have been defined as a field as described above. If the operand is an immediate constant, the value of the constant can be generated from the operand, or it can be taken from a previously defined constant table, the definition of which is described below. For example, to encode an immediate operand, the TIE code

field offset inst[23:6]

operand offsets4 offset {

assign offsets4 = {{14{offset[17]}}, offset} << 2;

} {

wire [31:0] t;

assign t = offsets4 >> 2;

assign offset = t[17:0];

}

defines an 18-bit field named offset, which holds a signed number, and an operand offsets4, which is four times the number stored in the offset field. As those skilled in the art will appreciate, the last part of the operand statement actually describes the circuitry used to perform the computations, in a subset of the Verilog™ HDL for describing combinational circuits.

Here, the wire statement defines a set of logical wires named t that is 32 bits wide. The first assign statement after the wire statement specifies that the logic signal driving these wires is offsets4 shifted right by two bits, and the second assign statement specifies that the low-order 18 bits of t are placed in the offset field. The very first assign statement directly specifies the value of the offsets4 operand as the concatenation of 14 copies of the sign bit of offset (bit 17) with offset itself, followed by a left shift of two bits.
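The two halves of the operand statement are inverses of each other, which can be seen in a short software model. The Python sketch below is illustrative; the function names decode_offsets4 and encode_offset are not part of TIE, and values are represented as unsigned 32-bit integers.

```python
def decode_offsets4(offset):
    """Field -> operand: sign-extend the 18-bit field, then multiply by 4."""
    offset &= 0x3FFFF
    if offset & 0x20000:            # bit 17 is the sign bit
        offset |= 0x3FFF << 18      # 14 copies of the sign bit
    return (offset << 2) & 0xFFFFFFFF


def encode_offset(offsets4):
    """Operand -> field: shift right by 2 and keep the low 18 bits."""
    t = (offsets4 & 0xFFFFFFFF) >> 2
    return t & 0x3FFFF
```

For instance, the field value 0x3FFFF (the 18-bit encoding of -1) decodes to 0xFFFFFFFC, which is -4 as a 32-bit two's complement number.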

For a constant table operand, the TIE code

table prime 16 {

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53

}

operand prime_s s {

assign prime_s = prime[s];

} {

assign s = prime_s == prime[0] ? 4'b0000 :

prime_s == prime[1] ? 4'b0001 :

prime_s == prime[2] ? 4'b0010 :

prime_s == prime[3] ? 4'b0011 :

prime_s == prime[4] ? 4'b0100 :

prime_s == prime[5] ? 4'b0101 :

prime_s == prime[6] ? 4'b0110 :

prime_s == prime[7] ? 4'b0111 :

prime_s == prime[8] ? 4'b1000 :

prime_s == prime[9] ? 4'b1001 :

prime_s == prime[10] ? 4'b1010 :

prime_s == prime[11] ? 4'b1011 :

prime_s == prime[12] ? 4'b1100 :

prime_s == prime[13] ? 4'b1101 :

prime_s == prime[14] ? 4'b1110 :

4'b1111;

}

uses the table statement to define an array prime of constants (the number following the table name is the number of elements in the table), and uses the operand s as an index into the table prime to encode a value for the operand prime_s (note the use of Verilog™ statements in defining the indexing).
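The table operand can be modelled the same way. The Python sketch below assumes a 16-entry table holding the first sixteen primes; the names PRIME, decode_prime_s and encode_prime_s are illustrative. As in the chain of conditional expressions, any value that does not match entries 0 to 14 falls through to index 15.

```python
# Assumed table contents: the first sixteen prime numbers.
PRIME = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]


def decode_prime_s(s):
    """Field -> operand: look the 4-bit index up in the table."""
    return PRIME[s & 0xF]


def encode_prime_s(value):
    """Operand -> field: compare against entries 0..14, default to 15."""
    for index in range(15):
        if value == PRIME[index]:
            return index
    return 0b1111
```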

The instruction class statement iclass associates opcodes with operands in a common format. All instructions defined in an iclass statement have the same format and operand usage. Before an instruction class is defined, its components must be defined, first as fields and then as opcodes and operands. For example, building on the preceding example that defines the opcodes acs and adse1, the additional statements

operand art t {assign art = AR[t];} {}

operand ars s {assign ars = AR[s];} {}

operand arr r {assign AR[r] = arr;} {}

use the operand statement to define three register operands art, ars and arr (again, note the use of Verilog™ statements in the definition). Then, the iclass statement

iclass viterbi {adse1, acs} {out arr, in art, in ars}

specifies that the opcodes adse1 and acs belong to a common class of instructions, viterbi, which take two register operands art and ars as input and write their output to a register operand arr.

In the present invention, the instruction class statement iclass is modified to allow the state access information of instructions to be specified. It begins with the keyword "iclass", followed by the name of the instruction class, the list of opcodes that belong to the instruction class, and a list of operand access information, and ends with a newly defined list for state access information. For example,

iclass lddata {LDDATA} {out arr, in imm4} {in DATA}

iclass stdata {STDATA} {in ars, in art} {out DATA}

iclass stkey {STKEY} {in ars, in art} {out KEYC, out KEYD}

iclass des {DES} {out arr, in imm4} {inout KEYC, inout DATA, inout KEYD}

define several instruction classes and specify how the various new instructions access the states. The keywords "in", "out" and "inout" indicate that the state is read, written, or modified (read and written), respectively, by the instructions in the iclass. In this example, state "DATA" is read by instruction "LDDATA", states "KEYC" and "KEYD" are written by instruction "STKEY", and "KEYC", "KEYD" and "DATA" are modified by instruction "DES".

The instruction semantic statement semantic describes the behavior of one or more instructions using the same subset of Verilog™ that is used for coding operands. By defining multiple instructions in a single semantic statement, some common expressions can be shared and the hardware implementation can be made more efficient. The variables allowed in a semantic statement are the operands of the opcodes defined in the opcode list of the statement, and a single-bit variable for each opcode specified in the opcode list. This variable has the same name as the opcode and evaluates to 1 when the opcode is detected. It is used in the computation section (the Verilog™ subset section) to indicate the presence of the corresponding instruction.

//define a new opcode for BYTESWAP based on

// - a predefined instruction field op2

// - a predefined opcode CUST0

//refer to Xtensa ISA manual for descriptions of op2 and CUST0

opcode BYTESWAP op2=4'b0000 CUST0

//declare state SWAP and COUNT

state COUNT 32

state SWAP 1

//map COUNT and SWAP to user register file entries

user_register 0 COUNT

user_register 1 SWAP

//define a new instruction class that

// - reads data from ars (predefined to be AR[s])

// - uses and writes state COUNT

// - uses state SWAP

iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

//semantic definition of byteswap

// COUNT the number of byte-swapped words

// Return the swapped or un-swapped data depending on SWAP

semantic bs {BYTESWAP} {

wire [31:0] ars_swapped = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};

assign arr = SWAP ? ars_swapped : ars;

assign COUNT = COUNT + SWAP;

}

The first section of the above code defines an opcode for the new instruction, called BYTESWAP.

//define a new opcode for BYTESWAP based on

// - a predefined instruction field op2

// - a predefined opcode CUST0

//refer to Xtensa ISA manual for descriptions of op2 and CUST0

opcode BYTESWAP op2=4'b0000 CUST0

Here, the new opcode BYTESWAP is defined as a sub-opcode of CUST0. From the Xtensa™ Instruction Set Architecture Reference Manual described in detail below, it can be seen that CUST0 is defined as

opcode QRST op0=4'b0000

opcode CUST0 op1=4'b0100 QRST

where op0 and op1 are fields in the instruction. Opcodes are typically organized hierarchically. Here, QRST is the top-level opcode, CUST0 is a sub-opcode of QRST, and BYTESWAP is in turn a sub-opcode of CUST0. This hierarchical organization of opcodes allows logical grouping and management of the opcode space.
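The hierarchy means that an instruction matches an opcode only if it also matches every ancestor of that opcode. The Python sketch below illustrates this with the three opcodes named above; the tables FIELDS and OPCODES and the function names are illustrative, and the field positions follow the field statements op0 = inst[3:0], op1 = inst[19:16] and op2 = inst[23:20].

```python
# Field name -> (high bit, low bit) within the 24-bit instruction word.
FIELDS = {"op0": (3, 0), "op1": (19, 16), "op2": (23, 20)}

# Opcode name -> (field, required value, parent opcode or None).
OPCODES = {
    "QRST": ("op0", 0b0000, None),
    "CUST0": ("op1", 0b0100, "QRST"),
    "BYTESWAP": ("op2", 0b0000, "CUST0"),
}


def field_value(inst, field):
    hi, lo = FIELDS[field]
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def matches(inst, opcode):
    """True if inst satisfies the opcode and every opcode above it."""
    while opcode is not None:
        field, value, parent = OPCODES[opcode]
        if field_value(inst, field) != value:
            return False
        opcode = parent
    return True
```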

The second section of the description declares the additional processor state needed by the BYTESWAP instruction:

//declare state SWAP and COUNT

state COUNT 32

state SWAP 1

Here, COUNT is declared as a 32-bit state and SWAP as a 1-bit state. The TIE language specifies that the bits of COUNT are indexed from 31 to 0, with bit 0 being the least significant.

The Xtensa™ ISA provides two instructions, RSR and WSR, for saving and restoring special system registers. Similarly, it provides two other instructions, RUR and WUR (described in more detail below), for saving and restoring the states declared in TIE. In order to save and restore the states declared in TIE, the mapping of the states to entries in the user register file that the RUR and WUR instructions can access must be specified. The following section of the above code specifies this mapping:

//map COUNT and SWAP to user register file entries

user_register 0 COUNT

user_register 1 SWAP

such that the following instructions save the value of COUNT to a2 and the value of SWAP to a5:

RUR a2, 0;

RUR a5, 1;

This mechanism is actually used in test programs to verify the contents of the states. In C, the above two instructions take the following form:

x = RUR(0);

y = RUR(1);

The next section of the TIE description is the definition of a new instruction class containing the new instruction BYTESWAP:

//define a new instruction class that

// - reads data from ars(predefined to be AR[s])

// - uses and writes state COUNT

// - uses state SWAP

iclass bs {BYTESWAP} {out arr, in ars} {inout COUNT, in SWAP}

where iclass is the keyword and bs is the name of the iclass. The next clause lists the instructions in this instruction class (BYTESWAP). The clause after that specifies the operands used by the instructions in this class (in this case an input operand ars and an output operand arr). The last clause of the iclass definition specifies the states accessed by the instructions in this class (in this case, the instruction reads state SWAP and reads and writes state COUNT).

The last block of the above code gives the formal semantic definition of the BYTESWAP instruction:

//semantic definition of byteswap

// COUNT the number of byte-swapped words

// Return the swapped or un-swapped data depending on SWAP

semantic bs {BYTESWAP} {

wire [31:0] ars_swapped = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};

assign arr = SWAP ? ars_swapped : ars;

assign COUNT = COUNT + SWAP;

}

The description uses a subset of Verilog HDL for describing combinational logic. It is this block that specifies precisely how the instruction set simulator will simulate the BYTESWAP instruction, and how the additional circuitry is synthesized and added to the Xtensa™ processor hardware to support the new instruction.
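A simulator derived from this semantic block would behave roughly like the following Python sketch. It is a hand-written illustration rather than generated simulator code; the class name ByteswapModel is an assumption, and COUNT is assumed to wrap around at 32 bits.

```python
class ByteswapModel:
    """Behavioral model of BYTESWAP with its two states, SWAP and COUNT."""

    def __init__(self, swap=0):
        self.SWAP = swap & 1    # 1-bit state
        self.COUNT = 0          # 32-bit state

    def byteswap(self, ars):
        """Execute one BYTESWAP instruction and return the value of arr."""
        ars &= 0xFFFFFFFF
        # {ars[7:0], ars[15:8], ars[23:16], ars[31:24]}
        ars_swapped = int.from_bytes(ars.to_bytes(4, "big"), "little")
        arr = ars_swapped if self.SWAP else ars
        # COUNT = COUNT + SWAP, truncated to the 32-bit state width.
        self.COUNT = (self.COUNT + self.SWAP) & 0xFFFFFFFF
        return arr
```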

In the present invention implementing user-defined states, the declared states can be used like any other variables to access the information stored in the states. A state identifier appearing on the right-hand side of an expression indicates a read from the state. Writing to a state is done by assigning a value or an expression to the state identifier. For example, the following semantic code segment shows how states are read and written by an instruction:

assign KEYC = sr == 8'd2 ? art[27:0] : KEYC;

assign KEYD = sr == 8'd3 ? art[27:0] : KEYD;

assign DATA = sr == 8'd0 ? {DATA[63:32], art} : {art, DATA[31:0]};

To illustrate examples of the instructions that can be implemented as core instructions in a configurable processor, as well as of the instructions that become available through the selection of configuration options, the Xtensa™ Instruction Set Architecture (ISA) Reference Manual, Revision 1.0, published by Tensilica, Inc., is incorporated herein by reference. Further, to illustrate examples of the TIE language statements that can be used to implement such user-defined instructions, the Instruction Extension Language (TIE) Reference Manual, Revision 1.3, also published by Tensilica, Inc., is likewise incorporated herein by reference.

From the TIE description, a hardware implementation of the instructions can be generated using, for example, a program similar to the one shown in Appendix D. Appendix E shows the code for the header file needed to support the new instructions as intrinsic functions.

Using the configuration specification, the following can be generated automatically:

the instruction decode logic of processor 60;

the illegal-instruction detection logic for processor 60;

the ISA-specific portion of the assembler;

the ISA-specific support routines for the compiler;

the ISA-specific portion of the disassembler (used by the debugger); and

the ISA-specific portion of the simulator.

Figure 16 is a diagram showing how the ISA-specific portions of these software tools are generated. From a user-created TIE description file 400, a TIE parser program 410 generates C code for several programs, each of which produces a file that is accessed by one or more of the software development tools to obtain information about the user-defined instructions and state. For example, the program tie2gcc 420 generates a C header file 470 called xtensa_tie.h, which contains intrinsic function definitions for the new instructions. The program tie2isa 430 generates a dynamic link library (DLL) 480 that contains information about the user-defined instruction formats (in the Wilson et al. patent application discussed below, this is effectively a combination of the encode and decode DLLs discussed therein). The program tie2iss 440 generates performance-modeling routines and produces a DLL 490 containing the instruction semantics which, as discussed in the Wilson et al. application, is used by a host compiler to produce a simulator DLL used by the simulator. The program tie2ver 450 produces the necessary descriptions 500 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 460 produces save and restore code 510 for use by the RUR and WUR instructions.

The precise description of instructions and of how they access states makes it possible to produce efficient logic that can be plugged into an existing high-performance microprocessor design. The methods described in connection with this embodiment of the invention specifically address new instructions that read from or write to one or more state registers. In particular, this embodiment shows how to derive the hardware logic for state registers in the context of a class of microprocessor implementations that all use pipelining as a technique for achieving high performance.

In a pipelined implementation such as the one shown in Figure 17, a state register is typically duplicated several times, with each instance representing the value of the state at a particular pipeline stage. In this embodiment, a state is translated into multiple copies of registers consistent with the underlying core processor implementation. Additional bypass and forwarding logic is also generated, again in a manner consistent with the underlying core processor implementation. For example, to target a core processor implementation with three execution stages, this embodiment translates a state into three registers connected as shown in Figure 18. In this implementation, each register 610-630 represents the value of the state at one of the three pipeline stages. ctrl-1, ctrl-2 and ctrl-3 are control signals used to enable data latching in the corresponding flip-flops 610-630.

Making the multiple copies of a state register work consistently with the underlying processor implementation requires additional logic and control signals. "Consistently" means that the state should behave exactly like the rest of the processor state under conditions such as interrupts, exceptions and pipeline stalls. Typically, a given processor implementation defines certain signals that represent the various pipeline conditions. Such signals are required to make the pipelined state registers work properly.

In a typical pipelined implementation, the execution unit consists of multiple pipeline stages. The computation of an instruction is carried out in multiple stages of this pipeline. Instruction streams flow through the pipeline in sequence as directed by the control logic. At any given time, up to n instructions can be executing in the pipeline, where n is the number of stages. In a superscalar processor, which can also be implemented using the present invention, the number of instructions in the pipeline can be n × w, where w is the issue width of the processor.

The role of the control logic is to ensure that the dependencies between instructions are obeyed and that any interference between instructions is resolved. If an instruction uses data computed by an earlier instruction, special hardware is needed to forward the data to the later instruction without stalling the pipeline. If an interrupt occurs, all instructions in the pipeline need to be killed and later re-executed. When an instruction cannot be executed because its required input data or computational hardware is not available, the instruction must be stalled. A cost-effective way of stalling an instruction is to kill it in its first execution stage and re-execute it in the next cycle. A consequence of this technique is that it creates an invalid stage (a bubble) in the pipeline. The bubble flows through the pipeline along with the other instructions. At the end of the pipeline, where instructions are committed, the bubbles are discarded.

Using the above three-stage pipeline example, the additional logic and connections needed for a typical implementation of such a processor state are shown in Figure 19.

Under normal conditions, a value computed in one stage is forwarded to the next instruction immediately, without waiting for the value to reach the end of the pipeline, in order to reduce the number of pipeline stalls introduced by data dependencies. This is accomplished by sending the output of the first flip-flop 610 directly to the semantic block so that it can be used immediately by the next instruction. To handle abnormal conditions such as interrupts and exceptions, this implementation requires the following three control signals: Kill_1, Kill_all and Valid_3.

The signal Kill_1 indicates that the instruction currently in the first pipeline stage 110 must be killed because the data it requires is not available. The signal Kill_all indicates that all instructions in the pipeline must be killed because an instruction ahead of them has generated an exception or an interrupt has occurred. The signal Valid_3 indicates whether the instruction currently in the last stage 630 is valid. An invalid instruction in that stage is typically the result of killing an instruction in the first pipeline stage 610, which creates a bubble (an invalid instruction) in the pipeline. "Valid_3" simply indicates whether the instruction in the third pipeline stage is valid or a bubble. Clearly, only valid instructions should be latched.

Figure 20 shows the additional logic and connections needed to implement the state register. It also shows how to construct the control logic that drives the signals "ctrl-1", "ctrl-2" and "ctrl-3" so that the state register implementation meets the above requirements. The following is sample HDL code automatically generated to implement the state register shown in Figure 19.

module tie_enflop(tie_out, tie_in, en, clk);

parameter size = 32;

output [size-1:0] tie_out;

input [size-1:0] tie_in;

input en;

input clk;

reg [size-1:0] tmp;

assign tie_out = tmp;

always @(posedge clk) begin

if (en)

tmp <= #1 tie_in;

end

endmodule

module tie_athens_state(ns, we, ke, kp, vw, clk, ps);

parameter size = 32;

input [size-1:0] ns; // next state

input we; // write enable

input ke; // Kill E state

input kp; // Kill Pipeline

input vw; // Valid W state

input clk; // clock

output [size-1:0] ps; // present state

wire [size-1:0] se; // state at E stage

wire [size-1:0] sm; // state at M stage

wire [size-1:0] sw; // state at W stage

wire [size-1:0] sx; // state at X stage

wire ee; // write enable for EM register

wire ew; // write enable for WX register

assign se = kp ? sx : ns;

assign ee = kp | we & ~ke;

assign ew = vw & ~kp;

assign ps = sm;

tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));

tie_enflop #(size) state_MW(.tie_out(sw), .tie_in(sm), .en(1'b1), .clk(clk));

tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));

endmodule
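The cycle-by-cycle behavior of the tie_athens_state module can be explored with a small software model. The Python sketch below mirrors the assign statements and the three enabled flip-flops; the class name PipelinedState and the clock method are illustrative, and all three registers are assumed to start at zero.

```python
class PipelinedState:
    """Cycle-level model of the three-register pipelined state."""

    def __init__(self):
        self.sm = self.sw = self.sx = 0   # M, W and X stage copies

    @property
    def ps(self):
        """Present state as seen by the semantic blocks (assign ps = sm)."""
        return self.sm

    def clock(self, ns=0, we=0, ke=0, kp=0, vw=0):
        """Advance one clock edge with the given control inputs."""
        se = self.sx if kp else ns          # restore committed value on kill
        ee = kp or (we and not ke)          # enable for the EM register
        ew = vw and not kp                  # enable for the WX register
        next_sm = se if ee else self.sm
        next_sw = self.sm                   # MW register is always enabled
        next_sx = self.sw if ew else self.sx
        # All flip-flops update together on the clock edge.
        self.sm, self.sw, self.sx = next_sm, next_sw, next_sx
```

In this model, a write followed by a pipeline kill (kp = 1) before the value is committed returns the present state to the last committed value held in the X-stage register.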

If a semantic block specifies the state as one of its inputs, the present-state value of the state is passed to the semantic block as an input variable, using the above pipelined state register model. If the semantic block contains logic that generates a new value for a state, an output signal is created. This output signal is used as the next-state input to the pipelined state register.

This embodiment allows multiple semantic description blocks, each of which describes the behavior of multiple instructions. Under this unrestricted style of description, it is possible that only a subset of the semantic blocks produces a next-state output for a given state. Moreover, a given semantic block may produce the next-state output conditionally, depending on which instruction it is executing at a given time. Consequently, additional hardware logic is needed to combine the next-state outputs from all semantic blocks to form the input to the pipelined state register. In this embodiment of the invention, a signal is derived automatically for each semantic block indicating whether the block has produced a new value for the state. In another embodiment, such a signal can be left for the designer to specify.

Figure 20 shows how to combine the next-state outputs of a state from several semantic blocks s1-sn and appropriately select one of them as the input to the state register. In this figure, op1_1 and op1_2 are opcode signals for the first semantic block, op2_1 and op2_2 are opcode signals for the second semantic block, and so on. The next-state output of semantic block i is si (there are multiple next-state outputs for the block if there are multiple state registers). The signal indicating that semantic block i has produced a new value for the state is si_we. The signal s_we indicates whether any of the semantic blocks produces a new value for the state, and is used as the write-enable input to the pipelined state register.
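One simple way to realize this selection is an AND-OR multiplexer, sketched below in Python. This is an assumption about the structure in the figure rather than a transcription of it: the function name combine_next_state is invented, and the sketch assumes that at most one semantic block asserts its write enable in any given cycle.

```python
def combine_next_state(outputs):
    """Combine per-block next-state outputs into one register input.

    outputs is a list of (si, si_we) pairs, one per semantic block.
    Returns (s, s_we): the selected next-state value and the overall
    write enable for the pipelined state register.
    """
    s = 0
    s_we = 0
    for si, si_we in outputs:
        if si_we:
            s |= si          # gate each output with its own enable, then OR
            s_we = 1         # s_we is the OR of all si_we signals
    return s, s_we
```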

Even though the expressive power of multiple semantic blocks is no greater than that of a single one, they generally provide a more structured description by grouping related instructions into a single block. Because the instructions are implemented within a more restricted scope, multiple semantic blocks can also lead to simpler analysis of instruction effects. On the other hand, there are often reasons for a single semantic block to describe the behavior of multiple instructions. Most commonly, this is because the hardware implementations of these instructions share common logic. Describing multiple instructions in a single semantic block usually leads to a more efficient hardware design.

Because of interrupts and exceptions, software must be able to save the values of the states to data memory and to restore them from it. Based on the formal description of the new states and new instructions, such restore and load instructions can be generated automatically. In one embodiment of the invention, the logic for the restore and load instructions is automatically generated as two semantic blocks, which can then be recursively translated into actual hardware just like any other block. For example, from the following declaration of states:

State [63: 0] DATA cpn=0 autopack

State [27: 0] KEYC cpn=1 nopack

State [27: 0] KEYD cpn=1

User_register 0=DATA [31: 0]；

User_register 1=DATA [63: 32]；

User_register 2=KEYC；

User_register 3=KEYD；

the following semantic block can be generated to read the values of "DATA", "KEYC", and "KEYD" into general-purpose registers:

iclass rur {RUR} {out arr, in st} {in DATA, in KEYC, in KEYD}
semantic rur (RUR) {
  wire sel_0 = (st == 8'd0);
  wire sel_1 = (st == 8'd1);
  wire sel_2 = (st == 8'd2);
  wire sel_3 = (st == 8'd3);
  assign arr = {32{sel_0}} & DATA[31:0]
             | {32{sel_1}} & DATA[63:32]
             | {32{sel_2}} & KEYC
             | {32{sel_3}} & KEYD;
}

Figure 21 shows a block diagram of the logic corresponding to this kind of semantic block. The input signal "st" is compared with various constants to form the select signals, which are used to select bits from the state registers in a manner consistent with the user_register declarations. With the state declarations above, bit 32 of DATA maps to bit 0 of the second user register. Therefore, the second input of the MUX in the figure should be connected to bit 32 of the DATA state.

The following semantic block can be generated to write values from general-purpose registers into the states "DATA", "KEYC", and "KEYD":

iclass wur {WUR} {in art, in sr} {out DATA, out KEYC, out KEYD}
semantic wur (WUR) {
  wire sel_0 = (st == 8'd0);
  wire sel_1 = (st == 8'd1);
  wire sel_2 = (st == 8'd2);
  wire sel_3 = (st == 8'd3);
  assign DATA = {sel_1 ? art : DATA[63:32], sel_0 ? art : DATA[31:0]};
  assign KEYC = art;
  assign KEYD = art;
  assign DATA_we = WUR;
  assign KEYC_we = WUR & sel_2;
  assign KEYD_we = WUR & sel_3;
}

Figure 22 shows the logic for bit j of state S when it is mapped to bit k of user register i. In a WUR instruction, if the user_register number "st" is "i", bit k of "ars" is loaded into the S[j] register; otherwise, the original value of S[j] is recirculated. In addition, if any bit of state S is reloaded, the signal S_we is asserted.

The TIE user_register declaration specifies a mapping from the additional processor state defined by state declarations to the identifiers used by the RUR and WUR instructions, so that this state can be read and written independently of the TIE instructions.

Annex F shows the code that generates the RUR and WUR instructions.

The RUR and WUR instructions are used mainly for task switching. In a multitasking environment, several software tasks share a processor and run according to some scheduling algorithm. While a task is active, its state resides in the processor's registers. When the scheduling algorithm decides to switch to another task, the state held in the processor registers is saved to memory, and the state of the other task is loaded from memory into the processor registers. The Xtensa™ instruction set architecture (ISA) includes the RSR and WSR instructions for reading and writing the state defined by the ISA. For example, the following code is part of the task "save to memory" sequence:

// save special registers
rsr a0, SAR
rsr a1, LCOUNT
s32i a0, a3, UEXCSAVE + 0
s32i a1, a3, UEXCSAVE + 4
rsr a0, LBEG
rsr a1, LEND
s32i a0, a3, UEXCSAVE + 8
s32i a1, a3, UEXCSAVE + 12
; if (config_get_value("IsaUseMAC16")) {
rsr a0, ACCLO
rsr a1, ACCHI
s32i a0, a3, UEXCSAVE + 16
s32i a1, a3, UEXCSAVE + 20
rsr a0, MR_0
rsr a1, MR_1
s32i a0, a3, UEXCSAVE + 24
s32i a1, a3, UEXCSAVE + 28
rsr a0, MR_2
rsr a1, MR_3
s32i a0, a3, UEXCSAVE + 32
s32i a1, a3, UEXCSAVE + 36
; }

The following code is part of the task "restore from memory" sequence:

// restore special registers
l32i a2, a1, UEXCSAVE + 0
l32i a3, a1, UEXCSAVE + 4
wsr a2, SAR
wsr a3, LCOUNT
l32i a2, a1, UEXCSAVE + 8
l32i a3, a1, UEXCSAVE + 12
wsr a2, LBEG
wsr a3, LEND
; if (config_get_value("IsaUseMAC16")) {
l32i a2, a1, UEXCSAVE + 16
l32i a3, a1, UEXCSAVE + 20
wsr a2, ACCLO
wsr a3, ACCHI
l32i a2, a1, UEXCSAVE + 24
l32i a3, a1, UEXCSAVE + 28
wsr a2, MR_0
wsr a3, MR_1
l32i a2, a1, UEXCSAVE + 32
l32i a3, a1, UEXCSAVE + 36
wsr a2, MR_2
wsr a3, MR_3
; }

Here, SAR, LCOUNT, LBEG, and LEND are processor state registers that are part of the core Xtensa™ ISA, and ACCLO, ACCHI, MR_0, MR_1, MR_2, and MR_3 are part of the MAC16 Xtensa™ ISA option. (The registers are saved and restored in pairs to avoid pipeline interlocks.)

When a designer defines new state with TIE, that state must also be task-switched like the state above. One possibility is for the designer simply to edit the task-switch code (part of which is given above) and add RUR/S32I and L32I/WUR instructions similar to the code above. However, configurable processors are most effective when the software is generated automatically and is correct by construction. The present invention therefore includes a facility for automatically augmenting the task-switch code. The following tpp lines are added to the save task above:

; my $off = 0;
; my $i;
; for ($i = 0; $i < $#user_registers; $i += 2) {
rur a2, `$user_registers[$i+0]`
rur a3, `$user_registers[$i+1]`
s32i a2, UEXCUREG + `$off + 0`
s32i a3, UEXCUREG + `$off + 4`
; $off += 8;
; }
; if (@user_registers & 1) {
; # odd number of user registers
rur a2, `$user_registers[$#user_registers]`
s32i a2, UEXCUREG + `$off + 0`
; $off += 4;
; }

The following lines are added to the restore task above:

; my $off = 0;
; my $i;
; for ($i = 0; $i < $#user_registers; $i += 2) {
l32i a2, UEXCUREG + `$off + 0`
l32i a3, UEXCUREG + `$off + 4`
wur a2, `$user_registers[$i+0]`
wur a3, `$user_registers[$i+1]`
; $off += 8;
; }
; if (@user_registers & 1) {
; # odd number of user registers
l32i a2, UEXCUREG + `$off + 0`
wur a2, `$user_registers[$#user_registers]`
; $off += 4;
; }
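The pairwise expansion performed by the tpp loops can be illustrated with a small C sketch that emits the save sequence as text. The function `emit_user_save` is a hypothetical stand-in for the tpp preprocessor, and the emitted lines follow the template as printed above (which shows no base address register on the s32i lines). Note that in the tpp code `$#user_registers` is the index of the last element, so the loop handles complete pairs and the trailing test handles an odd register left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Emit the user-register save sequence for n user register numbers into
 * buf. Returns the number of bytes of save area used, or -1 if buf is
 * too small. Registers are handled in pairs, as in the tpp loop. */
static int emit_user_save(const int *ureg, size_t n, char *buf, size_t cap)
{
    int off = 0;
    size_t len = 0;

    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; i + 1 < n; i += 2) {
        int w = snprintf(buf + len, cap - len,
                         "rur a2, %d\nrur a3, %d\n"
                         "s32i a2, UEXCUREG + %d\n"
                         "s32i a3, UEXCUREG + %d\n",
                         ureg[i], ureg[i + 1], off, off + 4);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
        off += 8;
    }
    if (n & 1) { /* odd number of user registers */
        int w = snprintf(buf + len, cap - len,
                         "rur a2, %d\ns32i a2, UEXCUREG + %d\n",
                         ureg[n - 1], off);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        off += 4;
    }
    return off;
}
```

The returned byte count equals four times the number of user registers, which matches the size that the save area needs for them.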

Finally, the task state area in memory must have additional space allocated for storing the user registers, and the offset of this space from the base of the task save pointer is defined as the assembler constant UEXCUREG. This save area was previously defined by the following code:

#define UEXCREGSIZE (16*4)
#define UEXCPARMSIZE (4*4)
; if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE (10*4)
; } else {
#define UEXCSAVESIZE (4*4)
; }
#define UEXCMISCSIZE (2*4)
#define UEXCPARM 0
#define UEXCREG (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE (UEXCREG+UEXCREGSIZE)
#define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
#define UEXCWIN (UEXCMISC+0)
#define UEXCFRAME \
  (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE)

This is changed to:

#define UEXCREGSIZE (16*4)
#define UEXCPARMSIZE (4*4)
; if (&config_get_value("IsaUseMAC16")) {
#define UEXCSAVESIZE (10*4)
; } else {
#define UEXCSAVESIZE (4*4)
; }
#define UEXCMISCSIZE (2*4)
#define UEXCUREGSIZE `@user_registers * 4`
#define UEXCPARM 0
#define UEXCREG (UEXCPARM+UEXCPARMSIZE)
#define UEXCSAVE (UEXCREG+UEXCREGSIZE)
#define UEXCMISC (UEXCSAVE+UEXCSAVESIZE)
#define UEXCUREG (UEXCMISC+UEXCMISCSIZE)
#define UEXCWIN (UEXCUREG+0)
#define UEXCFRAME \
  (UEXCREGSIZE+UEXCPARMSIZE+UEXCSAVESIZE+UEXCMISCSIZE+UEXCUREGSIZE)

This code depends on the existence of a tpp variable @user_registers, which holds a list of the user register numbers. It is simply a list built from the first argument of each user_register statement.
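As a worked example of the layout arithmetic, the offsets produced by the extended definitions can be computed with the following C sketch. The struct and function names are hypothetical; the sizes are taken directly from the #define lines of the extended layout.

```c
#include <assert.h>

/* Offsets (in bytes) of the regions in the task save area. */
struct uexc_layout {
    int parm, reg, save, misc, ureg, frame;
};

/* Compute the extended layout for a configuration with or without the
 * MAC16 option and with n_user_registers TIE user registers. */
static struct uexc_layout compute_uexc_layout(int use_mac16,
                                              int n_user_registers)
{
    assert(n_user_registers >= 0);
    const int regsize  = 16 * 4;
    const int parmsize = 4 * 4;
    const int savesize = use_mac16 ? 10 * 4 : 4 * 4;
    const int miscsize = 2 * 4;
    const int uregsize = n_user_registers * 4;
    struct uexc_layout l;

    l.parm  = 0;
    l.reg   = l.parm + parmsize;
    l.save  = l.reg + regsize;
    l.misc  = l.save + savesize;
    l.ureg  = l.misc + miscsize;   /* user registers follow the misc area */
    l.frame = regsize + parmsize + savesize + miscsize + uregsize;
    return l;
}
```

For a MAC16 configuration with four user registers, this gives UEXCUREG = 128 and UEXCFRAME = 144 bytes; each additional user register grows the frame by 4 bytes.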

In some more complex microprocessor implementations, a state may be computed in different pipeline stages. Handling this requires some (albeit simple) extensions to the process described here. First, the description language must be extended so that a semantic block can be associated with a pipeline stage. This can be done in one of several ways. In one embodiment, the associated pipeline stage can be specified explicitly with each semantic block. In another embodiment, a range of pipeline stages can be specified for each semantic block. In yet another embodiment, the pipeline stage for a given semantic block can be derived automatically from the required computation delay.

The second task in supporting state generation in different pipeline stages is the handling of interrupts, exceptions, and stalls. This usually involves adding appropriate bypass and forwarding logic under the control of the pipeline control signals. In one embodiment, a diagram can be produced to indicate the relationship between when a state is generated and when it is used. Based on application analysis, appropriate forwarding logic can be implemented to handle the common cases, and interlock logic can be generated to stall the pipeline in the cases not handled by the forwarding logic.

The method for modifying the instruction issue logic of the base processor depends on the algorithm that the base processor uses. In general, however, the instruction issue logic of most processors, whether single-issue or superscalar, and whether for single-cycle or multi-cycle instructions, depends only on the instruction being tested for issue, which is used to produce:

1. signals that indicate, for each processor state element, whether the instruction uses that state as a source;

2. signals that indicate, for each processor state element, whether the instruction uses that state as a destination;

3. signals that indicate, for each functional unit, whether the instruction uses that functional unit.

These signals are used to perform the issue checks against the pipeline and the cross-issue checks, and to update the pipeline status in the pipeline-dependent issue logic. TIE contains all the information needed to augment these signals and their equations for each new instruction.

First, each TIE state declaration causes a new set of signals to be created for the instruction issue logic. An in or inout operand or state listed in the third or fourth argument of an iclass declaration adds the instruction decode signals for the instructions listed in the second argument to the first set of equations for the specified processor state element.

Second, an out or inout operand or state listed in the third or fourth argument of an iclass declaration adds the instruction decode signals for the instructions listed in the second argument to the second set of equations for the specified processor state element.

Third, the logic generated from each TIE semantic block represents a new functional unit, so a new set of unit signals is created, and the decode signals of the TIE instructions specified for that semantic block are ORed together to form the third set of equations.

When an instruction issues, the pipeline status must be updated for future issue decisions. Again, the method for modifying the instruction issue logic of the base processor depends on the algorithm that the base processor uses. However, some general observations are possible. The pipeline status must feed the following back to the issue logic:

4. signals that indicate, for each issued instruction destination, when the result is available for bypass;

5. signals that indicate, for each functional unit, that the unit is ready for another instruction.

The embodiment described here is a single-issue processor in which designer-defined instructions are limited to a single cycle of logic computation. In this case, the above simplifies considerably. No functional unit checks or cross-issue checks are needed, and no single-cycle instruction can leave a processor state element not pipe-ready for the next instruction. The issue equation therefore becomes just

issue = (~src1use | src1pipeready) & (~src2use | src2pipeready)
        & (~srcNuse | srcNpipeready);

where the src[i]pipeready signals are not affected by the additional instructions, and src[i]use is the first set of equations, described and modified as explained above. In this embodiment, the fourth and fifth sets of signals are not needed. For an alternative embodiment that is multi-issue and multi-cycle, the TIE description would be augmented with a latency specification for each instruction, giving the number of cycles over which the computation is pipelined.
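The simplified issue equation can be restated as a short C sketch. The function name `can_issue` and the array-based representation of the src[i]use and src[i]pipeready signals are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* issue = AND over all i of (~src[i]use | src[i]pipeready).
 * An instruction may issue unless it uses some source state element
 * that is not yet pipe-ready. */
static int can_issue(const int *src_use, const int *src_pipeready, size_t n)
{
    assert(src_use != NULL && src_pipeready != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src_use[i] && !src_pipeready[i])
            return 0;
    }
    return 1;
}
```

A source that the instruction does not use never blocks issue, regardless of whether it is pipe-ready.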

The fourth set of signals would be generated in each semantic block pipeline stage by ORing together the instruction decode signals of each instruction that, according to the specification, completes in that stage.

By default, the generated logic is fully pipelined, so each TIE-generated functional unit is always ready one cycle after accepting an instruction. In this case, the fifth set of signals for each TIE semantic block is always asserted. When the logic in a semantic block must be reused over multiple cycles, a further specification would state for how many cycles the instructions use the functional unit. In that case, the fifth set of signals would be generated in each semantic block pipeline stage by ORing together the instruction decode signals of each instruction that completes in that stage with the specified cycle count.

Alternatively, in a different embodiment, the designer may be allowed to specify the result-ready and functional-unit-ready signals as an extension to TIE.

Examples of code processed according to this embodiment are shown in the annexes. For brevity, they are not explained in detail; however, they will be readily understood by those skilled in the art after consulting the reference manuals mentioned above. Annex G is an example of implementing an instruction using the TIE language; Annex H shows what the TIE compiler generates for the compiler from such code. Similarly, Annex I shows what the TIE compiler generates for the simulator; Annex J shows what the TIE compiler generates for macro-expanding the TIE instructions in a user application; Annex K shows what the TIE compiler generates to simulate the TIE instructions in native mode; Annex L shows what the TIE compiler generates as a Verilog HDL description of the additional hardware; and Annex M shows what the TIE compiler generates as a Design Compiler script for optimizing the Verilog HDL description, in order to estimate the impact of the TIE instructions on CPU size and performance in terms of area and speed.

As indicated above, to begin the processor configuration process, the user starts by selecting a base processor configuration via the GUI described above. As part of this process, a software development kit 30 is built and delivered to the user, as shown in Fig. 1. The software development kit 30 contains four key components relevant to another aspect of the present invention, shown in Fig. 6: a compiler 108, an assembler 110, an instruction set simulator 112, and a debugger 130.

As is well known to those skilled in the art, a compiler converts user applications written in a high-level programming language such as C or C++ into processor-specific assembly language. High-level programming languages such as C and C++ are designed to let application writers describe their applications in a form that is easy for them to express precisely. These are not languages that processors understand. Application writers need not concern themselves with all the specific characteristics of the processor that will be used. The same C or C++ program can typically be used, with little or no modification, on many different types of processors.

The compiler translates the C or C++ program into assembly language. Assembly language is much closer to machine language, which is the language directly supported by the processor. Different types of processors have their own assembly languages. Each assembly instruction often directly represents one machine instruction, but the two are not necessarily identical. Assembly instructions are designed to be human-readable strings. Each instruction and operand is given a meaningful name or mnemonic, so that people can read the assembly instructions and easily understand which operations the machine will perform. The assembler converts assembly language into machine language. The assembler efficiently encodes each assembly instruction string into one or more machine instructions that can be directly and efficiently executed by the processor.

Machine code can be run directly on a processor, but physical processors are not always immediately available. Building a physical processor is a time-consuming and expensive process. When selecting among possible processor configurations, a user cannot build a physical processor for each candidate. Instead, the user is provided with a software program called a simulator. The simulator, a program running on a general-purpose computer, can simulate the effects of running the user application on the user-configured processor. The simulator can mimic the semantics of the simulated processor and can tell the user how fast the real processor will be when running the user's application.

A debugger is a tool that lets users interactively find problems in their software. The debugger allows users to run their programs interactively. The user can stop the program's execution at any time and look at its C source code, the resulting assembly code, or the machine code. The user can also examine or modify the values of any or all variables or hardware registers at a breakpoint. The user can then continue execution, perhaps one statement at a time, perhaps one machine instruction at a time, or perhaps to a new breakpoint selected by the user.

All four components 108, 110, 112, and 130 need to be aware of the user-defined instructions 750 (see Fig. 3), and the simulator 112 and debugger 130 must additionally be aware of the user-defined state 752. The system allows the user to access the user-defined instructions 750 via intrinsic calls added to the user's C and C++ applications. The compiler 108 must translate the intrinsic calls into the assembly language instructions 738 for the user-defined instructions 750. The assembler 110 must take the new assembly language instructions 738, whether written directly by the user or translated by the compiler 108, and encode them into the machine instructions 740 corresponding to the user-defined instructions 750. The simulator 112 must decode the user-defined machine instructions 740. It must model the semantics of the instructions, and it must model the performance of the instructions on the configured processor. The simulator 112 must also model the values and the performance implications of the user-defined state. The debugger 130 must allow the user to display the assembly language instructions 738, including the user-defined instructions 750. It must also allow the user to examine and modify the values of the user-defined state.

In this aspect of the invention, the user invokes a tool, the TIE compiler 702, to process the current candidate user-defined enhancements 736. The TIE compiler 702 is different from the compiler 708, which translates the user application into assembly language 738. The TIE compiler 702 builds components that enable the already-built base software system 30 (compiler 708, assembler 710, simulator 712, and debugger 730) to use the new user-defined enhancements 736. Each element of the software system 30 uses a somewhat different set of components.

Figure 24 is a diagram showing how the TIE-specific portions of these software tools are generated. From the user-defined extension file 736, the TIE compiler 702 generates C code for several programs, each of which produces a file that one or more of the software development tools access for information about the user-defined instructions and state. For example, the program tie2gcc 800 generates a C header file 842 called xtensa-tie.h (described in more detail below), which contains intrinsic function definitions for the new instructions. The program tie2isa 810 generates a dynamic link library (DLL) 844/848 that contains information about the user-defined instruction formats (a combination of the encode DLL 844 and the decode DLL 848, described in more detail below). The program tie2iss 840 generates C code 870 for performance modeling and instruction semantics which, as discussed below, is used by a host compiler 846 to produce the simulator DLL 849 used by the simulator, as described in more detail below. The program tie2ver 850 produces the necessary descriptions 850 of the user-defined instructions in an appropriate hardware description language. Finally, the program tie2xtos 860 produces save and restore code 810 for saving and restoring the user-defined state on a context switch. Additional information on the implementation of user-defined state can be found in the above-mentioned application of Wang et al.

Compiler 708

In this embodiment, the compiler 708 translates the intrinsic calls in user applications into the assembly language instructions 738 for the user-defined enhancements 736. The compiler 708 implements this mechanism on top of the macro and inline assembly mechanisms found in standard compilers such as the GNU compilers. For more information on these mechanisms, see, for example, the "GNU C and C++ Compiler User's Guide", EGCS version 1.0.3.

Consider a user who wishes to create a new instruction foo that operates on two registers and returns a result in a third register. The user puts the instruction description in a user-defined instruction file 750 in a particular directory and invokes the TIE compiler 702. The TIE compiler 702 creates a file 742 with a standard name such as xtensa-tie.h. This file contains the following definition of foo.

#define foo(ars, art) \
  ({ int arr; asm volatile ("foo %0, %1, %2" : "=a" (arr) : \
     "a" (ars), "a" (art)); arr; })

When the user invokes the compiler 708 on her application, she tells the compiler 708, via a command-line option or an environment variable, the name of the directory containing the user-defined enhancements 736. This directory also contains the xtensa-tie.h file 742. The compiler 708 automatically includes the file xtensa-tie.h in the C or C++ application being compiled, just as if the user had written the definition of foo herself. The user includes intrinsic calls to the instruction foo in her application. Because of the included definition, the compiler 708 treats those intrinsic calls as calls to the included definition. Based on the standard macro mechanism provided by the compiler 708, the compiler 708 treats a call to the macro foo as if the user had directly written the assembly language statement 738 rather than the macro call. That is, based on the standard inline assembly mechanism, the compiler 708 translates the call into the single assembly instruction foo. For example, the user might have a function containing a call to the intrinsic foo.

int fred(int a, int b)
{
    return foo(a, b);
}

The compiler translates the function into the following assembly language subroutine, which uses the user-defined instruction foo.

fred:
    .frame sp, 32
    entry sp, 32
#APP
    foo a2, a2, a3
#NO_APP
    retw.n

When the user creates a new set of user-defined enhancements 736, no new compiler needs to be written. The TIE compiler 702 simply creates the file xtensa-tie.h 742, which the prebuilt compiler automatically includes in the user's application.

Assembler 710

In this embodiment, the assembler 710 uses an encode library 744 to encode the assembly instructions 750. This library 744 includes functions to:

translate an opcode mnemonic string into an internal opcode representation;

provide the bit patterns to be generated for each opcode for the opcode fields in a machine instruction 740; and

encode the operand value of each instruction operand and insert the encoded operand bit patterns into the operand fields of a machine instruction 740.

For example, consider our earlier example of a user function that calls the intrinsic foo. The assembler might take the instruction "foo a2, a2, a3" and convert it into the machine instruction represented by the hexadecimal number 0x62230, where the high-order 6 and the low-order 0 together represent the opcode of foo, and 2, 2, and 3 represent the three registers a2, a2, and a3, respectively.
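The arithmetic behind 0x62230 can be checked with a short C sketch. The function name `encode_foo` is hypothetical. The text states that the opcode 6 occupies bits 16 to 23 and that the result register occupies bits 12 to 15; the positions assumed here for the two source registers (bits 8 to 11 and bits 4 to 7) are inferred from the hexadecimal value and are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Build the machine word for "foo ar, as, at" from its fields. */
static uint32_t encode_foo(unsigned r, unsigned s, unsigned t)
{
    assert(r < 16 && s < 16 && t < 16);
    uint32_t insn = (uint32_t)0x6 << 16; /* opcode 6 in bits 16..23; bits 0..3 stay 0 */

    insn |= (uint32_t)r << 12;           /* result register, bits 12..15 */
    insn |= (uint32_t)s << 8;            /* first source (assumed bits 8..11) */
    insn |= (uint32_t)t << 4;            /* second source (assumed bits 4..7) */
    return insn;
}
```

With r = 2, s = 2, and t = 3, the fields combine as 0x60000 | 0x2000 | 0x200 | 0x30 = 0x62230.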

The internal implementation of these functions is based on a combination of tables and internal functions. Tables are easily generated by the TIE compiler 702, but their expressiveness is limited. When more flexibility is needed, for example when expressing the operand encode functions, the TIE compiler 702 can generate arbitrary C code to be included in the library 744.

Consider again the example of "foo a2, a2, a3". Each register field is simply encoded with the number of the register. The TIE compiler 702 creates the following function, which checks for legal register values and, if the value is legal, returns the register number.

xtensa_encode_result encode_r(valp)
    u_int32_t *valp;
{
    u_int32_t val = *valp;
    if ((val >> 4) != 0)
        return xtensa_encode_result_too_high;
    *valp = val;
    return xtensa_encode_result_ok;
}

If all encodings were this simple, no encode functions would be needed; a table would suffice. However, the user may choose more complicated encodings. The following encoding, written in the TIE language, encodes each operand as the operand value divided by 1024. Such an encoding is useful for densely encoding values that are required to be multiples of 1024.

operand tx10 t {t << 10} {tx10 >> 10}

The TIE compiler converts the operand encoding description into the following C function.

xtensa_encode_result encode_tx10(valp)
    u_int32_t *valp;
{
    u_int32_t t, tx10;
    tx10 = *valp;
    t = (tx10 >> 10) & 0xf;
    tx10 = decode_tx10(t);
    if (tx10 != *valp) {
        return xtensa_encode_result_not_ok;
    } else {
        *valp = t;
    }
    return xtensa_encode_result_ok;
}

Because the range of possible operand values is very large, such an encoding cannot be done with a table; the table would have to be very large.

In one embodiment of the encode library 744, one table maps opcode mnemonic strings to the internal opcode representation. For efficiency, this table may be sorted, or it may be a hash table or some other data structure that allows efficient searching. Another table maps each opcode to a template of a machine instruction, with the opcode fields initialized to the appropriate bit patterns for that opcode. Opcodes with the same operand fields and operand encodings are grouped together. For each operand in one of these groups, the library contains a function that encodes the operand value into a bit pattern and another function that inserts that bit pattern into the appropriate field of a machine instruction. A separate internal table maps each instruction operand to these functions. Consider an example in which the result register number is encoded into bits 12..15 of the instruction. The TIE compiler 702 generates the following function, which sets bits 12..15 of the instruction to the value (number) of the result register:

void set_r_field(insn, val)
    xtensa_insnbuf insn;
    u_int32_t val;
{
    insn[0] = (insn[0] & 0xffff0fff) | ((val << 12) & 0xf000);
}
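The sorted mnemonic table mentioned above could look roughly like the following C sketch, which uses the standard library's binary search. The table contents, the opcode numbers, and the names `opcode_entry` and `lookup_opcode` are hypothetical and are not taken from the generated library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One row of the mnemonic table (illustrative layout). */
struct opcode_entry {
    const char *mnemonic;
    int opcode; /* internal opcode representation (made-up values) */
};

/* Must stay sorted by mnemonic in strcmp order for bsearch to work. */
static const struct opcode_entry opcode_table[] = {
    { "foo1", 1 },
    { "foo2", 2 },
    { "foo3", 3 },
};

static int cmp_entry(const void *key, const void *elem)
{
    const struct opcode_entry *e = (const struct opcode_entry *)elem;
    return strcmp((const char *)key, e->mnemonic);
}

/* Return the internal opcode for a mnemonic, or -1 if it is unknown. */
static int lookup_opcode(const char *mnemonic)
{
    assert(mnemonic != NULL);
    const struct opcode_entry *e =
        bsearch(mnemonic, opcode_table,
                sizeof opcode_table / sizeof opcode_table[0],
                sizeof opcode_table[0], cmp_entry);
    return e ? e->opcode : -1;
}
```

A hash table would serve the same purpose; a sorted array is used here only because it is easy for a generator to emit as static data.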

To allow the user-defined instructions to be changed without rebuilding the assembler 710, the encode library 744 is implemented as a dynamic link library (DLL). DLLs are a standard way of allowing a program to extend its functionality dynamically. The details of handling DLLs vary across host operating systems, but the basic concept is the same. The DLL is dynamically loaded into a running program as an extension of the program's code. A run-time linker resolves the symbolic references between the DLL and the main program, and between the DLL and other DLLs already loaded. In the case of the encode library or DLL 744, a small portion of the code is statically linked into the assembler 710. This code is responsible for loading the DLL, combining the information in the DLL with the existing encode information for the prebuilt instruction set 746 (which may have been loaded from a separate DLL), and making that information accessible via the interface functions described above.

When the user creates new enhancements 736, she invokes the TIE compiler 702 on a description of the enhancements 736. The TIE compiler 702 generates C code defining the internal tables and functions that implement the encode DLL 744. The TIE compiler 702 then invokes the host system's native compiler 746 (which compiles code to run on the host rather than on the processor being configured) to create the encode DLL 744 for the user-defined instructions 750. The user invokes the prebuilt assembler 710 on her application with a flag or environment variable that points to the directory containing the user-defined enhancements 736. The prebuilt assembler 710 dynamically opens the DLL 744 in that directory. For each assembly instruction, the prebuilt assembler 710 uses the encode DLL 744 to look up the opcode mnemonic, find the bit patterns for the opcode fields in the machine instruction, and encode each of the instruction operands.

For example, when the assembler 710 sees the TIE instruction "foo a2, a2, a3", it determines from a table that the "foo" opcode translates into the number 6 in bit positions 16 to 23. From a table, it finds the encode function for each of the registers. The functions encode a2 into the number 2, the other a2 into the number 2, and a3 into the number 3. From a table, it finds the appropriate set functions. The function set_r_field puts the result value 2 into bit positions 12..15 of the instruction. Similar set functions put the other 2 and the 3 in their appropriate places.

Simulated program 712

Simulated program 712 interacts with user-defined every improvement 736 in several ways.For machine instruction 740 For, instruction must be decoded by simulated program 712, say, that is operation code and operand unit by Command Resolution.With Every improvement 736 of family definition is decoded by a function of decoding DLL748 that (encoding D LL744 and decoding DLL748 are real Same DLL it is probably) on border.For example it is assumed that user defines three operation codes: foo1, foo2 and foo3, at the ratio of instruction The coding of special 16 to 23 is respectively 0 × 6,0 × 16 and 0 × 26, and is 0 at bit 0 to 3,.Under TIE compiler 702 generates The decoding functions of row, operation code compares by it with all user-defined instructions 750:

int decode_insn(const xtensa_insnbuf insn)
{
    if ((insn[0] & 0xff000f) == 0x60000) return xtensa_foo1_op;
    if ((insn[0] & 0xff000f) == 0x160000) return xtensa_foo2_op;
    if ((insn[0] & 0xff000f) == 0x260000) return xtensa_foo3_op;
    return XTENSA_UNDEFINED;
}

When there are many user-defined instructions, comparing the opcode against every possible user-defined instruction 750 can be time-consuming, so the TIE compiler can instead use a hierarchical set of switch statements:

switch (get_op0_field(insn)) {
case 0x0:
    switch (get_op1_field(insn)) {
    case 0x6:
        switch (get_op2_field(insn)) {
        case 0x0: return xtensa_foo1_op;
        case 0x1: return xtensa_foo2_op;
        case 0x2: return xtensa_foo3_op;
        default: return XTENSA_UNDEFINED;
        }
    default: return XTENSA_UNDEFINED;
    }
default: return XTENSA_UNDEFINED;
}
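To make the equivalence of the two decoders concrete, the following self-contained sketch implements both forms for foo1, foo2 and foo3. The field boundaries are an inference, not something the text states: op0 is assumed to be bits 0..3, op1 bits 16..19 and op2 bits 20..23, which is consistent with the encodings 0x6, 0x16 and 0x26 in bits 16 to 23. The uint32_t instruction word and the enum names are likewise placeholders for the generated types.

```c
#include <assert.h>
#include <stdint.h>

enum { XTENSA_UNDEFINED = -1, XTENSA_FOO1_OP, XTENSA_FOO2_OP, XTENSA_FOO3_OP };

/* Assumed field layout: op0 = bits 0..3, op1 = bits 16..19, op2 = bits 20..23. */
unsigned get_op0_field(uint32_t insn) { return insn & 0xfu; }
unsigned get_op1_field(uint32_t insn) { return (insn >> 16) & 0xfu; }
unsigned get_op2_field(uint32_t insn) { return (insn >> 20) & 0xfu; }

/* Flat form: one masked comparison per user-defined opcode. */
int decode_flat(uint32_t insn)
{
    if ((insn & 0xff000fu) == 0x060000u) return XTENSA_FOO1_OP;
    if ((insn & 0xff000fu) == 0x160000u) return XTENSA_FOO2_OP;
    if ((insn & 0xff000fu) == 0x260000u) return XTENSA_FOO3_OP;
    return XTENSA_UNDEFINED;
}

/* Hierarchical form: each switch narrows the search by one field. */
int decode_switch(uint32_t insn)
{
    switch (get_op0_field(insn)) {
    case 0x0:
        switch (get_op1_field(insn)) {
        case 0x6:
            switch (get_op2_field(insn)) {
            case 0x0: return XTENSA_FOO1_OP;
            case 0x1: return XTENSA_FOO2_OP;
            case 0x2: return XTENSA_FOO3_OP;
            default:  return XTENSA_UNDEFINED;
            }
        default: return XTENSA_UNDEFINED;
        }
    default: return XTENSA_UNDEFINED;
    }
}
```

The flat form costs one comparison per opcode, whereas the nested form costs one dispatch per field level regardless of how many opcodes are defined.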

In addition to decoding instruction opcodes, the decode DLL 748 also includes functions for decoding instruction operands. This is done in the same manner as the encoding of operands in the encode DLL 744. First, the decode DLL 748 provides functions that extract the operand fields from a machine instruction. Continuing the example above, the TIE compiler 702 generates the following function to extract a value from bits 12 to 15 of an instruction:

u_int32_t get_r_field(insn)
    xtensa_insnbuf insn;
{
    return ((insn[0] & 0xf000) >> 12);
}

The TIE description of an operand includes specifications of both its encoding and its decoding, so whereas the encode DLL 744 uses the operand encode specification, the decode DLL 748 uses the operand decode specification. For example, the TIE operand specification:

operand tx10 t {t << 10} {tx10 >> 10}

generates the following operand decode function:

u_int32_t decode_tx10(val)
    u_int32_t val;
{
    u_int32_t t, tx10;
    t = val;
    tx10 = t << 10;
    return tx10;
}
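The two expressions in the operand specification are inverses of each other on representable values. A minimal sketch follows, assuming uint32_t values; encode_tx10 and tx10_is_encodable are hypothetical helpers added for illustration, and only decode_tx10 mirrors the generated function above.

```c
#include <assert.h>
#include <stdint.h>

/* Decode: field value t -> operand value tx10 = t << 10. */
uint32_t decode_tx10(uint32_t t)
{
    return t << 10;
}

/* Encode: operand value tx10 -> field value t = tx10 >> 10. */
uint32_t encode_tx10(uint32_t tx10)
{
    return tx10 >> 10;
}

/* An operand value can be represented only if encoding and then
 * decoding reproduces it, i.e. its low 10 bits are zero. */
int tx10_is_encodable(uint32_t tx10)
{
    return decode_tx10(encode_tx10(tx10)) == tx10;
}
```

An assembler could use a round-trip check of this kind to reject operand values that the field cannot represent; whether the generated encode DLL does so is not stated in the text.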

When the user invokes the simulator 712, she tells the simulator 712 the directory containing the decode DLL 748 for the user-defined enhancements 736. The simulator 712 opens the appropriate DLL. Whenever the simulator 712 decodes an instruction, if the instruction is not successfully decoded by the decode function for the pre-built instruction set, the simulator 712 invokes the decode function in the DLL 748.

Given a decoded instruction 750, the simulator 712 must interpret and simulate the semantics of the instruction 750. This is done functionally. Every instruction 750 has a corresponding function that allows the simulator 712 to simulate the semantics of that instruction 750. The simulator 712 internally keeps track of all state of the simulated processor, and it has a fixed interface for updating or querying that state. As noted above, the user-defined enhancements 736 are written in the TIE hardware description language, which is a subset of Verilog. The TIE compiler 702 converts the hardware description into C functions that the simulator 712 uses to simulate the new enhancements 736. Hardware description language operators are converted directly into the corresponding C operators. Operations that read or write state are converted into calls to the simulator interface for updating or querying processor state.

As an example in this embodiment, assume that a user creates an instruction to add two registers. This example is chosen only for simplicity. The user might describe the semantics of the add in the hardware description language as follows:

semantic add {add} {assign arr = ars + art;}

The output register, denoted by the built-in name arr, is assigned the sum of the two input registers, denoted by the built-in names ars and art. The TIE compiler 702 takes this description and generates the semantic function used by the simulator 712:

void add_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)
{
    set_ar(_OPND0_, ar(_OPND1_) + ar(_OPND2_));
    pc_incr(3);
}

The hardware operator "+" is converted directly into the corresponding C operator "+". The reads of the hardware registers ars and art are converted into calls to the simulator 712 function "ar". The write of the hardware register arr is converted into a call to the simulator 712 function "set_ar". Because every instruction implicitly increments the program counter pc by the size of the instruction, the TIE compiler 702 also generates a call to a simulator 712 function that increments the simulated pc by 3, the size of the add instruction.
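The semantic function above can be exercised against a toy state interface. The sketch below is an assumption-laden stand-in: the register-file size, the storage arrays and get_pc are hypothetical, and the operand names drop the leading underscores of the generated code; only the names ar, set_ar, pc_incr and add_func come from the text.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define NUM_AR 64                 /* assumed register-file size */

static u32 ar_file[NUM_AR];       /* simulated address registers */
static u32 pc;                    /* simulated program counter */

/* Fixed state interface of the toy simulator. */
u32  ar(u32 n)            { assert(n < NUM_AR); return ar_file[n]; }
void set_ar(u32 n, u32 v) { assert(n < NUM_AR); ar_file[n] = v; }
void pc_incr(u32 nbytes)  { pc += nbytes; }
u32  get_pc(void)         { return pc; }

/* Semantic function for "add": arr = ars + art, then advance the pc
 * by the 3-byte instruction size. */
void add_func(u32 opnd0, u32 opnd1, u32 opnd2, u32 opnd3)
{
    (void)opnd3;                  /* unused by this instruction */
    set_ar(opnd0, ar(opnd1) + ar(opnd2));
    pc_incr(3);
}
```

Because the semantic function touches state only through ar, set_ar and pc_incr, the same generated code works with any simulator that provides those entry points.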

When the TIE compiler 702 is invoked, it creates a semantic function as described above for each user-defined instruction, and it also creates a table that maps every opcode name to its associated semantic function. The table and functions are compiled into the simulator DLL 749 using the standard compiler 746. When the user invokes the simulator 712, she tells the simulator 712 the directory containing the user-defined enhancements 736. The simulator 712 opens the appropriate DLL. Whenever the simulator 712 is invoked, it decodes all instructions in the program and creates a table mapping each instruction to its associated semantic function. When creating the mapping, the simulator 712 opens the DLL and looks up the appropriate semantic functions. When simulating the semantics of a user-defined instruction 736, the simulator 712 directly invokes the function in the DLL.
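The opcode-name table can be pictured as an array of name and function-pointer pairs. The following sketch is hypothetical throughout: the struct layout, the lookup function and the counting stub functions are illustrative, and a real DLL would export generated semantic functions instead of stubs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*semantic_fn)(unsigned, unsigned, unsigned, unsigned);

struct semantic_entry {
    const char *opcode_name;      /* mnemonic, e.g. "add" */
    semantic_fn fn;               /* function simulating that opcode */
};

static int add_calls, foo_calls;  /* counters standing in for real semantics */

void add_func(unsigned a, unsigned b, unsigned c, unsigned d)
{
    (void)a; (void)b; (void)c; (void)d;
    add_calls++;
}

void foo_func(unsigned a, unsigned b, unsigned c, unsigned d)
{
    (void)a; (void)b; (void)c; (void)d;
    foo_calls++;
}

/* Table of the kind the TIE compiler would emit next to the functions. */
static const struct semantic_entry semantic_table[] = {
    { "add", add_func },
    { "foo", foo_func },
};

/* Return the semantic function for an opcode name, or NULL if unknown. */
semantic_fn lookup_semantic(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof semantic_table / sizeof semantic_table[0]; i++)
        if (strcmp(semantic_table[i].opcode_name, name) == 0)
            return semantic_table[i].fn;
    return NULL;
}
```

Resolving each name once at load time and caching the returned pointer per decoded instruction matches the text's description of invoking the DLL function directly during simulation.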

To tell the user how long an application will take to run on the simulated hardware, the simulator 712 needs to simulate the performance effects of an instruction 750. The simulator 712 uses a pipeline model for this purpose. Every instruction executes over several cycles. In each cycle, an instruction uses different resources of the machine. The simulator 712 begins by trying to execute all instructions in parallel. If multiple instructions use the same resource in the same cycle, the later instruction is stalled until the resource is free. If a later instruction reads state that an earlier instruction writes in a later cycle, the later instruction is stalled until the value has been written. The simulator 712 uses a functional interface to model the performance of each instruction. A function is created for every type of instruction. These functions contain calls to a simulator interface that models the performance of the processor.

For example, assume a simple three-register instruction foo. The TIE compiler might create the following simulator function:

void foo_sched(u32 op0, u32 op1, u32 op2, u32 op3)
{
    pipe_use_ifetch(3);
    pipe_use(REGF32_AR, op1, 1);
    pipe_use(REGF32_AR, op2, 1);
    pipe_def(REGF32_AR, op0, 2);
    pipe_def_ifetch(-1);
}

The call to pipe_use_ifetch tells the simulator 712 that the instruction requires 3 bytes to be fetched. The two calls to pipe_use tell the simulator 712 that the two input registers are read in cycle 1. The call to pipe_def tells the simulator 712 that the output register is written in cycle 2. The call to pipe_def_ifetch tells the simulator 712 that this instruction is not a branch, so the next instruction can be fetched in the next cycle.
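A toy scoreboard shows how use and def calls of this kind can produce stalls. This is a sketch, not the pipeline model of the text: it assumes in-order issue of one instruction per cycle, tracks only register dependences (no fetch or other resources), and assumes that a value written in stage w becomes readable one cycle later. The internals of pipe_use and pipe_def, and the integer return value of foo_sched, are hypothetical.

```c
#include <assert.h>

enum { REGF32_AR };               /* the only register file in this toy */
#define NUM_AR 64                 /* assumed register-file size */

static int ready_cycle[NUM_AR];   /* first cycle in which each register is readable */
static int issue_cycle;           /* cycle in which the current instruction issues */

/* The register is read in the given stage: stall the instruction
 * until the value is readable at that point. */
void pipe_use(int regfile, unsigned reg, int stage)
{
    assert(regfile == REGF32_AR && reg < NUM_AR);
    if (ready_cycle[reg] > issue_cycle + stage)
        issue_cycle = ready_cycle[reg] - stage;
}

/* The register is written in the given stage: readable one cycle later. */
void pipe_def(int regfile, unsigned reg, int stage)
{
    assert(regfile == REGF32_AR && reg < NUM_AR);
    ready_cycle[reg] = issue_cycle + stage + 1;
}

/* Performance function for the three-register foo; returns the cycle
 * in which this instance finally issued. */
int foo_sched(unsigned op0, unsigned op1, unsigned op2)
{
    int issued;
    pipe_use(REGF32_AR, op1, 1);  /* inputs read in cycle 1 */
    pipe_use(REGF32_AR, op2, 1);
    pipe_def(REGF32_AR, op0, 2);  /* output written in cycle 2 */
    issued = issue_cycle;
    issue_cycle = issued + 1;     /* next instruction may issue a cycle later */
    return issued;
}
```

Under these assumptions, an instruction that consumes the result of the immediately preceding foo issues one cycle late, while an independent instruction issues without delay.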

Pointers to these functions are placed in the same table as the semantic functions. The functions themselves are compiled into the DLL 749 in the same way as the semantic functions. When the simulator 712 is invoked, it creates a mapping from instructions to performance functions. When creating the mapping, the simulator 712 opens the DLL 749 and looks up the appropriate performance functions. When simulating the performance of a user-defined instruction 736, the simulator 712 directly invokes the function in the DLL 749.

Debugger 730

The debugger interacts with the user-defined enhancements 750 in two ways. First, the user can display the assembly instructions 738 for user-defined instructions 736. To do this, the debugger 730 must decode machine instructions 740 into assembly instructions 738. This is the same mechanism the simulator 712 uses to decode instructions, and the debugger 730 preferably uses the same DLL that the simulator 712 uses for decoding. In addition to decoding the instruction, the debugger must convert the decoded instruction into a string. For this purpose, the decode DLL 748 includes a function that maps each internal opcode representation to its corresponding mnemonic string. This can be implemented with a simple table.
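The simple table mentioned above might look like the following sketch. The enum values and function names are hypothetical, and the reverse lookup is included only to show that the same table can also serve mnemonic lookup during assembly.

```c
#include <assert.h>
#include <string.h>

enum {
    XTENSA_UNDEFINED = -1,
    XTENSA_FOO1_OP,
    XTENSA_FOO2_OP,
    XTENSA_FOO3_OP,
    NUM_USER_OPS
};

/* Indexed by internal opcode number. */
static const char *const mnemonic_table[NUM_USER_OPS] = { "foo1", "foo2", "foo3" };

/* Debugger direction: internal opcode -> mnemonic string. */
const char *opcode_mnemonic(int op)
{
    if (op < 0 || op >= NUM_USER_OPS)
        return "<undefined>";
    return mnemonic_table[op];
}

/* Assembler direction: mnemonic string -> internal opcode. */
int mnemonic_opcode(const char *name)
{
    int op;
    for (op = 0; op < NUM_USER_OPS; op++)
        if (strcmp(mnemonic_table[op], name) == 0)
            return op;
    return XTENSA_UNDEFINED;
}
```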

The user invokes the pre-built debugger with a flag or environment variable pointing to the directory containing the user-defined enhancements 750. The pre-built debugger dynamically opens the appropriate DLL 748.

The debugger 730 also interacts with user-defined state 752. The debugger 730 must be able to read and modify this state 752. To do so, the debugger 730 communicates with the simulator 712. It asks the simulator 712 how large the state is and what the names of the state variables are. Whenever the debugger 730 is asked to display the value of a user state, it asks the simulator 712 for the value in the same way it asks for predefined state. Similarly, to modify user state, the debugger 730 tells the simulator 712 to set the state to a given value.
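The queries described above (how large the state is, what it is called, read a value, set a value) suggest a small name-based interface. The sketch below is entirely hypothetical: the function names, the state variables "acc" and "mode", and the truncation of written values to the declared width are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct state_var {
    const char *name;             /* name reported to the debugger */
    unsigned bits;                /* width of the state, 1..32 */
    uint32_t value;               /* current simulated value */
};

static struct state_var user_state[] = {
    { "acc", 32, 0 },             /* hypothetical user-defined states */
    { "mode", 2, 0 },
};
#define NUM_STATES (sizeof user_state / sizeof user_state[0])

size_t sim_state_count(void) { return NUM_STATES; }
const char *sim_state_name(size_t i) { assert(i < NUM_STATES); return user_state[i].name; }
unsigned sim_state_bits(size_t i) { assert(i < NUM_STATES); return user_state[i].bits; }

static struct state_var *find_state(const char *name)
{
    size_t i;
    for (i = 0; i < NUM_STATES; i++)
        if (strcmp(user_state[i].name, name) == 0)
            return &user_state[i];
    return NULL;
}

/* Read a state by name; returns 0 on success, -1 if the name is unknown. */
int sim_get_state(const char *name, uint32_t *out)
{
    struct state_var *s = find_state(name);
    if (s == NULL)
        return -1;
    *out = s->value;
    return 0;
}

/* Write a state by name, truncating the value to the declared width. */
int sim_set_state(const char *name, uint32_t value)
{
    struct state_var *s = find_state(name);
    uint32_t mask;
    if (s == NULL)
        return -1;
    mask = (s->bits >= 32) ? 0xffffffffu : ((1u << s->bits) - 1u);
    s->value = value & mask;
    return 0;
}
```

With an interface of this shape, a debugger needs no compiled-in knowledge of user state: it can enumerate the names and widths at startup and treat user state like predefined state afterwards.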

Thus, it can be seen that support for user-defined instruction sets and state according to the present invention can be accomplished using modules that define the user functionality and plug into the core software development tools. A system may therefore be developed in which the plug-in modules for a particular set of user-defined enhancements are maintained as a group within the system for ease of organization and manipulation.

Further, the core software development tools may be specific to particular core instruction sets and processor states, and a single set of plug-in modules for user-defined enhancements may be evaluated in connection with multiple sets of core software development tools resident on the system.

Appendix A

# Xtensa Configuration Database Specification
# $Id: Definition,v 1.65 1999/02/04 15:30:45 adixit Exp $
#
# These coded instructions, statements, and computer programs are
# Confidential Proprietary Information of Tensilica Inc. and may not be
# disclosed to third parties or copied in any form, in whole or in part,
# without the prior written consent of Tensilica Inc.
#
# This is the configuration parameter definition file.
# - All supported configurations must be declared in this file
# - All tools parsing configurations must check against this file for validity
# - Changes to this file must be kept minimum and dealt with care
#
# Naming Conventions
# Most parameter names begin with a category name from the following
# list:
# Addr Addressing and translation parameters
# Build ?
# Cad Target CAD environment
# DV Design Verification parameters
# Data One of the following:
# DataCache Data Cache parameters
# DataRAM Data RAM parameters
# DataROM Data ROM parameters
# Debug Debug option parameters
# Impl Implementation goals
# Inst One of the following:
# InstCache Instruction Cache parameters
# InstRAM Instruction RAM parameters
# InstROM Instruction ROM parameters
# Interrupt Interrupt parameters
# Isa Instruction Set Architecture parameters
# Iss Instruction Set Simulator parameters
# PIF Processor Interface parameters
# Sys System parameters (e.g. memory map)
# TIE Application-specific instruction parameters
# Test Manufacturing Test parameters
# Timer Cycle count/compare option parameters
# Vector Reset/Exception/Interrupt vector addresses
# Many parameters end in a suffix giving the units in which they
# are measured:
# Bits
# Bytes (i.e. 8 bits)
# Count used as a generic "number of" suffix
# Entries similar to Count
# Filename absolute pathname of file
# Interrupt interrupt id (0..31)
# Level interrupt level (1..15)
# Max maximum
# Paddr physical address
# Type an enumeration of the possible values
# Vaddr virtual address
#
# The format of this file:
# Column 1: configuration parameter name
# Column 2: default value of the parameter
# Column 3: perl expression used to check the validity of the value

######################################################################
# ISA options
######################################################################
IsaUseClamps 0 0|1
IsaUseMAC16 0 0|1
IsaUseMul16 0 0|1
IsaUseException 1 1
IsaUseInterrupt 0 0|1
IsaUseHighLevelInterrupt 0 0|1
IsaUseDebug 0 0|1
IsaUseTimer 0 0|1
IsaUseWindowedRegisters 1 1
IsaMemoryOrder LittleEndian LittleEndian|BigEndian
IsaARRegisterCount 32 32|64

######################################################################
# Addressing and translation
######################################################################

AddrPhysicalAddressBits 32 1[6-9]|2[0-9]|3[0-2]

AddrVirtualAddressBits 32 1[6-9]|2[0-9]|3[0-2]

######################################################################
# Data Cache/RAM/ROM
######################################################################

DataCacheBytes 1k 0k|1k|2k|4k|8k|16k

DataCacheLineBytes 16 16|32|64

DataRAMBytes 0k 0k|1k|2k|4k|8k|16k

DataROMBytes 0k 0k|1k|2k|4k|8k|16k

DataWriteBufferEntries 4 4|8|16|32

DataCacheAccessBits 32 32|64|128

######################################################################
# Instruction Cache/RAM/ROM
######################################################################

InstCacheBytes 1k 0k|1k|2k|4k|8k|16k

InstCacheLineBytes 16 16|32|64

InstRAMBytes 0k 0k|1k|2k|4k|8k|16k

InstROMBytes 0k 0k|1k|2k|4k|8k|16k

InstCacheAccessBits 32 32|64|128

######################################################################
# Processor Interface
######################################################################

PIFReadDataBits 32 32|64|128

PIFWriteDataBits 32 32|64|128

PIFTracePort 0 0|1

######################################################################
# System
######################################################################
SysAppStartVAddr 0x40001000 0x[0-9a-fA-F]+
SysDefaultCacheAttr 0xfff21122 0x[0-9a-fA-F]+
SysROMBytes 128k [0-9]+(k|m)
SysROMPAddr 0x20000000 0x[0-9a-fA-F]+
SysRAMBytes 1m [0-9]+(k|m)
SysRAMPAddr 0x40000000 0x[0-9a-fA-F]+
SysStackBytes 16k [0-9]+(k|m)
SysXMONBytes 0x0000fd00 0x[0-9a-fA-F]+
SysXMONVAddr 0x20000300 0x[0-9a-fA-F]+
SysXTOSBytes 0x00000c00 0x[0-9a-fA-F]+
SysXTOSVAddr 0x40000400 0x[0-9a-fA-F]+

######################################################################
# Vector addresses
######################################################################
VectorResetVAddr 0x20000020 0x[0-9a-fA-F]+
VectorUserExceptionVAddr 0x40000214 0x[0-9a-fA-F]+
VectorKernelExceptionVAddr 0x40000204 0x[0-9a-fA-F]+
VectorWindowBaseVAddr 0x40000000 0x[0-9a-fA-F]+
VectorLevel2InterruptVAddr 0x40000224 0x[0-9a-fA-F]+
VectorLevel3InterruptVAddr 0x40000234 0x[0-9a-fA-F]+

######################################################################
# Interrupt options
######################################################################

InterruptCount 1 [1-9]|1[0-9]|2[0-9]|3[0-2]

InterruptLevelMax 1 [1-3]

Interrupt0Type External External|Internal|Software

Interrupt1Type External External|Internal|Software

Interrupt2Type External External|Internal|Software

Interrupt3Type External External|Internal|Software

Interrupt4Type External External|Internal|Software

Interrupt5Type External External|Internal|Software

Interrupt6Type External External|Internal|Software

Interrupt7Type External External|Internal|Software

Interrupt8Type External External|Internal|Software

Interrupt9Type External External|Internal|Software

Interrupt10Type External External|Internal|Software

Interrupt11Type External External|Internal|Software

Interrupt12Type External External|Internal|Software

Interrupt13Type External External|Internal|Software

Interrupt14Type External External|Internal|Software

Interrupt15Type External External|Internal|Software

Interrupt16Type External External|Internal|Software

Interrupt17Type External External|Internal|Software

Interrupt18Type External External|Internal|Software

Interrupt19Type External External|Internal|Software

Interrupt20Type External External|Internal|Software

Interrupt21Type External External|Internal|Software

Interrupt22Type External External|Internal|Software

Interrupt23Type External External|Internal|Software

Interrupt24Type External External|Internal|Software

Interrupt25Type External External|Internal|Software

Interrupt26Type External External|Internal|Software

Interrupt27Type External External|Internal|Software

Interrupt28Type External External|Internal|Software

Interrupt29Type External External|Internal|Software

Interrupt30Type External External|Internal|Software

Interrupt31Type External External|Internal|Software

Interrupt0Level 1 [1-3]

Interrupt1Level 1 [1-3]

Interrupt2Level 1 [1-3]

Interrupt3Level 1 [1-3]

Interrupt4Level 1 [1-3]

Interrupt5Level 1 [1-3]

Interrupt6Level 1 [1-3]

Interrupt7Level 1 [1-3]

Interrupt8Level 1 [1-3]

Interrupt9Level 1 [1-3]

Interrupt10Level 1 [1-3]

Interrupt11Level 1 [1-3]

Interrupt12Level 1 [1-3]

Interrupt13Level 1 [1-3]

Interrupt14Level 1 [1-3]

Interrupt15Level 1 [1-3]

Interrupt16Level 1 [1-3]

Interrupt17Level 1 [1-3]

Interrupt18Level 1 [1-3]

Interrupt19Level 1 [1-3]

Interrupt20Level 1 [1-3]

Interrupt21Level 1 [1-3]

Interrupt22Level 1 [1-3]

Interrupt23Level 1 [1-3]

Interrupt24Level 1 [1-3]

Interrupt25Level 1 [1-3]

Interrupt26Level 1 [1-3]

Interrupt27Level 1 [1-3]

Interrupt28Level 1 [1-3]

Interrupt29Level 1 [1-3]

Interrupt30Level 1 [1-3]

Interrupt31Level 1 [1-3]

######################################################################
# Other processor component options: processor timer options
######################################################################
TimerCount 0 [0-3]
Timer0Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer1Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]
Timer2Interrupt 0 [0-9]|1[0-9]|2[0-9]|3[0-1]

######################################################################
# Debug options
######################################################################

DebugDataVAddrTrapCount 0 [0-2]

DebugInstVAddrTrapCount 0 [0-2]

DebugInterruptLevel 2 [2-3]

DebugUseOnChipDebug 0 0|1

######################################################################
# Instruction set simulator
######################################################################
ISSArgcPAddr 0x00012000 0x[0-9a-fA-F]+
ISSArgvPAddr 0x00012004 0x[0-9a-fA-F]+

######################################################################
# Design verification
######################################################################
DVMagicLocPAddr 0x00010000 0x[0-9a-fA-F]+
DVSerialRXADataPAddr 0x00011000 0x[0-9a-fA-F]+
DVSerialRXBDataPAddr 0x00011010 0x[0-9a-fA-F]+
DVSerialRXStatusPAddr 0x00011020 0x[0-9a-fA-F]+
DVSerialRXRequestPAddr 0x00011030 0x[0-9a-fA-F]+
DVCachedVAddr 0x60000000 0x[0-9a-fA-F]+
DVNonCachedVAddr 0x80000000 0x[0-9a-fA-F]+

######################################################################
# Test options
######################################################################

TestFullScan 0 0|1

TestLatchesTransparent 0 0|1

######################################################################
# Processor implementation configuration
######################################################################
ImplTargetSpeed 250 [1-9][0-9]*
ImplTargetSize 20000 [1-9][0-9]*
ImplTargetPower 75 [1-9][0-9]*
ImplSpeedPriority High High|Medium|Low
ImplPowerPriority Medium High|Medium|Low
ImplSizePriority Low High|Medium|Low
ImplTargetTechnology 25m 18m|25m|35m|cx3551|cx3301|acb25typ|acb25wst|t25typical|t25worst|t35std|lss3g|ibm25typ|ibm25wc|vst_tsmc25tym
ImplOperatingCondition Typical Worst|Typical

######################################################################
# CAD options
######################################################################

CadParUseApollo 1 0|1

CadParUseSiliconEnsemble 0 0|1

CadSimUseVCS 1 0|1

CadSimUseVerilogXL 1 0|1

CadSimUseVerilogNC 1 0|1

CadSimUseVantage 0 0|1

CadSimUseMTI 0 0|1

CadStvUseMotive 0 0|1

CadStvUsePrimeTime 1 0|1

CadSynUseBuildGates 0 0|1

CadSynUseDesignCompiler 1 0|1

######################################################################
# TIE instruction file. It must be an absolute path name
######################################################################

TIE filename/.* |-

######################################################################
# The following section is for internal use only. Before promoting any
# internal parameter, please confirm that all product components can
# support it.
######################################################################

#Constants for Athens implementation

IsaUseAthensCacheTest 1 0|1

IsaUseSpeculation 0 0

IsaUseCoprocessor 0 0

IsaUseFloatingPoint 0 0

IsaUseDSP 0 0

IsaUseDensityInstruction 1 1

IsaUse32bitMulDiv 0 0

IsaUseAbsdif 0 0

IsaUseCRC 0 0

IsaUsePopCount 0 0

IsaUseLeadingZeros 0 0

IsaUseMinMax 0 0

IsaUseSignExtend 0 0

IsaUseSynchronization 0 0

DataCacheIndexLock 0 0

DataCacheIndexType physical physical

DataCacheMaxMissCount 1 1

DataCacheMissStart 32 32

DataCacheParityBits 0 0

DataCacheSectorSize 16 16

DataCacheTagParityBits 0 0

DataCacheTagType physical physical

DataCacheWayLock 0 0

InstCacheIndexLock 0 0

InstCacheIndexType physical physical

InstCacheMaxMissCount 1 1

InstCacheMissStart 32 32

InstCacheParityBits 0 0

InstCacheSectorSize 16 16

InstCacheTagParityBits 0 0

InstCacheTagType physical physical

InstCacheWayLock 0 0

######################################################################
# Build mode... for Web customers. They can run a limited number of
# production builds, but as many eval builds as they like.
# UserCID is used for fingerprinting
######################################################################

BuildMode Evaluation Evaluation|Production

BuildUserCID 999 [0-9]+

######################################################################
# Values used by the GUI - basically persistent state
######################################################################

SysAddressLayout Xtos Xtos|Manual
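Each row of the configuration database above pairs a parameter name with a default value and a pattern describing the legal values. As a rough illustration of how a tool might check a value against column 3, the sketch below uses POSIX extended regular expressions in place of Perl. Two assumptions are made: the pattern is treated as matching the whole value (it is wrapped in `^(...)$`), and only patterns that are valid in both Perl and POSIX syntax are used. The function name value_is_valid is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if value matches pattern in full, 0 if it does not,
 * and -1 if the pattern is too long or fails to compile. */
int value_is_valid(const char *pattern, const char *value)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && value != NULL);

    /* Anchor the pattern so that "32|64" does not accept "1320". */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, value, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, the IsaARRegisterCount row would be checked with value_is_valid("32|64", "32"), and an address parameter with value_is_valid("0x[0-9a-fA-F]+", "0x40001000").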

Appendix B

#!/usr/xtensa/tools/bin/perl
# Tensilica PreProcessor
# $Id: tpp,v 1.15 1998/12/17 19:36:03 earl Exp $
# Modified: Kaushik Sheth
# The original code was taken from Iain McClatchie.
# perl preprocessor
# warrantee implied.
# Author: Iain McClatchie
# You can redistribute and/or modify this software under the terms of the
# GNU General Public License as published by the Free Software Foundation;
# either version 2, or (at your option) any later version.

use lib "@xtools@/lib";
package tpp;

# Standard perl modules
use strict;
use Exporter ();
use Getopt::Long;

# Module stuff
@tpp::ISA = qw(Exporter);
@tpp::EXPORT = qw(
    include
    error
);
@tpp::EXPORT_OK = qw(
    include
    gen
    error
);
%tpp::EXPORT_TAGS = ();

use vars qw(
    $debug
    $lines
    @incdir
    $config
    $output
    @global_file_stack
);

# Main program
{
    $::myname = 'tpp';      # for error messages

    # parse command line
    $debug = 0;             # -debug command line option
    $lines = 0;             # -lines command line option
    @incdir = ();           # -I command line options
    $config = '';           # -c command line option
    $output = undef;        # -o command line option
    my @eval = ();
    if (!GetOptions(
            "debug!"  => \$debug,
            "lines!"  => \$lines,
            "I=s@"    => \@incdir,
            "c=s"     => \$config,
            "o=s"     => \$output,
            "eval=s@" => \@eval)
        || @ARGV <= 0) {
        # command line error
        print STDERR <<"END";
tpp [args] file
    Applies a perl preprocessor to the indicated file, and any files
    included therein; the output of the preprocessor is written to
    stdout. Perl is embedded in the source text by one of two means.
    Whole lines of perl can be embedded by preceding them with a
    semicolon (you would typically do this for looping statements or
    subroutine calls). Alternatively, perl expressions can be embedded
    into the middle of other text by escaping them with backticks.
    -debug          Print perl code to STDERR, so you can figure out why your
                    embedded perl statements are looping forever.
    -lines          Embed '#line 43 "foo.w"' directives in output, for more
                    comprehensible error and warning messages from later tools.
    -I dir          Search for include files in directory dir.
    -o output_file  Redirect the output to a file rather than stdout.
    -c config_file  Read the specified config file.
    -e eval         Eval eval before running program.
    NOTE:
    the lines with only ";" and ";//" will go unaltered.
END
        exit(1);
    }

    # Initialize
    push(@INC, @incdir);
    @global_file_stack = ();

    # Read configuration file
    tppcode::init($config);

    # Open the output file
    if ($output) {
        open(STDOUT, "> $output")
            || die("$::myname: $!, opening '$output'\n");
    }

    # Process evals
    foreach (@eval) {
        tppcode::execute($_);
    }

    # Process the input files
    foreach (@ARGV) {
        include($_);
    }

    # Done
    exit(0);
}

sub include {
    my ($file) = @_;
    my ($buf, $tempname, @chunks, $chunk, $state, $lasttype);

    if ($file =~ m|^/|) {
        if (!open(INP, "< $file")) {
            error($file, "$!, opening $file");
        }
    } else {
        my $path;
        foreach $path (".", @incdir) {
            if (open(INP, "< $path/$file")) {
                $file = "$path/$file";
                last;
            }
        }
        error($file, "Couldn't find $file in \@INC")
            if tell(INP) == -1;
    }
    $lasttype = "";
    while (<INP>) {
        if (/^\s*;(.*)$/) {
            # Line of embedded perl (starts with a semicolon)
            my $l = $1;
            if ($lasttype ne "perl") {
                $lasttype = "perl";
            }
            if ((/^\s*;\s*\/\//) || (/^\s*;\s*$/)) {
                # Lines with only ";" or ";//" pass through unaltered
                $buf .= "print STDOUT \"$_\";\n";
            } else {
                $buf .= $l . "\n";
            }
        } else {
            if ($lines and $lasttype ne "text") {
                $buf .= "print STDOUT \"#line $. \\\"$file\\\"\\n\";\n";
                $lasttype = "text";
            }
            chomp;
            if (m/^$/) {
                $buf .= "print STDOUT \"\\n\";\n";
                next;
            }
            @chunks = split("`");
            $state = 0;
            $tempname = "00";
            foreach $chunk (@chunks) {
                if ($state == 0) {
                    $chunk = quotemeta($chunk);
                    $state = 1;
                } else {
                    if ($chunk =~ m/^\W/) {   # Perl expression
                        $buf .= "\$temp$tempname = $chunk;\n";
                        $chunk = "\${temp$tempname}";
                        $tempname++;
                        $state = 0;
                    } else {                  # Backquoted something
                        $chunk = "`" . quotemeta($chunk);
                        $state = 1;
                    }
                }
            }
            # check if the line ends with a backquote
            if (m/\`$/) {
                $state = 1 - $state;
            }
            error($file, "Unterminated embedded perl expression, line $.")
                if ($state == 0);
            $buf .= "print STDOUT \"" . join("", @chunks) . "\\n\";\n";
        }
    }
    close(INP);
    print STDERR $buf if ($debug);
    push(@global_file_stack, $file);
    tppcode::execute($buf);
    pop(@global_file_stack);
    if ($@) {
        chomp($@);
        error($file, $@);
    }
}

sub gen {
    print STDOUT (@_);
}

sub error {
    my ($file, $err) = @_;
    print STDERR "$::myname: Error ($err) while preprocessing file \"$file\"\n";
    my $fn;
    foreach $fn (@global_file_stack) {
        print STDERR " included from \"$fn\"\n";
    }
    exit(1);
}

# This package is used to execute the tpp code
package tppcode;
no strict;
use Xtensa::Config;

sub ppp_require {
    print STDERR ("tpp: Warning: ppp_require used instead of tpp::include\n");
    tpp::include(@_);
}

sub init {
    my ($cfile) = @_;
    config_set($cfile);
}

sub execute {
    my ($code) = @_;
    eval($code);
}

#
# Local Variables:
# mode:perl
# perl-indent-level:4
# cperl-indent-level:4
# End:

Appendix C

# Change XTENSA to point to your local installation
XTENSA=/usr/xtensa/awang/s8
#
# No need to change the rest
#
GCC=/usr/xtensa/stools/bin/gcc
XTCC=$(XTENSA)/bin/xt-gcc
XTRUN=$(XTENSA)/bin/xt-run
XTGO=$(XTENSA)/Hardware/scripts/xtgo
MFILE=$(XTENSA)/Hardware/diag/Makefile.common

all: run-base run-tie-cstub run-iss run-iss-old run-iss-new run-ver

#
# Rules to build various versions of me
#
me-base: me.c me_base.c me_tie.c src.c sad.c
	$(GCC) -o me-base -g -O2 -DNX=64 -DNY=64 me.c

me-tie-cstub: me.c me_base.c me_tie.c src.c sad.c
	$(GCC) -o me-tie-cstub -g -O2 -DTIE -DNX=64 -DNY=64 me.c

me-xt: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt -g -O2 -DXTENSA -DNX=32 -DNY=32 me.c

me-xt-old: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt-old -g -O3 -DOLD -DXTENSA -DNX=32 -DNY=32 me.c

me-xt-new: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt-new -g -O3 -DNEW -DXTENSA -DNX=32 -DNY=32 me.c

me-xt.s: me.c me_base.c me_tie.c src.c sad.c
	$(XTCC) -o me-xt.s -S -O3 -DNOPRINTF -DXTENSA -DNX=16 -DNY=16 me.c

#
# Rules for various runs of me
#
run-base: me-base
	me-base; exit 0

run-tie-cstub: me-tie-cstub
	me-tie-cstub; exit 0

run-iss: me-xt
	$(XTRUN) me-xt

run-iss-old: me-xt-old
	$(XTRUN) --verbose me-xt-old

run-iss-new: me-xt-new
	$(XTRUN) --verbose me-xt-new

run-ver: me-xt.s testdir
	cp me-xt.s testdir/me-xt
	$(XTGO) -vcs -testdir `pwd`/testdir -test me-xt > run-ver.out 2>&1
	grep Status run-ver.out

testdir:
	mkdir -p testdir/me-xt
	@echo 'all: me-xt.dat me-xt.bfd' > testdir/me-xt/Makefile
	@echo "include $(MFILE)" >> testdir/me-xt/Makefile

clean:
	rm -rf me-* *.out testdir results

APPENDIX I:TEST PROGRAM

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#ifndef NX
#define NX 32 /* image width */
#endif

#ifndef NY
#define NY 32 /* image height */
#endif

#define BLOCKX 16 /* block width */
#define BLOCKY 16 /* block height */
#define SEARCHX 4 /* search region width */
#define SEARCHY 4 /* search region height */

unsigned char OldB[NX][NY]; /* old image */
unsigned char NewB[NX][NY]; /* new image */
unsigned short VectX[NX/BLOCKX][NY/BLOCKY]; /* X motion vector */
unsigned short VectY[NX/BLOCKX][NY/BLOCKY]; /* Y motion vector */
unsigned short VectB[NX/BLOCKX][NY/BLOCKY]; /* absolute difference */
unsigned short BaseX[NX/BLOCKX][NY/BLOCKY]; /* Base X motion vector */
unsigned short BaseY[NX/BLOCKX][NY/BLOCKY]; /* Base Y motion vector */
unsigned short BaseB[NX/BLOCKX][NY/BLOCKY]; /* Base absolute difference */

#define ABS(x) (((x) < 0) ? (-(x)) : (x))
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
#define ABSD(x, y) (((x) > (y)) ? ((x) - (y)) : ((y) - (x)))

/**********************************************************************
 * Initialize the OldB and NewB arrays for test purposes
 **********************************************************************/
void init()
{
    int x, y, x1, y1;

    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            OldB[x][y] = x ^ y;
        }
    }
    for (x = 0; x < NX; x++) {
        for (y = 0; y < NY; y++) {
            x1 = (x + 3) % NX;
            y1 = (y + 4) % NY;
            NewB[x][y] = OldB[x1][y1];
        }
    }
}

/**********************************************************************
 * Check every result against the reference (golden) data
 **********************************************************************/
unsigned check()
{
    int bx, by;

    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) {
            if (VectX[bx][by] != BaseX[bx][by]) return 0;
            if (VectY[bx][by] != BaseY[bx][by]) return 0;
            if (VectB[bx][by] != BaseB[bx][by]) return 0;
        }
    }
    return 1;
}

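/**********************************************************************
 * The various implementations of motion estimation
 **********************************************************************/
#include "me_base.c"
#include "me_tie.c"

/**********************************************************************
 * Main test program
 **********************************************************************/
int
main(int argc, char **argv)
{
    int passed;

#ifndef NOPRINTF
    printf("Block=(%d,%d), Search=(%d,%d), size=(%d,%d)\n",
           BLOCKX, BLOCKY, SEARCHX, SEARCHY, NX, NY);
#endif
    init();
#ifdef OLD
    motion_estimate_base();
    passed = 1;
#elif NEW
    motion_estimate_tie();
    passed = 1;
#else
    motion_estimate_base();
    motion_estimate_tie();
    passed = check();
#endif
#ifndef NOPRINTF
    printf(passed ? "TIE version passed\n" : "** TIE version failed\n");
#endif
    return passed;
}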

APPENDIX II:ME_BASE.C

/**********************************************************************
 * Reference software implementation
 **********************************************************************/
void
motion_estimate_base()
{
    int bx, by, cx, cy, x, y;
    int startx, starty, endx, endy;
    unsigned diff, best, bestx, besty;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            for (cx = startx; cx < endx; cx++) {
                for (cy = starty; cy < endy; cy++) {
                    diff = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        for (y = 0; y < BLOCKY; y++) {
                            diff += ABSD(OldB[cx+x][cy+y],
                                         NewB[bx*BLOCKX+x][by*BLOCKY+y]);
                        }
                    }
                    if (diff < best) {
                        best = diff;
                        bestx = cx;
                        besty = cy;
                    }
                }
            }
            BaseX[bx][by] = bestx;
            BaseY[bx][by] = besty;
            BaseB[bx][by] = best;
        }
    }
}
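The exhaustive search performed by the reference implementation can be illustrated on a toy image. The following is a minimal sketch, not the code of the appendix: the sizes `W`, `H`, `B` and the helper names `block_sad` and `full_search` are assumptions introduced here, and the search covers every position in the image rather than a window of +/- SEARCHX by +/- SEARCHY.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical toy sizes: an 8x8 image searched with 4x4 blocks. */
enum { W = 8, H = 8, B = 4 };

/* Sum of absolute differences between the BxB block of ref at (cx, cy)
   and the BxB block of cur at (bx, by). */
static unsigned block_sad(unsigned char ref[W][H], unsigned char cur[W][H],
                          int cx, int cy, int bx, int by)
{
    unsigned sum = 0;
    for (int x = 0; x < B; x++)
        for (int y = 0; y < B; y++)
            sum += (unsigned)abs((int)ref[cx + x][cy + y] -
                                 (int)cur[bx + x][by + y]);
    return sum;
}

/* Try every candidate position at which a BxB block fits inside ref and
   keep the first position with the smallest SAD. Returns that SAD. */
static unsigned full_search(unsigned char ref[W][H], unsigned char cur[W][H],
                            int bx, int by, int *best_x, int *best_y)
{
    unsigned best = UINT_MAX;

    assert(bx >= 0 && by >= 0 && bx + B <= W && by + B <= H);
    *best_x = *best_y = -1;
    for (int cx = 0; cx + B <= W; cx++) {
        for (int cy = 0; cy + B <= H; cy++) {
            unsigned d = block_sad(ref, cur, cx, cy, bx, by);
            if (d < best) {
                best = d;
                *best_x = cx;
                *best_y = cy;
            }
        }
    }
    return best;
}
```

As in the appendix, ties are resolved in favor of the first candidate visited, because the comparison is a strict `<`.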

APPENDIX III:ME_TIE.C

#include "src.c"
#include "sad.c"

/**********************************************************************
 * Fast version of motion estimation that uses the SAD instruction
 **********************************************************************/
void
motion_estimate_tie()
{
    int bx, by, cx, cy, x;
    int startx, starty, endx, endy;
    unsigned diff0, diff1, diff2, diff3, best, bestx, besty;
    unsigned *N, N1, N2, N3, N4, *O, A, B, C, D, E;

    for (bx = 0; bx < NX/BLOCKX; bx++) {
        for (by = 0; by < NY/BLOCKY; by++) {
            best = bestx = besty = UINT_MAX;
            startx = MAX(0, bx*BLOCKX - SEARCHX);
            starty = MAX(0, by*BLOCKY - SEARCHY);
            endx = MIN(NX - BLOCKX, bx*BLOCKX + SEARCHX);
            endy = MIN(NY - BLOCKY, by*BLOCKY + SEARCHY);
            for (cy = starty; cy < endy; cy += sizeof(long)) {
                for (cx = startx; cx < endx; cx++) {
                    diff0 = diff1 = diff2 = diff3 = 0;
                    for (x = 0; x < BLOCKX; x++) {
                        N = (unsigned *) &(NewB[bx*BLOCKX+x][by*BLOCKY]);
                        N1 = N[0];
                        N2 = N[1];
                        N3 = N[2];
                        N4 = N[3];
                        O = (unsigned *) &(OldB[cx+x][cy]);
                        A = O[0];
                        B = O[1];
                        C = O[2];
                        D = O[3];
                        E = O[4];
                        diff0 += SAD(A, N1) + SAD(B, N2) +
                                 SAD(C, N3) + SAD(D, N4);
#ifdef BIG_ENDIAN
                        SSAI(24);
                        diff1 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                        SSAI(16);
                        diff2 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
                        SSAI(8);
                        diff3 += SAD(SRC(A, B), N1) + SAD(SRC(B, C), N2) +
                                 SAD(SRC(C, D), N3) + SAD(SRC(D, E), N4);
#else
                        SSAI(8);
                        diff1 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(16);
                        diff2 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
                        SSAI(24);
                        diff3 += SAD(SRC(B, A), N1) + SAD(SRC(C, B), N2) +
                                 SAD(SRC(D, C), N3) + SAD(SRC(E, D), N4);
#endif
                        O += NY/4;
                        N += NY/4;
                    }
                    if (diff0 < best) {
                        best = diff0;
                        bestx = cx;
                        besty = cy;
                    }
                    if (diff1 < best) {
                        best = diff1;
                        bestx = cx;
                        besty = cy + 1;
                    }
                    if (diff2 < best) {
                        best = diff2;
                        bestx = cx;
                        besty = cy + 2;
                    }
                    if (diff3 < best) {
                        best = diff3;
                        bestx = cx;
                        besty = cy + 3;
                    }
                }
            }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
            VectB[bx][by] = best;
        }
    }
}

APPENDIX IV:SAD.C

#if defined(XTENSA)
#include <machine/Customer.h>
#elif defined(TIE)
#include "../dk/me_cstub.c"
#else

/**********************************************************************
 * Sum of absolute differences of 4 bytes
 **********************************************************************/
static inline unsigned
SAD(unsigned ars, unsigned art)
{
    return ABSD(ars >> 24, art >> 24) +
           ABSD((ars >> 16) & 255, (art >> 16) & 255) +
           ABSD((ars >> 8) & 255, (art >> 8) & 255) +
           ABSD(ars & 255, art & 255);
}

#endif

APPENDIX V:SRC.C

/**********************************************************************
 * If the target code is native (not Xtensa), a global variable is used
 * to store the shift amount set by SSAI.
 **********************************************************************/

/**********************************************************************
 * Direct access to the Shift Right Concatenate Instruction.
 * The shift amount register must be loaded separately with SSAI().
 **********************************************************************/
static inline unsigned
SRC(unsigned ars, unsigned art)
{
    unsigned arr;
#ifndef XTENSA
    arr = (ars << (32 - sar)) | (art >> sar);
#else
    asm volatile("src\t%0, %1, %2" : "=a" (arr) : "a" (ars), "a" (art));
#endif
    return arr;
}

/**********************************************************************
 * Set the shift amount register
 **********************************************************************/
static inline void
SSAI(int count)
{
#ifndef XTENSA
    sar = count;
#else
    switch (count) {
    case 8:
        asm volatile("ssai\t8");
        break;
    case 16:
        asm volatile("ssai\t16");
        break;
    case 24:
        asm volatile("ssai\t24");
        break;
    default:
        exit(-1);
    }
#endif
}
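The way SRC and SAD combine in ME_TIE.C can be checked in portable C. The sketch below is an illustration under stated assumptions: `load_le32`, `funnel_shift_right` and `sad4` are hypothetical names introduced here, the shift amount is passed as an argument instead of being held in a shift amount register, and words are assembled in little-endian order so that the result does not depend on the host. It shows that a funnel shift of two adjacent words by 8, 16 or 24 bits yields the word that starts 1, 2 or 3 bytes further into memory, which is what the little-endian branch of ME_TIE.C relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes, least significant byte first
   (little-endian order, chosen so the result is host-independent). */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Funnel shift: treat hi:lo as one 64-bit value, shift it right by sar
   bits and keep the low 32 bits. sar == 0 is handled separately because
   shifting a 32-bit value by 32 is undefined in C. */
static uint32_t funnel_shift_right(uint32_t hi, uint32_t lo, unsigned sar)
{
    assert(sar < 32);
    if (sar == 0)
        return lo;
    return (hi << (32 - sar)) | (lo >> sar);
}

/* Sum of absolute differences over the 4 bytes packed in a and b. */
static unsigned sad4(uint32_t a, uint32_t b)
{
    unsigned sum = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        unsigned x = (a >> shift) & 255u;
        unsigned y = (b >> shift) & 255u;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}
```

With two words `A` and `B` loaded from consecutive addresses, `sad4(funnel_shift_right(B, A, 8), N1)` compares `N1` against the reference pixels one byte further along, without an unaligned load.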

APPENDIX VI:SOURCE CODE

/*
Block Motion Estimation:
The purpose of motion estimation is to find the unaligned 8x8 block of
an existing (old) image that most closely resembles an aligned 8x8
block. The search here is at any byte offset in +/- 16 bytes in x and
+/- 16 bytes in y. The search is a set of six nested loops.
OldB is pointer to a byte array of old block
NewB is pointer to a byte array of base block
*/

#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16

unsigned char OldB[NX][NY];
unsigned char NewB[NX][NY];
unsigned short VectX[NX/BLOCKX][NY/BLOCKY];
unsigned short VectY[NX/BLOCKX][NY/BLOCKY];

#define MIN(x, y) ((x < y) ? x : y)
#define MAX(x, y) ((x > y) ? x : y)
#define ABS(x) ((x < 0) ? (-x) : (x))

/* initialization with reference image data for test purposes */
void init()
{
    int x, y;

    for (x = 0; x < NX; x++) for (y = 0; y < NY; y++) {
        OldB[x][y] = x ^ y;
        NewB[x][y] = x + 2*y + 2;
    }
}

main()
{
    int by, bx, cy, cx, yo, xo;
    unsigned short best, bestx, besty, sumabsdiff0;

    init();
    for (by = 0; by < NY/BLOCKY; by++) {
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each 8x8 block in the image */
            best = 0xffff; /* look for the minimum difference */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                 cy < MIN(NY - BLOCKY, (by*BLOCKY) + SEARCHY);
                 cy++) /* for the old block at each line */
                for (cx = MAX(0, (bx*BLOCKX) - SEARCHX);
                     cx < MIN(NX - BLOCKX, (bx*BLOCKX) + SEARCHX);
                     cx++) {
                    /* test the NxN block at (bx, by) against NxN blocks */
                    /* at (cx, cy) */
                    sumabsdiff0 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of N rows in block */
                        for (xo = 0; xo < BLOCKX; xo++) { /* for each of N pixels in row */
                            sumabsdiff0 +=
                                ABS(OldB[cx+xo][cy+yo] -
                                    NewB[bx*BLOCKX+xo][by*BLOCKY+yo]);
                        }
                    }
                    if (sumabsdiff0 < best) {
                        best = sumabsdiff0; bestx = cx; besty = cy; }
                }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
    }
}

APPENDIX VII: C CODE OPTIMIZED WITH TIE

/*
Pixels are packed 4 per word
OldW is pointer to a word array of old block
NewW is pointer to a word array of base block
*/

#define NY 480
#define NX 640
#define BLOCKX 16
#define BLOCKY 16
#define SEARCHX 16
#define SEARCHY 16

#define MIN(x, y) ((x < y) ? x : y)
#define MAX(x, y) ((x > y) ? x : y)

unsigned long OldW[NY][NX/sizeof(long)];
unsigned long NewW[NY][NX/sizeof(long)];
unsigned short VectX[NY/BLOCKY][NX/BLOCKX];
unsigned short VectY[NY/BLOCKY][NX/BLOCKX];

void init()
{
    int x, y;

    for (x = 0; x < NX/sizeof(long); x++) for (y = 0; y < NY; y++) {
        OldW[y][x] = ((x<<2)^y)<<24 | (((x<<2)+1)^y)<<16 | (((x<<2)+2)^y)<<8
                     | ((x<<2)+3)^y;
        NewW[y][x] = ((x<<2)+2*y+2)<<24 | (((x<<2)+1)+2*y+2)<<16 |
                     (((x<<2)+2)+2*y+2)<<8 | ((x<<2)+3)+2*y+2;
    }
}

main()
{
    register int by, bx, cy, cx, yo, xo;
    register unsigned short
        best, bestx, besty, sumabsdiff0, sumabsdiff1, sumabsdiff2, sumabsdiff3;

    init();
    for (by = 0; by < NY/BLOCKY; by++)
        for (bx = 0; bx < NX/BLOCKX; bx++) { /* for each NxN block in the image */
            best = 0xffff; /* look for the minimum difference */
            for (cy = MAX(0, (by*BLOCKY) - SEARCHY);
                 cy < MIN(NY - BLOCKY, (by*BLOCKY) + SEARCHY);
                 cy++) /* for the old block at each line */
                for (cx = MAX(0, (bx*BLOCKX - SEARCHX)/sizeof(long));
                     cx < MIN((NX - BLOCKX - 2)/sizeof(long),
                              (bx*BLOCKX + SEARCHX)/sizeof(long));
                     cx++) { /* and each word (4 byte) offset in line */
                    /* test the NxN block at (bx, by) against four NxN blocks */
                    /* at (cx, cy), (cx+1B, cy), (cx+2B, cy), (cx+3B, cy) */
                    sumabsdiff0 = sumabsdiff1 = sumabsdiff2 = sumabsdiff3 = 0;
                    for (yo = 0; yo < BLOCKY; yo++) { /* for each of the N lines in the block */
                        for (xo = 0; xo < BLOCKX/8; xo += 2) {
                            register unsigned long *N, N1, N2, *O, A, B, C, W, X;
                            N = &NewW[by+yo][bx*BLOCKX/sizeof(long)+xo];
                            N1 = *N; N2 = *(N+1); /* 2 words of subject image */
                            O = &OldW[cy+yo][cx+xo];
                            A = *O; B = *(O+1); C = *(O+2); /* 3 words of reference */
                            sumabsdiff0 += sad(A, N1) + sad(B, N2);
                            SHIFT(24) /* shift A, B, C left by one byte into W, X */
                            sumabsdiff1 += sad(W, N1) + sad(X, N2);
                            SHIFT(16) /* shift A, B, C left by two bytes into W, X */
                            sumabsdiff2 += sad(W, N1) + sad(X, N2);
                            SHIFT(8) /* shift A, B, C left by three bytes into W, X */
                            sumabsdiff3 += sad(W, N1) + sad(X, N2);
                        }
                    }
                    if (sumabsdiff0 < best) {
                        best = sumabsdiff0; bestx = cx; besty = cy; }
                    if (sumabsdiff1 < best) {
                        best = sumabsdiff1; bestx = cx+1; besty = cy; }
                    if (sumabsdiff2 < best) {
                        best = sumabsdiff2; bestx = cx+2; besty = cy; }
                    if (sumabsdiff3 < best) {
                        best = sumabsdiff3; bestx = cx+3; besty = cy; }
                }
            VectX[bx][by] = bestx;
            VectY[bx][by] = besty;
        }
}

Appendix D

/*
 * TIE to Verilog translation routines
 */

/* $Id: tie2ver_write.c,v 1.27 1999/05/11 00:10:18 awang Exp $ */

/*
 * These coded instructions, statements, and computer programs are
 * Confidential Proprietary Information of Tensilica Inc. and may not be
 * disclosed to third parties or copied in any form, in whole or in part,
 * without the prior written consent of Tensilica Inc.
 */

#include <math.h>
#include "tie.h"
#include "st.h"

#define COMMENTS "// Do not modify this automatically generated file."

static void tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os);

#define tie2ver_program_foreach_instruction(_prog, _inst) { \
    tie_t *_iclass; \
    tie_program_foreach_iclass(_prog, _iclass) { \
        if (tie_get_predefined(_iclass)) continue; \
        tie_iclass_foreach_instruction(_iclass, _inst) {

#define end_tie2ver_program_foreach_instruction \
        } end_tie_iclass_foreach_instruction; \
    } end_tie_program_foreach_iclass; \
}

#define TIE_ENFLOP "\n\
module tie_enflop(tie_out, tie_in, en, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input en;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
if (en)\n\
tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_FLOP "\n\
module tie_flop(tie_out, tie_in, clk);\n\
parameter size = 32;\n\
output [size-1:0] tie_out;\n\
input [size-1:0] tie_in;\n\
input clk;\n\
reg [size-1:0] tmp;\n\
assign tie_out = tmp;\n\
always @(posedge clk) begin\n\
tmp <= #1 tie_in;\n\
end\n\
endmodule\n"

#define TIE_ATHENS_STATE "\n\
module tie_athens_state(ns, we, ke, kp, vw, clk, ps);\n\
parameter size = 32;\n\
input [size-1:0] ns; // next state\n\
input we; // write enable\n\
input ke; // Kill E state\n\
input kp; // Kill Pipeline\n\
input vw; // Valid W state\n\
input clk; // clock\n\
output [size-1:0] ps; // present state\n\
\n\
wire [size-1:0] se; // state at E stage\n\
wire [size-1:0] sm; // state at M stage\n\
wire [size-1:0] sw; // state at W stage\n\
wire [size-1:0] sx; // state at X stage\n\
wire ee; // write enable for EM register\n\
wire ew; // write enable for WX register\n\
\n\
assign se = kp ? sx : ns;\n\
assign ee = kp | we & ~ke;\n\
assign ew = vw & ~kp;\n\
assign ps = sm;\n\
\n\
tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));\n\
tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));\n\
tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));\n\
\n\
endmodule\n"
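The behaviour of the `tie_athens_state` module can be followed cycle by cycle with a small software model. The sketch below is an illustration only: the type `tie_state_model` and the function `tie_state_step` are hypothetical names, the model ignores the `#1` delays, and it treats the `sx` register as the copy from which the speculative value is restored on a pipeline kill, which is an interpretation of the `assign se = kp ? sx : ns` line.

```c
#include <assert.h>
#include <stdint.h>

/* Contents of the three registers of the state module. */
typedef struct {
    uint32_t sm;  /* state at M stage, visible to the semantics as ps */
    uint32_t sw;  /* state at W stage */
    uint32_t sx;  /* state at X stage */
} tie_state_model;

/* Advance the model by one clock edge and return the new present state.
   ns: next state, we: write enable, ke: kill E, kp: kill pipeline,
   vw: valid W. */
static uint32_t tie_state_step(tie_state_model *s, uint32_t ns,
                               int we, int ke, int kp, int vw)
{
    assert(s != 0);

    uint32_t se = kp ? s->sx : ns;   /* assign se = kp ? sx : ns;    */
    int ee = kp || (we && !ke);      /* assign ee = kp | we & ~ke;   */
    int ew = vw && !kp;              /* assign ew = vw & ~kp;        */

    /* All next values are computed from the current register contents
       and then stored together, like non-blocking assignments. */
    uint32_t next_sm = ee ? se : s->sm;   /* tie_enflop state_EM */
    uint32_t next_sw = s->sm;             /* tie_flop   state_MW */
    uint32_t next_sx = ew ? s->sw : s->sx;/* tie_enflop state_WX */

    s->sm = next_sm;
    s->sw = next_sw;
    s->sx = next_sx;
    return s->sm;
}
```

In this model a write that is later killed by `kp` is undone in one step, because `sm` is reloaded from `sx` while `ew` is forced low.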

/**********************************************************************
 * Build and return the program-wide operand table for user-defined
 * instructions. The returned table does not contain the operands used
 * by predefined instructions.
 **********************************************************************/
static st_table *
tie2ver_program_get_operand_table(tie_t *prog)
{
    static st_table *tie2ver_program_args = 0;
    tie_t *inst;
    char *key, *value;
    st_table *operand_table;
    st_generator *gen;

    if (tie2ver_program_args == 0) {
        tie2ver_program_args = st_init_table(strcmp, st_strhash);
        tie2ver_program_foreach_instruction(prog, inst) {
            operand_table = tie_instruction_get_operand_table(inst);
            st_foreach_item(operand_table, gen, &key, &value) {
                st_insert(tie2ver_program_args, key, value);
            }
        } end_tie2ver_program_foreach_instruction;
    }
    return tie2ver_program_args;
}

/**********************************************************************
 * Print a wire statement
 **********************************************************************/
static void
tie2ver_write_wire(FILE *fp, tie_t *wire)
{
    int from, to, write_comma;
    tie_t *first, *second, *var;

    first = tie_get_first_child(wire);
    ASSERT(tie_get_type(first) == TIE_INT);
    from = tie_get_integer(first);
    second = tie_get_next_sibling(first);
    ASSERT(tie_get_type(second) == TIE_INT);
    to = tie_get_integer(second);
    fprintf(fp, "wire ");
    if (!(from == 0 && to == 0)) {
        fprintf(fp, "[%d:%d] ", from, to);
    }
    write_comma = 0;
    var = tie_get_next_sibling(second);
    while (var != 0) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        fprintf(fp, "%s", tie_get_identifier(var));
        var = tree_get_next_sibling(var);
    }
    fprintf(fp, ";\n");
}

/**********************************************************************
 * Print a unary expression
 **********************************************************************/
static void
tie2ver_write_unary(
    FILE *fp, const char *op, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    fprintf(fp, "%s(", op);
    tie2ver_write_expression(fp, exp, lhs, is, os);
    fprintf(fp, ")");
}

/**********************************************************************
 * Print a binary expression
 **********************************************************************/
static void
tie2ver_write_binary(
    FILE *fp, const char *op, tie_t *exp1, tree_t *exp2,
    int lhs, st_table *is, st_table *os)
{
    fprintf(fp, "(");
    tie2ver_write_expression(fp, exp1, lhs, is, os);
    fprintf(fp, ") %s (", op);
    tie2ver_write_expression(fp, exp2, lhs, is, os);
    fprintf(fp, ")");
}

/**********************************************************************
 * Print an identifier
 **********************************************************************/
static void
tie2ver_write_identifier(
    FILE *fp, tie_t *id, int lhs, st_table *is, st_table *os)
{
    tie_t *prog, *first, *second;
    char *name, *dummy;

    name = tie_get_identifier(id);
    if ((is != 0) && st_lookup(is, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else if ((os != 0) && st_lookup(os, name, &dummy)) {
        fprintf(fp, "%s_%s", name, lhs ? "ns" : "ps");
    } else {
        fprintf(fp, "%s", name);
    }
    first = tie_get_first_child(id);
    if (first == 0) {
        return;
    }
    /* detect whether this is a table access */
    prog = tie_get_program(id);
    if (tie_program_get_table_by_name(prog, name) != 0) {
        switch (tie_get_type(first)) {
        case TIE_ID:
            fprintf(fp, "(%s)", tie_get_identifier(first));
            break;
        case TIE_INT:
            fprintf(fp, "(%d)", tie_get_integer(first));
            break;
        default:
            DIE("Error: expected type\n");
        }
        return;
    }
    second = tie_get_next_sibling(first);
    if (second == 0) {
        fprintf(fp, "[%d]", tie_get_integer(first));
        return;
    }
    fprintf(fp, "[%d:%d]", tie_get_integer(first),
            tie_get_integer(second));
}

/**********************************************************************
 * Print a concatenation expression
 **********************************************************************/
static void
tie2ver_write_concatenation(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *comp;
    int write_comma;

    write_comma = 0;
    fprintf(fp, "{");
    tie_foreach_child(exp, comp) {
        if (write_comma) {
            fprintf(fp, ", ");
        } else {
            write_comma = 1;
        }
        tie2ver_write_expression(fp, comp, lhs, is, os);
    } end_tie_foreach_child;
    fprintf(fp, "}");
}

/**********************************************************************
 * Print a conditional expression
 **********************************************************************/
static void
tie2ver_write_conditional(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *cond_exp, *then_exp, *else_exp;

    cond_exp = tie_get_first_child(exp);
    then_exp = tie_get_next_sibling(cond_exp);
    else_exp = tie_get_next_sibling(then_exp);
    ASSERT(tie_get_last_child(exp) == else_exp);
    fprintf(fp, "(");
    tie2ver_write_expression(fp, cond_exp, lhs, is, os);
    fprintf(fp, ") ? (");
    tie2ver_write_expression(fp, then_exp, lhs, is, os);
    fprintf(fp, ") : (");
    tie2ver_write_expression(fp, else_exp, lhs, is, os);
    fprintf(fp, ")");
}

/**********************************************************************
 * Print a replication expression
 **********************************************************************/
static void
tie2ver_write_replication(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_t *num, *comp;

    num = tie_get_first_child(exp);
    comp = tie_get_next_sibling(num);
    ASSERT(tie_get_last_child(exp) == comp);
    ASSERT(tie_get_type(num) == TIE_INT);
    fprintf(fp, "{%d{", tie_get_integer(num));
    tie2ver_write_expression(fp, comp, lhs, is, os);
    fprintf(fp, "}}");
}

/**********************************************************************
 * Print an expression
 **********************************************************************/
static void
tie2ver_write_expression(
    FILE *fp, tie_t *exp, int lhs, st_table *is, st_table *os)
{
    tie_type_t type;
    tie_t *first, *second;

    first = tie_get_first_child(exp);
    second = first == 0 ? 0 : tie_get_next_sibling(first);
    switch (type = tie_get_type(exp)) {
    case TIE_ID:
        tie2ver_write_identifier(fp, exp, lhs, is, os);
        break;
    case TIE_INT:
        fprintf(fp, "%d", tie_get_integer(exp)); break;
    case TIE_CONST:
        fprintf(fp, "%s", tie_get_constant(exp)); break;
    case TIE_LOGICAL_NEGATION:
        tie2ver_write_unary(fp, "!", first, lhs, is, os); break;
    case TIE_LOGICAL_AND:
        tie2ver_write_binary(fp, "&&", first, second, lhs, is, os);
        break;
    case TIE_LOGICAL_OR:
        tie2ver_write_binary(fp, "||", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_NEGATION:
        tie2ver_write_unary(fp, "~", first, lhs, is, os); break;
    case TIE_BITWISE_AND:
        tie2ver_write_binary(fp, "&", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_OR:
        tie2ver_write_binary(fp, "|", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XOR:
        tie2ver_write_binary(fp, "^", first, second, lhs, is, os);
        break;
    case TIE_BITWISE_XNOR:
        tie2ver_write_binary(fp, "~^", first, second, lhs, is, os);
        break;
    case TIE_ADD:
        tie2ver_write_binary(fp, "+", first, second, lhs, is, os);
        break;
    case TIE_SUB:
        tie2ver_write_binary(fp, "-", first, second, lhs, is, os);
        break;
    case TIE_MULT:
        tie2ver_write_binary(fp, "*", first, second, lhs, is, os);
        break;
    case TIE_GT:
        tie2ver_write_binary(fp, ">", first, second, lhs, is, os);
        break;
    case TIE_GEQ:
        tie2ver_write_binary(fp, ">=", first, second, lhs, is, os);
        break;
    case TIE_LT:
        tie2ver_write_binary(fp, "<", first, second, lhs, is, os);
        break;
    case TIE_LEQ:
        tie2ver_write_binary(fp, "<=", first, second, lhs, is, os);
        break;
    case TIE_EQ:
        tie2ver_write_binary(fp, "==", first, second, lhs, is, os);
        break;
    case TIE_NEQ:
        tie2ver_write_binary(fp, "!=", first, second, lhs, is, os);
        break;
    case TIE_REDUCTION_AND:
        tie2ver_write_unary(fp, "&", first, lhs, is, os); break;
    case TIE_REDUCTION_OR:
        tie2ver_write_unary(fp, "|", first, lhs, is, os); break;
    case TIE_REDUCTION_XOR:
        tie2ver_write_unary(fp, "^", first, lhs, is, os); break;
    case TIE_SHIFT_LEFT:
        tie2ver_write_binary(fp, "<<", first, second, lhs, is, os);
        break;
    case TIE_SHIFT_RIGHT:
        tie2ver_write_binary(fp, ">>", first, second, lhs, is, os);
        break;
    case TIE_REPLICATION:
        tie2ver_write_replication(fp, exp, lhs, is, os);
        break;
    case TIE_CONCATENATION:
        tie2ver_write_concatenation(fp, exp, lhs, is, os);
        break;
    case TIE_CONDITIONAL:
        tie2ver_write_conditional(fp, exp, lhs, is, os);
        break;
    default:
        fprintf(stderr, "Wrong type: %d\n", type);
        DIE("Error: wrong expression type\n");
    }
}

/**********************************************************************
 * Print an assignment statement
 **********************************************************************/
static void
tie2ver_write_assignment(
    FILE *fp, tie_t *assign, st_table *in_states, st_table *out_states)
{
    tie_t *lval, *rval;

    ASSERT(tie_get_type(assign) == TIE_ASSIGNMENT);
    lval = tie_get_first_child(assign);
    rval = tie_get_last_child(assign);
    ASSERT(tie_get_next_sibling(lval) == rval);
    ASSERT(tie_get_prev_sibling(rval) == lval);
    fprintf(fp, "assign ");
    tie2ver_write_expression(fp, lval, 1, in_states, out_states);
    fprintf(fp, " = ");
    tie2ver_write_expression(fp, rval, 0, in_states, out_states);
    fprintf(fp, ";\n");
}

/**********************************************************************
 * Print a list of statements
 **********************************************************************/
static void
tie2ver_write_statement(
    FILE *fp, tie_t *statement, st_table *in_states, st_table *out_states)
{
    tie_t *child;

    ASSERT(tie_get_type(statement) == TIE_STATEMENT);
    tie_foreach_child(statement, child) {
        switch (tie_get_type(child)) {
        case TIE_WIRE:
            tie2ver_write_wire(fp, child);
            break;
        case TIE_ASSIGNMENT:
            tie2ver_write_assignment(fp, child, in_states, out_states);
            break;
        default:
            DIE("Error: illegal program statement\n");
        }
    } end_tie_foreach_child;
}

/**********************************************************************
 * Write the module declaration for an "iclass"
 **********************************************************************/
static void
tie2ver_write_module_declaration(FILE *fp, tie_t *semantic)
{
    st_table *operand_table, *state_table;
    st_generator *gen;
    tie_t *ilist, *inst;
    char *c, *key, *value;

    fprintf(fp, "\n");
    fprintf(fp, "module %s(", tie_semantic_get_name(semantic));
    c = "";
    operand_table = tie_semantic_get_operand_table(semantic);
    st_foreach_item(operand_table, gen, &key, &value) {
        fprintf(fp, "%s%s", c, key);
        c = ", ";
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ps", c, key);
        c = ", ";
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "%s%s_ns", c, key);
        fprintf(fp, "%s%s_we", c, key);
        c = ", ";
    }
    ilist = tie_semantic_get_inst_list(semantic);
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, ", %s", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
    fprintf(fp, ");\n");
    st_foreach_item(operand_table, gen, &key, &value) {
        switch ((tie_type_t) value) {
        case TIE_ARG_IN:
            fprintf(fp, "input [31:0] %s;\n", key); break;
        case TIE_ARG_OUT:
            fprintf(fp, "output [31:0] %s;\n", key); break;
        case TIE_ARG_INOUT:
            fprintf(fp, "inout [31:0] %s;\n", key); break;
        default:
            DIE("Error: unexpected arg type\n");
        }
    }
    state_table = tie_semantic_get_in_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "input [%d:0] %s_ps;\n", (int) value - 1, key);
    }
    state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(state_table, gen, &key, &value) {
        fprintf(fp, "output [%d:0] %s_ns;\n", (int) value - 1, key);
        fprintf(fp, "output %s_we;\n", key);
    }
    tie_inst_list_foreach_instruction(ilist, inst) {
        fprintf(fp, "input %s;\n", tie_instruction_get_name(inst));
    } end_tie_inst_list_foreach_instruction;
}

/**********************************************************************
 * Write a TIE "table" as a Verilog function
 **********************************************************************/
static void
tie2ver_write_table(FILE *fp, tie_t *table)
{
    int i, width, size, bits, ivalue;
    char *oname, *iname, *cvalue;
    tie_t *value;

    oname = tie_table_get_name(table);
    iname = "index";
    width = tie_table_get_width(table);
    size = tie_table_get_depth(table);
    bits = (int) ceil(log(size) / log(2));
    fprintf(fp, "\nfunction [%d:0] %s;\n", width - 1, oname);
    fprintf(fp, "input [%d:0] %s;\n", bits - 1, iname);
    fprintf(fp, "case (%s)\n", iname);
    i = 0;
    tie_table_foreach_value(table, value) {
        fprintf(fp, "%d'd%d : %s = ", bits, i, oname);
        switch (tie_get_type(value)) {
        case TIE_CONST:
            cvalue = tie_get_constant(value);
            fprintf(fp, "%d'b%s;\n", width,
                    tie_constant_get_binary_string(cvalue));
            break;
        case TIE_INT:
            ivalue = tie_get_integer(value);
            fprintf(fp, "%d'd%d;\n", width, ivalue);
            break;
        default:
            DIE("Internal Error: unexpected type\n");
        }
        i++;
    } end_tie_table_foreach_value;
    fprintf(fp, "default : %s = %d'd0;\n", oname, width);
    fprintf(fp, "endcase\n");
    fprintf(fp, "endfunction\n");
}
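The table writer sizes the index input as the ceiling of the base-2 logarithm of the table depth and emits one case arm per entry. The sketch below restates those two steps in isolation, under assumptions of its own: `index_bits` and `format_case_arm` are hypothetical helpers, the bit count is computed with integer arithmetic instead of `ceil(log(size)/log(2))`, and the exact spacing of the emitted text is a guess.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of index bits needed to address `depth` table entries, that is
   the smallest b such that 2^b >= depth. */
static int index_bits(unsigned depth)
{
    int bits = 0;

    assert(depth > 0);
    while (bits < 32 && (1ull << bits) < depth)
        bits++;
    return bits;
}

/* Format one case arm such as "3'd5 : tbl = 8'd42;" into out.
   Returns 0 on success, or -1 if out is too small. */
static int format_case_arm(char *out, size_t cap, int bits, int index,
                           const char *name, int width, int value)
{
    int n = snprintf(out, cap, "%d'd%d : %s = %d'd%d;",
                     bits, index, name, width, value);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

A depth of 1 gives 0 index bits, so a caller that declares the index as `[bits-1:0]` would need to treat a single-entry table as a special case.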

/**********************************************************************
 * Write the write-enable logic for each state modified by a "semantic"
 * statement
 **********************************************************************/
static void
tie2ver_semantic_write_we(FILE *fp, tie_t *semantic)
{
    tie_t *inst;
    st_table *semantic_state_table, *inst_state_table;
    st_generator *gen;
    char *key, *value, *c, *iname;
    int found;

    semantic_state_table = tie_semantic_get_out_state_table(semantic);
    st_foreach_item(semantic_state_table, gen, &key, &value) {
        fprintf(fp, "assign %s_we = ", key);
        c = "";
        tie_semantic_foreach_instruction(semantic, inst) {
            iname = tie_instruction_get_name(inst);
            inst_state_table = tie_instruction_get_state_table(inst);
            found = st_lookup(inst_state_table, key, &value);
            if (found && ((tie_type_t) value != TIE_ARG_IN)) {
                fprintf(fp, "%s1'b1 & %s", c, iname);
            } else {
                fprintf(fp, "%s1'b0 & %s", c, iname);
            }
            c = "\n | ";
        } end_tie_semantic_foreach_instruction;
        fprintf(fp, ";\n");
    }
}
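The write-enable expression is an OR over all instructions of the semantic block, in which each instruction select signal is masked by a constant 1 or 0 according to whether that instruction writes the state. A minimal sketch of the same string construction, written against plain arrays instead of the `st_table` and `tie_t` structures, is shown below. The function name `format_write_enable`, its parameters and the single-line output format are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "assign <state>_we = 1'bX & <inst> | ...;" into out.
   writes[i] is non-zero if instruction i writes the state.
   Returns the number of characters written, or -1 if out is too small. */
static int format_write_enable(char *out, size_t cap, const char *state,
                               const char *const *insts, const int *writes,
                               size_t n)
{
    size_t used;
    int k;

    assert(out != NULL && cap > 0);
    k = snprintf(out, cap, "assign %s_we = ", state);
    if (k < 0 || (size_t)k >= cap)
        return -1;
    used = (size_t)k;
    for (size_t i = 0; i < n; i++) {
        /* The separator is empty for the first term, " | " afterwards. */
        k = snprintf(out + used, cap - used, "%s1'b%d & %s",
                     i ? " | " : "", writes[i] ? 1 : 0, insts[i]);
        if (k < 0 || (size_t)k >= cap - used)
            return -1;
        used += (size_t)k;
    }
    k = snprintf(out + used, cap - used, ";");
    if (k < 0 || (size_t)k >= cap - used)
        return -1;
    return (int)(used + (size_t)k);
}
```

Emitting a `1'b0 &` term for instructions that do not write the state keeps the generator simple, and a synthesis tool can be expected to remove such constant terms.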

/**********************************************************************
 * Write a TIE "semantic" statement as a Verilog module
 **********************************************************************/
static void
tie2ver_write_semantic(FILE *fp, tie_t *semantic)
{
    tie_t *table, *statement;
    ls_t *tables;
    st_table *in_state_table, *out_state_table;

    ASSERT(tie_get_type(semantic) == TIE_SEMANTIC);
    tie2ver_write_module_declaration(fp, semantic);
    statement = tie_semantic_get_statement(semantic);
    in_state_table = tie_semantic_get_in_state_table(semantic);
    out_state_table = tie_semantic_get_out_state_table(semantic);
    tie2ver_write_statement(fp, statement, in_state_table,
                            out_state_table);
    tables = tie_expression_get_tables(statement,
                                       tie_get_program(semantic));
    ls_foreach_data(tie_t *, tables, table) {
        tie2ver_write_table(fp, table);
    } end_ls_foreach_data;
    ls_free(tables);
    tie2ver_semantic_write_we(fp, semantic);
    fprintf(fp, "endmodule\n");
}

/**********************************************************************
 * Print the top module declaration that combines the semantics
 **********************************************************************/
static void
tie2ver_write_top_module(FILE *fp, tie_t *prog)
{
    st_generator *gen;
    char *key, *value;
    st_table *operand_table;
    tie_t *inst, *iclass;

    fprintf(fp, "\n");
    fprintf(fp, "module UserInstModule(clk, out_E, ars_E, art_E, inst_R");
    fprintf(fp, ", Kill_E, killPipe_W, valid_W");
    tie_program_foreach_iclass(prog, iclass) {
        if (tie_get_predefined(iclass)) continue;
        tie_iclass_foreach_instruction(iclass, inst) {
            fprintf(fp, ", %s_R", tie_instruction_get_name(inst));
        } end_tie_iclass_foreach_instruction;
    } end_tie_program_foreach_iclass;
    fprintf(fp, ", en_R);\n");
    fprintf(fp, "input clk;\n");
    fprintf(fp, "output [31:0] out_E;\n");
    fprintf(fp, "input [31:0] ars_E;\n");
    fprintf(fp, "input [31:0] art_E;\n");
    fprintf(fp, "input [23:0] inst_R;\n");
    fprintf(fp, "input en_R;\n");
    fprintf(fp, "input Kill_E, killPipe_W, valid_W;\n");
    tie2ver_program_foreach_instruction(prog, inst) {
        fprintf(fp, "input %s_R;\n", tie_instruction_get_name(inst));
    } end_tie2ver_program_foreach_instruction;
    tie2ver_program_foreach_instruction(prog, inst) {
        fprintf(fp, "wire %s_E;\n", tie_instruction_get_name(inst));
    } end_tie2ver_program_foreach_instruction;
    operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(operand_table, gen, &key, &value) {
        if ((tie_type_t) value != TIE_ARG_IN) {
            fprintf(fp, "wire [31:0] %s_E;\n", key);
        }
    }
}

/**********************************************************************
 * Write the wire declarations for each semantic block and the select
 * signal for each of its outputs
 **********************************************************************/
static void
tie2ver_write_wire_declaration(FILE *fp, tie_t *prog)
{
    tie_t *semantic, *state;
    st_table *operand_table, *global_operand_table;
    st_table *state_table;
    st_generator *gen;
    char *key, *value, *sname;
    int width;

    global_operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(global_operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                fprintf(fp, "wire [31:0] %s_R, %s_E;\n", key, key);
            }
        }
    }
    tie_program_foreach_state(prog, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        width = tie_state_get_width(state);
        fprintf(fp, "wire [%d:0] %s_ps, %s_ns;\n", width - 1, sname,
                sname);
        fprintf(fp, "wire %s_we;\n", sname);
    } end_tie_program_foreach_state;
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        operand_table = tie_semantic_get_operand_table(semantic);
        st_foreach_item(operand_table, gen, &key, &value) {
            if ((tie_type_t) value != TIE_ARG_IN) {
                fprintf(fp, "wire [31:0] %s_%s;\n", sname, key);
            }
        }
        state_table = tie_semantic_get_out_state_table(semantic);
        st_foreach_item(state_table, gen, &key, &value) {
            fprintf(fp, "wire [%d:0] %s_%s_ns;\n", (int) value - 1,
                    sname, key);
            fprintf(fp, "wire %s_%s_we;\n", sname, key);
        }
        fprintf(fp, "wire %s_select;\n", sname);
    } end_tie_program_foreach_semantic;
}

/**********************************************************************
 * Write one flop instance
 **********************************************************************/
static void
tie2ver_write_flop_instance(FILE *fp, char *name, int num)
{
    char *fmt;

    fmt = "tie_flop #(%d) f%s(.tie_out(%s_E), .tie_in(%s_R), .clk(clk));\n";
    fprintf(fp, fmt, num, name, name, name);
}

/**********************************************************************
 * Latch all instruction signals from the R stage
 **********************************************************************/
static void
tie2ver_write_flop(FILE *fp, tie_t *prog)
{
    char *name;
    tie_t *inst;

    tie2ver_program_foreach_instruction(prog, inst) {
        name = tie_instruction_get_name(inst);
        tie2ver_write_flop_instance(fp, name, 1);
    } end_tie2ver_program_foreach_instruction;
}

/**********************************************************************
 * Write one instance for each semantic block
 **********************************************************************/
static void
tie2ver_write_semantic_instance(FILE *fp, tie_t *prog)
{
    tie_t *semantic, *ilist, *inst;
    const char *iname, *aname, *c;
    st_table *operand_table, *state_table;
    st_generator *gen;
    char *key, *value;

    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        iname = tie_semantic_get_name(semantic);
        fprintf(fp, "%s i%s(", iname, iname);
        c = "";
        operand_table = tie_semantic_get_operand_table(semantic);
        st_foreach_item(operand_table, gen, &key, &value) {
            if ((tie_type_t) value == TIE_ARG_IN) {
                fprintf(fp, "%s\n.%s(%s_E)", c, key, key);
            } else {
                fprintf(fp, "%s\n.%s(%s_%s)", c, key, iname, key);
            }
            c = ",";
        }
        state_table = tie_semantic_get_in_state_table(semantic);
        st_foreach_item(state_table, gen, &key, &value) {
            fprintf(fp, "%s\n.%s_ps(%s_ps)", c, key, key);
            c = ",";
        }
        state_table = tie_semantic_get_out_state_table(semantic);
        st_foreach_item(state_table, gen, &key, &value) {
            fprintf(fp, "%s\n.%s_ns(%s_%s_ns)", c, key, iname, key);
            fprintf(fp, "%s\n.%s_we(%s_%s_we)", c, key, iname, key);
            c = ",";
        }
        ilist = tie_semantic_get_inst_list(semantic);
        tie_inst_list_foreach_instruction(ilist, inst) {
            aname = tie_instruction_get_name(inst);
            fprintf(fp, ",\n.%s(%s_E)", aname, aname);
        } end_tie_inst_list_foreach_instruction;
        fprintf(fp, ");\n");
    } end_tie_program_foreach_semantic;
}

/************************************************************
 Write one instance for each state
************************************************************/
static void
tie2ver_write_state_instance(FILE *fp, tie_t *prog)
{
    tie_t *state;
    char *sname;
    int width;

    tie_program_foreach_state(prog, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        width = tie_state_get_width(state);
        fprintf(fp, "tie_athens_state #(%d) i%s(\n", width, sname);
        fprintf(fp, "    .ns(%s_ns),\n", sname);
        fprintf(fp, "    .we(%s_we),\n", sname);
        fprintf(fp, "    .ke(Kill_E),\n");
        fprintf(fp, "    .kp(killPipe_W),\n");
        fprintf(fp, "    .vw(valid_W),\n");
        fprintf(fp, "    .clk(clk),\n");
        fprintf(fp, "    .ps(%s_ps));\n", sname);
    } end_tie_program_foreach_state;
}

/************************************************************
 Write the operand selection logic for one output operand
************************************************************/
static void
tie2ver_write_operand_selection_logic_one(FILE *fp, tie_t *prog, char *name)
{
    tie_t *semantic;
    char *c, *dummy;
    st_table *operand_table;

    fprintf(fp, "assign %s_E = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        operand_table = tie_semantic_get_operand_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(operand_table, name, &dummy)) {
            fprintf(fp, "%s_", tie_semantic_get_name(semantic));
            fprintf(fp, "%s & ", name);
        } else {
            fprintf(fp, "{32{1'b0}} & ");
        }
        fprintf(fp, "{32{%s_select}}", tie_semantic_get_name(semantic));
        c = "\n | ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");
}
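The statement emitted by the routine above has the shape of an AND-OR multiplexer: the result of each semantic block is masked with that block's replicated select signal, and the masked terms are OR-ed together. The following C sketch models that behaviour in software so the intent of the generated Verilog is easier to follow. The function name `and_or_mux` and its parameters are illustrative assumptions, not part of the appendix, and the sketch assumes 32-bit operands and select signals that are 0 or 1.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical software model of a generated statement of the form
 *   assign x_E = a_x & {32{a_select}} | b_x & {32{b_select}} ...;
 * values[i] is the result of semantic block i, select[i] is 0 or 1. */
uint32_t and_or_mux(const uint32_t *values, const int *select, size_t n)
{
    uint32_t result = 0;
    for (size_t i = 0; i < n; i++) {
        /* Replicate the one-bit select across all 32 bits. */
        uint32_t mask = select[i] ? UINT32_MAX : 0u;
        result |= values[i] & mask;
    }
    return result;
}
```

When exactly one select signal is active, the result is the value of the selected block; when none is active, the result is zero.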

/************************************************************
 Write the state selection logic for one state
************************************************************/
static void
tie2ver_write_state_selection_logic_one(
    FILE *fp, tie_t *prog, char *name, int width)
{
    tie_t *semantic;
    char *c, *value, *sname;
    st_table *state_table;

    fprintf(fp, "assign %s_ns = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        state_table = tie_semantic_get_out_state_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(state_table, name, &value)) {
            fprintf(fp, "%s_%s_ns & ", sname, name);
        } else {
            fprintf(fp, "{%d{1'b0}} & ", width);
        }
        fprintf(fp, "{%d{%s_select}}", width, sname);
        c = "\n | ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");

    fprintf(fp, "assign %s_we = ", name);
    c = "";
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        sname = tie_semantic_get_name(semantic);
        state_table = tie_semantic_get_out_state_table(semantic);
        fprintf(fp, "%s", c);
        if (st_lookup(state_table, name, &value)) {
            fprintf(fp, "%s_%s_we & ", sname, name);
        } else {
            fprintf(fp, "1'b0 & ");
        }
        fprintf(fp, "%s_select", sname);
        c = "\n | ";
    } end_tie_program_foreach_semantic;
    fprintf(fp, ";\n");
}

/************************************************************
 Write the selection logic for the top-level module
************************************************************/
static void
tie2ver_write_selection_logic(FILE *fp, tie_t *prog)
{
    tie_t *semantic, *ilist, *inst, *state;
    char *key, *value, *c, *sname;
    st_table *global_operand_table;
    st_generator *gen;
    int width;

    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        ilist = tie_semantic_get_inst_list(semantic);
        fprintf(fp, "assign %s_select = ",
                tie_semantic_get_name(semantic));
        c = "";
        tie_inst_list_foreach_instruction(ilist, inst) {
            fprintf(fp, "%s%s_E", c, tie_instruction_get_name(inst));
            c = "\n | ";
        } end_tie_inst_list_foreach_instruction;
        fprintf(fp, ";\n");
    } end_tie_program_foreach_semantic;

    global_operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(global_operand_table, gen, &key, &value) {
        if ((tie_type_t) value != TIE_ARG_IN) {
            tie2ver_write_operand_selection_logic_one(fp, prog, key);
            fprintf(fp, "assign out_E = %s_E;\n", key);
        }
    }

    tie_program_foreach_state(prog, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        width = tie_state_get_width(state);
        tie2ver_write_state_selection_logic_one(fp, prog, sname, width);
    } end_tie_program_foreach_state;
}

/************************************************************
 Write the concatenation expression that extracts a "field"
 from the instruction word
************************************************************/
static void
tie2ver_write_field_recur(FILE *fp, tie_t *prog, tie_t *field, char *suffix)
{
    tie_t *subfield, *newfield;
    char *c, *name;

    c = "";
    fprintf(fp, "{");
    tie_field_foreach_subfield(field, subfield) {
        fprintf(fp, "%s", c);
        switch (tie_get_type(subfield)) {
        case TIE_ID:
            name = tie_get_identifier(subfield);
            newfield = tie_program_get_field_by_name(prog, name);
            if (newfield == 0) {
                fprintf(fp, "inst_R");
            } else {
                tie2ver_write_field_recur(fp, prog, newfield, suffix);
            }
            break;
        case TIE_SUBFIELD:
            name = tie_subfield_get_name(subfield);
            newfield = tie_program_get_field_by_name(prog, name);
            if (newfield == 0) {
                fprintf(fp, "inst_R");
            } else {
                DIE("Error: unexpected subfield name (expect 'inst')\n");
            }
            fprintf(fp, "[%d:", tie_subfield_get_from_index(subfield));
            fprintf(fp, "%d]", tie_subfield_get_to_index(subfield));
            break;
        default:
            DIE("Error: unexpected subfield type\n");
        }
        c = ", ";
    } end_tie_field_foreach_subfield;
    fprintf(fp, "}");
}

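/************************************************************
 Write one assignment statement that extracts a "field"
************************************************************/
static void
tie2ver_write_field(FILE *fp, tie_t *prog, tie_t *field, char *suffix)
{
    fprintf(fp, "assign %s%s = ", tie_field_get_name(field), suffix);
    tie2ver_write_field_recur(fp, prog, field, suffix);
    fprintf(fp, ";\n");
}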

/************************************************************
 Write one module for an "operand"
************************************************************/
static void
tie2ver_write_one_immediate(FILE *fp, tie_t *prog, tie_t *operand)
{
    tie_t *decoding, *field, *table;
    char *oname, *fname;
    ls_t *tables;
    int width;

    ASSERT(tie_get_type(operand) == TIE_OPERAND);
    oname = tie_operand_get_name(operand);
    fname = tie_operand_get_field_name(operand);
    field = tie_program_get_field_by_name(prog, fname);
    width = tie_field_get_width(field);
    fprintf(fp, "\n");
    fprintf(fp, "module %s(inst_R, %s);\n", oname, oname);
    fprintf(fp, "input [23:0] inst_R;\n");
    fprintf(fp, "output [31:0] %s;\n", oname);
    fprintf(fp, "wire [%d:0] %s;\n", tie_field_get_width(field) - 1,
            fname);
    tie2ver_write_field(fp, prog, field, "");
    decoding = tie_operand_get_decoding_expression(operand);
    fprintf(fp, "assign %s = ", oname);
    tie2ver_write_expression(fp, decoding, 0, 0, 0);
    fprintf(fp, ";\n");
    tables = tie_expression_get_tables(decoding, prog);
    ls_foreach_data(tie_t *, tables, table) {
        tie2ver_write_table(fp, table);
    } end_ls_foreach_data;
    ls_free(tables);
    fprintf(fp, "endmodule\n");
}

/************************************************************
 Write one module for the decode logic of each immediate operand
************************************************************/
static void
tie2ver_write_immediate(FILE *fp, tie_t *prog)
{
    st_table *operand_table;
    char *key, *value;
    st_generator *gen;
    tie_t *operand;
    tie_t *field;

    operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                operand = tie_program_get_operand_by_name(prog, key);
                if (operand != 0) {
                    if (!tie_get_predefined(operand)) {
                        tie2ver_write_one_immediate(fp, prog, operand);
                    }
                } else {
                    field = tie_program_get_field_by_name(prog, key);
                    if (field == 0) {
                        fprintf(stderr, "Error: invalid operand %s\n", key);
                    }
                }
            }
        }
    }
}

/************************************************************
 Write one module instance for an "operand"
************************************************************/
static void
tie2ver_write_one_operand_instance(FILE *fp, tie_t *prog, tie_t *operand)
{
    char *oname;

    ASSERT(tie_get_type(operand) == TIE_OPERAND);
    oname = tie_operand_get_name(operand);
    fprintf(fp, "%s i%s(.inst(inst_R), .%s(%s_R));\n", oname, oname,
            oname, oname);
    tie2ver_write_flop_instance(fp, oname, 32);
}

/************************************************************
 Write one statement that extracts a "field" from inst_R
************************************************************/
static void
tie2ver_write_one_field_instance(FILE *fp, tie_t *prog, tie_t *field)
{
    char *name;

    tie2ver_write_field(fp, prog, field, "_R");
    name = tie_field_get_name(field);
    tie2ver_write_flop_instance(fp, name, 32);
}

/************************************************************
 Write one instance for the decode logic of each immediate operand
************************************************************/
static void
tie2ver_write_immediate_instance(FILE *fp, tie_t *prog)
{
    char *key, *value;
    st_table *operand_table;
    st_generator *gen;
    tie_t *operand, *field;

    operand_table = tie2ver_program_get_operand_table(prog);
    st_foreach_item(operand_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            operand = tie_program_get_operand_by_name(prog, key);
            if (operand != 0 && tie_operand_is_immediate(operand)) {
                tie2ver_write_one_operand_instance(fp, prog, operand);
            } else if (operand == 0) {
                field = tie_program_get_field_by_name(prog, key);
                if (field != 0) {
                    tie2ver_write_one_field_instance(fp, prog, field);
                }
            }
        }
    }
}

/************************************************************
 Print "prog" to a TIE file
************************************************************/
void
tie2ver_write_verilog(FILE *fp, tie_t *prog)
{
    tie_t *semantic;

    /* write tie primitives */
    fprintf(fp, COMMENTS);
    fprintf(fp, TIE_ENFLOP);
    fprintf(fp, TIE_FLOP);
    fprintf(fp, TIE_ATHENS_STATE);

    /* write each semantic block as a verilog module */
    ASSERT(tie_get_type(prog) == TIE_PROGRAM);
    tie_program_foreach_semantic(prog, semantic) {
        if (tie_get_predefined(semantic)) continue;
        tie2ver_write_semantic(fp, semantic);
    } end_tie_program_foreach_semantic;

    /* write each immediate operand as a verilog module */
    tie2ver_write_immediate(fp, prog);

    /* write the top-level verilog module */
    tie2ver_write_top_module(fp, prog);
    tie2ver_write_wire_declaration(fp, prog);
    tie2ver_write_flop(fp, prog);
    tie2ver_write_immediate_instance(fp, prog);
    tie2ver_write_semantic_instance(fp, prog);
    tie2ver_write_state_instance(fp, prog);
    tie2ver_write_selection_logic(fp, prog);
    fprintf(fp, "endmodule\n");
}

/************************************************************
 Print the instruction names of "prog" to a TIE file
************************************************************/
void
tie2ver_write_instruction(FILE *fp, tie_t *prog)
{
    tie_t *inst;
    int first = 1;

    tie2ver_program_foreach_instruction(prog, inst) {
        if (first) {
            fprintf(fp, " %s ", tie_instruction_get_name(inst));
            first = 0;
        } else {
            fprintf(fp, " %s ", tie_instruction_get_name(inst));
        }
    } end_tie2ver_program_foreach_instruction;
}

/*
 * Local Variables:
 * mode:c
 * c-basic-offset:4
 * End:
 */

Appendix E

#include "tie.h"

#define COMMENTS "/* Do not modify. This is automatically generated. */"

#define tie2gcc_program_foreach_instruction(_prog, _inst) { \
    tie_t *_iclass; \
    tie_program_foreach_iclass(_prog, _iclass) { \
        if (tie_get_predefined(_iclass)) continue; \
        tie_iclass_foreach_instruction(_iclass, _inst) {

#define end_tie2gcc_program_foreach_instruction \
        } end_tie_iclass_foreach_instruction; \
    } end_tie_program_foreach_iclass; \
}

/************************************************************
 Build and return the global program->operand table for the
 user-defined instructions. The returned table does not contain
 operands that are used only in predefined instructions.
************************************************************/
static st_table *
tie2gcc_program_get_operand_table(tie_t *prog)
{
    static st_table *tie2gcc_program_args = 0;
    tie_t *inst;
    char *key, *value;
    st_table *arg_table;
    st_generator *gen;

    if (tie2gcc_program_args == 0) {
        tie2gcc_program_args = st_init_table(strcmp, st_strhash);
        tie2gcc_program_foreach_instruction(prog, inst) {
            arg_table = tie_instruction_get_operand_table(inst);
            st_foreach_item(arg_table, gen, &key, &value) {
                st_insert(tie2gcc_program_args, key, value);
            }
            st_free_table(arg_table);
        } end_tie2gcc_program_foreach_instruction;
    }
    return tie2gcc_program_args;
}

/************************************************************
 Generate the function name and argument declaration
************************************************************/
static void
tie2gcc_write_function(FILE *fp, tie_t *inst, tie_t *args)
{
    tie_t *arg;
    char *c;

    c = "";
    fprintf(fp, "\n#define %s(", tie_instruction_get_name(inst));
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) != TIE_ARG_OUT) {
            fprintf(fp, "%s%s", c, tie_arg_get_name(arg));
            c = ", ";
        }
    } end_tie_args_foreach_arg;
    fprintf(fp, ")\n");
}

/************************************************************
 Return a list of the arguments in "args", with the output
 arguments first. The returned list should be freed by the caller.
************************************************************/
ls_t *
tie2gcc_args_get_ordered(tie_t *args)
{
    tie_t *arg;
    ls_t *arglist;

    arglist = ls_alloc();
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) != TIE_ARG_IN) {
            ls_append(arglist, arg);
        }
    } end_tie_args_foreach_arg;
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) != TIE_ARG_OUT) {
            ls_append(arglist, arg);
        }
    } end_tie_args_foreach_arg;
    return arglist;
}

/************************************************************
 Write out one asm statement
************************************************************/
static void
tie2gcc_write_one_asm(
    FILE *fp, tie_t *prog, tie_t *inst, tie_t *args, int value)
{
    tie_t *arg, *operand, *state;
    tie_type_t type, ptype;
    ls_t *arglist;
    char *t, s, c, *name, *n;
    int i;

    /* write the asm statement */
    fprintf(fp, "asm volatile(\"%s\t",
            tie_instruction_get_name(inst));
    i = 0;
    tie_args_foreach_arg(args, arg) {
        fprintf(fp, "%s%%%d", i == 0 ? "" : ",", i);
        i++;
    } end_tie_args_foreach_arg;
    fprintf(fp, "\"");

    ptype = TIE_UNKNOWN;
    arglist = tie2gcc_args_get_ordered(args);
    ls_foreach_data(tie_t *, arglist, arg) {
        name = tie_arg_get_name(arg);
        operand = tie_program_get_operand_by_name(prog, name);
        if (operand != 0) {
            state = tie_operand_get_state(operand);
            if (state != 0) {
                n = tie_state_get_name(state);
                if (strcmp(n, "AR") == 0) {
                    c = 'a';
                } else if (strcmp(n, "FR") == 0) {
                    c = 'f';
                } else if (strcmp(n, "DR") == 0) {
                    c = 'd';
                } else if (strcmp(n, "BR") == 0) {
                    c = 'b';
                } else {
                    DIE("Internal Error: invalid state\n");
                }
            } else {
                c = 'i';
            }
        } else {
            c = 'i';
        }
        type = tie_get_type(arg);
        if (ptype == TIE_UNKNOWN && type == TIE_ARG_IN) {
            fprintf(fp, ":");
        }
        s = type == ptype ? ',' : ':';
        t = type == TIE_ARG_IN ? "" : "=";
        fprintf(fp, "%c\"%s%c\" (%s)", s, t, c, name);
        ptype = type;
    } end_ls_foreach_data;
    ls_free(arglist);
    fprintf(fp, ");");
}

/************************************************************
 Generate the inline function body for "inst"
************************************************************/
static void
tie2gcc_write_asm(FILE *fp, tie_t *prog, tie_t *inst, tie_t *args)
{
    tie_t *arg, *out_arg;

    /* declare output variable and find the immediate operand */
    fprintf(fp, "({");
    out_arg = 0;
    tie_args_foreach_arg(args, arg) {
        if (tie_get_type(arg) == TIE_ARG_OUT) {
            fprintf(fp, "int %s; ", tie_arg_get_name(arg));
            out_arg = arg;
        }
    } end_tie_args_foreach_arg;
    tie2gcc_write_one_asm(fp, prog, inst, args, -1);

    /* return the results */
    if (out_arg != 0) {
        fprintf(fp, "%s;", tie_arg_get_name(out_arg));
    }
    fprintf(fp, "})\n");
}

/************************************************************
 Generate one macro for "inst"
************************************************************/
static void
tie2gcc_write_inst(FILE *fp, tie_t *prog, tie_t *inst, tie_t *args)
{
    tie2gcc_write_function(fp, inst, args);
    tie2gcc_write_asm(fp, prog, inst, args);
}

/************************************************************
 Generate the gcc header file, which is included in application
 code in order to use the user-defined instructions.
************************************************************/
void
tie2gcc_write_gcc(FILE *fp, tie_t *prog)
{
    tie_t *iclass, *ilist, *inst, *args;

    ASSERT(tie_get_type(prog) == TIE_PROGRAM);
    fprintf(fp, "%s\n", COMMENTS);
    tie_program_foreach_iclass(prog, iclass) {
        if (tie_get_predefined(iclass)) continue;
        ilist = tie_iclass_get_inst_list(iclass);
        args = tie_iclass_get_io_args(iclass);
        tie_inst_list_foreach_instruction(ilist, inst) {
            tie2gcc_write_inst(fp, prog, inst, args);
        } end_tie_inst_list_foreach_instruction;
    } end_tie_program_foreach_iclass;
}

/************************************************************
 Write out a function that tests whether a value is valid for
 one immediate operand
************************************************************/
static void
tie2gcc_write_operand_check_one(FILE *fp, char *name)
{
    fprintf(fp, "\nint\n");
    fprintf(fp, "tensilica_%s(int v)\n", name);
    fprintf(fp, "{\n");
    fprintf(fp, "    tensilica_insnbuf_type insn;\n");
    fprintf(fp, "    int new_v;\n");
    fprintf(fp, "    if (!set_%s_field(insn, v)) return 0;\n", name);
    fprintf(fp, "    new_v = get_%s_field(insn);\n", name);
    fprintf(fp, "    return new_v == v;\n");
    fprintf(fp, "}\n");
}
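The checker emitted by the routine above relies on an encode/decode round trip: a value is accepted only if writing it into the instruction field and reading it back yields the same value. The sketch below illustrates that idea on a hypothetical 4-bit field held in bits [7:4] of a 32-bit word. The names `set_imm4_field`, `get_imm4_field`, and `imm4_is_encodable`, and the field position, are assumptions made for illustration; the real accessors are generated elsewhere and are not shown in this appendix.

```c
#include <stdint.h>

/* Hypothetical 4-bit immediate field stored in bits [7:4] of the word. */
#define IMM4_SHIFT 4
#define IMM4_MASK  0xFu

/* Store the low four bits of v in the field; extra bits are truncated. */
static int set_imm4_field(uint32_t *insn, int v)
{
    *insn = (*insn & ~(IMM4_MASK << IMM4_SHIFT))
          | (((uint32_t) v & IMM4_MASK) << IMM4_SHIFT);
    return 1;
}

/* Read the field back as an unsigned value. */
static int get_imm4_field(uint32_t insn)
{
    return (int) ((insn >> IMM4_SHIFT) & IMM4_MASK);
}

/* Return 1 when v survives the encode/decode round trip, else 0. */
int imm4_is_encodable(int v)
{
    uint32_t insn = 0;
    int new_v;

    if (!set_imm4_field(&insn, v)) return 0;
    new_v = get_imm4_field(insn);
    return new_v == v;
}
```

Under these assumptions, values 0 through 15 are accepted, while 16 and negative values are rejected because truncation changes them.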

/************************************************************
 Write out a check function for each immediate operand
************************************************************/
void
tie2gcc_write_operand_check(FILE *fp, tie_t *prog)
{
    st_table *arg_table;
    st_generator *gen;
    char *key, *value;

    arg_table = tie2gcc_program_get_operand_table(prog);
    st_foreach_item(arg_table, gen, &key, &value) {
        if ((tie_type_t) value == TIE_ARG_IN) {
            if (strcmp(key, "art") != 0 && strcmp(key, "ars") != 0) {
                tie2gcc_write_operand_check_one(fp, key);
            }
        }
    }
}

Appendix F

/*
 * TIE user_register routines
 */

/* $Id */

/*
 * These coded instructions, statements, and computer programs are
 * Confidential Proprietary Information of Tensilica Inc. and may not be
 * disclosed to third parties or copied in any form, in whole or in part,
 * without the prior written consent of Tensilica Inc.
 */

#include <math.h>
#include "tie.h"
#include "tie_int.h"

typedef struct ureg_struct {
    int statef;
    int statet;
    int uregf;
    int uregt;
    int ureg;
    char *name;
} ureg_t;

/************************************************************
 Return the index of "ureg"
************************************************************/
int
tie_ureg_get_index(tie_t *ureg)
{
    ASSERT(tie_get_type(ureg) == TIE_UREG);
    return tie_get_integer(tie_get_first_child(ureg));
}

/************************************************************
 Return the expression of "ureg"
************************************************************/
tie_t *
tie_ureg_get_expression(tie_t *ureg)
{
    tie_t *index;

    ASSERT(tie_get_type(ureg) == TIE_UREG);
    index = tie_get_first_child(ureg);
    return tie_get_next_sibling(index);
}

/************************************************************
 Return a string holding the index of "ureg" as a constant
************************************************************/
static char ureg_index[10];

char *tie_ureg_get_index_constant(tie_t *ureg)
{
    sprintf(ureg_index, "8'd%d", tie_ureg_get_index(ureg));
    return ureg_index;
}

/************************************************************
 Generate the "st" field for the RUR instruction
************************************************************/
static void
tie_program_generate_st_field(tie_t *program)
{
    tie_t *field;

    field = tie_alloc(TIE_FIELD);
    tie_append_child(field, tie_create_identifier("st"));
    tie_append_child(field, tie_create_identifier("s"));
    tie_append_child(field, tie_create_identifier("t"));
    tie_program_add(program, field);
}

/************************************************************
 Generate the RUR opcode
************************************************************/
static void
tie_program_generate_rur_opcode(tie_t *program)
{
    tie_t *opcode, *encode;

    opcode = tie_alloc(TIE_OPCODE);
    tie_append_child(opcode, tie_create_identifier("RUR"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("op2"));
    tie_append_child(encode, tie_create_constant("4'b1110"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("RST3"));
    tie_program_add(program, opcode);
}

/************************************************************
 Generate the WUR opcode
************************************************************/
static void
tie_program_generate_wur_opcode(tie_t *program)
{
    tie_t *opcode, *encode;

    opcode = tie_alloc(TIE_OPCODE);
    tie_append_child(opcode, tie_create_identifier("WUR"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("op2"));
    tie_append_child(encode, tie_create_constant("4'b1111"));
    encode = tie_alloc(TIE_ENCODING);
    tie_append_child(opcode, encode);
    tie_append_child(encode, tie_create_identifier("RST3"));
    tie_program_add(program, opcode);
}

/************************************************************
 Generate the RUR iclass
************************************************************/
static void
tie_program_generate_rur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;

    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("RUR"));
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_OUT);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("arr"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("st"));
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_IN);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}

/************************************************************
 Generate the WUR iclass
************************************************************/
static void
tie_program_generate_wur_iclass(tie_t *program)
{
    tie_t *iclass, *ilist, *args, *arg, *state;
    char *name;

    iclass = tie_alloc(TIE_ICLASS);
    tie_append_child(iclass, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(iclass, ilist);
    tie_append_child(ilist, tie_create_identifier("WUR"));
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("art"));
    arg = tie_alloc(TIE_ARG_IN);
    tie_append_child(args, arg);
    tie_append_child(arg, tie_create_identifier("sr"));
    args = tie_alloc(TIE_ARG_LIST);
    tie_append_child(iclass, args);
    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        arg = tie_alloc(TIE_ARG_INOUT);
        tie_append_child(args, arg);
        name = tie_state_get_name(state);
        tie_append_child(arg, tie_create_identifier(name));
    } end_tie_program_foreach_state;
    tie_program_add(program, iclass);
}

/************************************************************
 Generate one selection signal for each ureg
************************************************************/
static void
tie_program_generate_selection_signals(tie_t *prog, tie_t *stmt, char *fname)
{
    tie_t *ureg, *wire, *assign, *equal, *id;
    int index, max_index, width;
    char wname[80];

    max_index = 0;
    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        max_index = MAX(max_index, index);
    } end_tie_program_foreach_ureg;
    width = (int) ceil(log(max_index + 1) / log(2));

    tie_program_foreach_ureg(prog, ureg) {
        index = tie_ureg_get_index(ureg);
        wire = tie_alloc(TIE_WIRE);
        sprintf(wname, "ureg_sel_%d", index);
        tie_append_child(wire, tie_create_integer(0));
        tie_append_child(wire, tie_create_identifier(wname));
        tie_append_child(stmt, wire);
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(assign, tie_create_identifier(wname));
        tie_append_child(stmt, assign);
        equal = tie_alloc(TIE_EQ);
        sprintf(wname, "%d'd%d", width, index);
        id = tie_create_identifier(fname);
        tie_append_child(id, tie_create_integer(width - 1));
        tie_append_child(id, tie_create_integer(0));
        tie_append_child(equal, id);
        tie_append_child(equal, tie_create_constant(wname));
        tie_append_child(assign, equal);
    } end_tie_program_foreach_ureg;
}
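The routine above sizes the comparison field as ceil(log2(max_index + 1)), computed with floating-point logarithms. As an aside, the same quantity can be obtained with integer arithmetic alone, which avoids any dependence on floating-point rounding. The sketch below is one way to do that; the function name `select_field_width` is an assumption made for illustration and does not appear in the appendix.

```c
/* Return the smallest width w such that 2^w >= max_index + 1,
 * which equals ceil(log2(max_index + 1)). Hypothetical helper. */
int select_field_width(unsigned max_index)
{
    int width = 0;

    while (width < 32 &&
           (1ull << width) < (unsigned long long) max_index + 1) {
        width++;
    }
    return width;
}
```

For example, a largest index of 3 needs a 2-bit field and a largest index of 4 needs 3 bits.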

/************************************************************
 Return the RUR selection logic for "ureg" and the uregs that
 follow it in the list
************************************************************/
static tie_t *
tie_program_rur_semantic_recur(ls_handle_t *ureg_handle)
{
    tie_t *and, *node, *or, *rep;
    tie_t *ureg = (tie_t *) ls_handle_get_data(ureg_handle);
    ls_handle_t *ureg_next;
    char sname[80];

    and = tie_alloc(TIE_BITWISE_AND);
    rep = tie_alloc(TIE_REPLICATION);
    tie_append_child(and, rep);
    tie_append_child(rep, tie_create_integer(32));
    sprintf(sname, "ureg_sel_%d", tie_ureg_get_index(ureg));
    tie_append_child(rep, tie_create_identifier(sname));
    tie_append_child(and, tie_dup(tie_ureg_get_expression(ureg)));
    ureg_next = ls_handle_get_next_handle(ureg_handle);
    if (ureg_next == 0) {
        return and;
    } else {
        node = tie_program_rur_semantic_recur(ureg_next);
        or = tie_alloc(TIE_BITWISE_OR);
        tie_append_child(or, and);
        tie_append_child(or, node);
        return or;
    }
}

/************************************************************
 Generate the RUR semantic block
************************************************************/
static void
tie_program_generate_rur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *node;
    ls_t *ureg_list;
    ls_handle_t *handle;

    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(semantic, tie_create_identifier("rur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(ilist, tie_create_identifier("RUR"));
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_append_child(semantic, statement);
    tie_program_generate_selection_signals(program, statement, "st");
    assign = tie_alloc(TIE_ASSIGNMENT);
    tie_append_child(statement, assign);
    tie_append_child(assign, tie_create_identifier("arr"));
    ureg_list = ls_alloc();
    tie_program_foreach_ureg(program, ureg) {
        ls_append(ureg_list, ureg);
    } end_tie_program_foreach_ureg;
    handle = ls_get_first_handle(ureg_list);
    node = tie_program_rur_semantic_recur(handle);
    tie_append_child(assign, node);
    ls_free(ureg_list);
    tie_program_add(program, semantic);
}

/************************************************************
 Put all components of the "ureg" expression into "list"
************************************************************/
static void
tie_ureg_exp_get_components(tie_t *exp, ls_t *list)
{
    tie_t *child;

    if (tie_get_type(exp) == TIE_ID) {
        ls_prepend(list, exp);
    }
    tie_foreach_child(exp, child) {
        tie_ureg_exp_get_components(child, list);
    } end_tie_foreach_child;
}

/************************************************************
 Insert one state-to-ur mapping entry into the list
************************************************************/
static void
tie_state_list_insert(ls_t *list, ureg_t *ur)
{
    ureg_t *item;
    ls_handle_t *handle;

    handle = 0;
    ls_foreach_handle(list, handle) {
        item = (ureg_t *) ls_handle_get_data(handle);
        if (item->statef < ur->statet) {
            break;
        }
    } end_ls_foreach_handle;
    if (handle == 0) {
        ls_append(list, ur);
    } else {
        ls_insert_before(handle, ur);
    }
}

/************************************************************
 Take a state and add its state-to-ur mapping to the list
************************************************************/

static void
tie_state_get_ur_mapping(tie_t *prog, tie_t *state, tie_t *ureg, ls_t *list)
{
    tie_t *exp, *child, *s, *id;
    int num, uregf, uregt, statef, statet;
    ls_t *id_list;
    char *sname, *iname;
    ureg_t *ur;

    exp = tie_ureg_get_expression(ureg);
    num = tie_ureg_get_index(ureg);
    sname = tie_state_get_name(state);
    id_list = ls_alloc();
    tie_ureg_exp_get_components(exp, id_list);
    uregt = uregf = -1;
    ls_foreach_data(tie_t *, id_list, id) {
        iname = tie_get_identifier(id);
        child = tie_get_first_child(id);
        /* compute the next uregf and uregt */
        if (child == 0) {
            s = tie_program_get_state_by_name(prog, iname);
            ASSERT(s != 0);
            statet = 0;
            statef = tie_state_get_width(s) - 1;
        } else {
            statef = tie_get_integer(child);
            child = tie_get_next_sibling(child);
            if (child == 0) {
                statet = statef;
            } else {
                statet = tie_get_integer(child);
            }
        }
        uregt = uregf + 1;
        uregf = uregt + (statef - statet);
        if (strcmp(iname, sname) == 0) {
            ur = ALLOC(ureg_t, 1);
            ur->statef = statef;
            ur->statet = statet;
            ur->uregf = uregf;
            ur->uregt = uregt;
            ur->ureg = num;
            ur->name = "art";
            tie_state_list_insert(list, ur);
        }
    } end_ls_foreach_data;
}

/************************************************************
 Fill the gaps in the state-to-ur mapping list
************************************************************/
static void
tie_state_fill_gap(tie_t *state, ls_t *list)
{
    int width, statet, statef;
    ls_handle_t *handle;
    ureg_t *ur, *gap;
    char *name;

    width = tie_state_get_width(state);
    name = tie_state_get_name(state);
    statet = statef = width;
    ls_foreach_handle(list, handle) {
        ur = (ureg_t *) ls_handle_get_data(handle);
        if (ur->statef < (statet - 1)) {
            gap = ALLOC(ureg_t, 1);
            gap->statef = statet - 1;
            gap->statet = ur->statef + 1;
            gap->uregf = gap->uregt = gap->ureg = -1;
            gap->name = 0;
            ls_insert_before(handle, gap);
        }
        statet = ur->statet;
        statef = ur->statef;
    } end_ls_foreach_handle;
    handle = ls_get_last_handle(list);
    ur = (ureg_t *) ls_handle_get_data(handle);
    if (ur->statet > 0) {
        gap = ALLOC(ureg_t, 1);
        gap->statef = ur->statet - 1;
        gap->statet = 0;
        gap->uregf = gap->uregt = gap->ureg = -1;
        gap->name = 0;
        ls_insert_after(handle, gap);
    }
}
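The gap-filling step above walks the mapped bit ranges of a state from the most significant bit downward and inserts an unmapped entry wherever consecutive ranges leave bits uncovered. The sketch below restates that idea on plain arrays rather than on the list library used in the appendix. The `bit_range` type and the `fill_gaps` function are assumptions made for illustration; unlike the routine above, this sketch also tolerates an empty input.

```c
#include <stddef.h>

/* Hypothetical bit range [from:to] of a state, with from >= to. */
typedef struct {
    int from;
    int to;
    int mapped;   /* 1 = covered by a user register, 0 = gap */
} bit_range;

/* Copy the n mapped ranges of `in` (sorted from the most significant
 * bit downward) into `out`, inserting gap entries for uncovered bits.
 * `out` must have room for 2*n + 1 entries. Returns the entry count. */
size_t fill_gaps(int width, const bit_range *in, size_t n, bit_range *out)
{
    size_t count = 0;
    int next = width - 1;   /* highest bit not yet accounted for */

    for (size_t i = 0; i < n; i++) {
        if (in[i].from < next) {
            out[count++] = (bit_range){ next, in[i].from + 1, 0 };
        }
        out[count] = in[i];
        out[count].mapped = 1;
        count++;
        next = in[i].to - 1;
    }
    if (next >= 0) {
        out[count++] = (bit_range){ next, 0, 0 };
    }
    return count;
}
```

For a 32-bit state with only bits [23:16] mapped, the result is three entries: a gap [31:24], the mapped range [23:16], and a gap [15:0].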

/************************************************************
 Generate the WUR semantic block
************************************************************/
static void
tie_program_generate_wur_semantic(tie_t *program)
{
    tie_t *ureg, *semantic, *ilist, *statement, *assign, *cond;
    tie_t *state, *concat, *id;
    ureg_t *ur;
    char *sname, selname[80];
    ls_t *list;

    semantic = tie_alloc(TIE_SEMANTIC);
    tie_append_child(program, semantic);
    tie_append_child(semantic, tie_create_identifier("wur"));
    ilist = tie_alloc(TIE_INST_LIST);
    tie_append_child(ilist, tie_create_identifier("WUR"));
    tie_append_child(semantic, ilist);
    statement = tie_alloc(TIE_STATEMENT);
    tie_append_child(semantic, statement);
    tie_program_generate_selection_signals(program, statement, "sr");

    tie_program_foreach_state(program, state) {
        if (tie_get_predefined(state)) continue;
        sname = tie_state_get_name(state);
        list = ls_alloc();
        tie_program_foreach_ureg(program, ureg) {
            tie_state_get_ur_mapping(program, state, ureg, list);
        } end_tie_program_foreach_ureg;
        tie_state_fill_gap(state, list);
        assign = tie_alloc(TIE_ASSIGNMENT);
        tie_append_child(statement, assign);
        tie_append_child(assign, tie_create_identifier(sname));
        concat = tie_alloc(TIE_CONCATENATION);
        tie_append_child(assign, concat);
        ls_foreach_data(ureg_t *, list, ur) {
            if (ur->name != 0) {
                cond = tie_alloc(TIE_CONDITIONAL);
                tie_append_child(concat, cond);
                sprintf(selname, "ureg_sel_%d", ur->ureg);
                id = tie_create_identifier(selname);
                tie_append_child(cond, id);
                id = tie_create_identifier(ur->name);
                tie_append_child(id, tie_create_integer(ur->uregf));
                tie_append_child(id, tie_create_integer(ur->uregt));
                tie_append_child(cond, id);
                id = tie_create_identifier(sname);
                tie_append_child(id, tie_create_integer(ur->statef));
                tie_append_child(id, tie_create_integer(ur->statet));
                tie_append_child(cond, id);
            } else {
                id = tie_create_identifier(sname);
                tie_append_child(id, tie_create_integer(ur->statef));
                tie_append_child(id, tie_create_integer(ur->statet));
                tie_append_child(concat, id);
            }
        } end_ls_foreach_data;
        ls_free(list);
    } end_tie_program_foreach_state;
}

/************************************************************
 Generate the RUR and WUR instructions
************************************************************/
void
tie_program_generate_rurwur(tie_t *program)
{
    tie_t *ureg;
    int num = 0;

    tie_program_foreach_ureg(program, ureg) {
        num++;
    } end_tie_program_foreach_ureg;
    if (num == 0) {
        return;
    }
    tie_program_generate_st_field(program);
    tie_program_generate_rur_opcode(program);
    tie_program_generate_wur_opcode(program);
    tie_program_generate_rur_iclass(program);
    tie_program_generate_wur_iclass(program);
    tie_program_generate_rur_semantic(program);
    tie_program_generate_wur_semantic(program);
}

Appendix G

// define a new opcode for BYTESWAP based on
//   - a predefined instruction field op2
//   - a predefined opcode CUST0
// refer to Xtensa ISA manual for descriptions of op2 and CUST0
opcode BYTESWAP op2=4'b0000 CUST0

// declare a state ACC used to accumulate byte-swapped data
state ACC 32

// declare a mode bit SWAP to control the swap
state SWAP 1

// use "RUR ar, 0" and "WUR ar, 0" to move data between AR and ACC
user_register 0 ACC

// use "RUR ar, 1" and "WUR ar, 1" to move data between AR and SWAP
user_register 1 SWAP

// define a new instruction class that
//   - reads data from ars (predefined to be AR[s])
//   - uses and writes state ACC
//   - uses state SWAP
iclass bs {BYTESWAP} {in ars} {inout ACC, in SWAP}

// semantic definition of byteswap
//   Accumulates to ACC the byte-swapped ars (AR[s]) or
//   ars depending on the SWAP bit
semantic bs {BYTESWAP} {
    wire [31:0] ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};
    assign ACC = ACC + (SWAP ? ars_swap : ars);
}
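To make the intended behaviour of the BYTESWAP semantic block concrete, the following C sketch gives a software reference model of it. It is only an illustration under stated assumptions: the type `byteswap_state` and the functions `swap_bytes` and `byteswap_execute` are hypothetical names that do not appear in the appendix, and the model assumes that ACC wraps modulo 2^32 as a 32-bit register would.

```c
#include <stdint.h>

/* Hypothetical software model of the two TIE states. */
typedef struct {
    uint32_t acc;    /* state ACC, 32 bits */
    unsigned swap;   /* state SWAP, 1 bit  */
} byteswap_state;

/* Mirrors: ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]} */
uint32_t swap_bytes(uint32_t ars)
{
    return (ars << 24) | ((ars & 0x0000FF00u) << 8) |
           ((ars >> 8) & 0x0000FF00u) | (ars >> 24);
}

/* Mirrors: assign ACC = ACC + (SWAP ? ars_swap : ars); */
void byteswap_execute(byteswap_state *st, uint32_t ars)
{
    st->acc += (st->swap & 1u) ? swap_bytes(ars) : ars;
}
```

A model of this kind can be used to cross-check results from the instruction set simulator against the expected value of ACC after a sequence of BYTESWAP instructions.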

Adnexa H

#define PARAMS(_arg)_arg

typedef signed int int32_t；

typedef unsigned int u_int32_t；

Typedef void* xtensa_isa；

Typedef void* xtensa_operand；

typedef int xtensa_opcode；

#define XTENSA_UNDEFINED-1

typedef u_int32_t xtensa_insnbuf_word；

Typedef xtensa insnbuf_word * xtensa insnbuf；

typedef enum{

Xtensa_encode_result_ok,

Xtensa_encode_result_align,

Xtensa_encode_result_not_in_table,

Xtensa_encode_result_too_low,

Xtensa_encode_result_too_high,

xtensa_encode_result_not_ok

}xtensa_encode_result；

typedef u_int32_t (*xtensa_immed_decode_fn) PARAMS ((u_int32_t val));

typedef xtensa_encode_result (*xtensa_immed_encode_fn) PARAMS ((u_int32_t *valp));

typedef u_int32_t (*xtensa_get_field_fn) PARAMS ((const xtensa_insnbuf insn));

typedef void (*xtensa_set_field_fn) PARAMS ((xtensa_insnbuf insn, u_int32_t val));

typedef int (*xtensa_insn_decode_fn) PARAMS ((const xtensa_insnbuf insn));

typedef struct xtensa_operand_internal_struct {

  char operand_kind;

  char inout;

  xtensa_get_field_fn get_field;

  xtensa_set_field_fn set_field;

  xtensa_immed_encode_fn encode;

  xtensa_immed_decode_fn decode;

} xtensa_operand_internal;

typedef struct xtensa_iclass_internal_struct {

  int num_operands;

  xtensa_operand_internal **operands;

} xtensa_iclass_internal;

typedef struct xtensa_opcode_internal_struct {

  const char *name;

  int length;

  xtensa_insnbuf encoding_template;

  xtensa_iclass_internal *iclass;

} xtensa_opcode_internal;

typedef struct opname_lookup_entry_struct {

  const char *key;

  xtensa_opcode opcode;

} opname_lookup_entry;

typedef struct xtensa_isa_internal_struct {

  int insn_size;

  int insnbuf_size;

  int num_opcodes;

  xtensa_opcode_internal **opcode_table;

  int num_modules;

  int *module_opcode_base;

  xtensa_insn_decode_fn *module_decode_fn;

  opname_lookup_entry *opname_lookup_table;

} xtensa_isa_internal;

extern u_int32_t get_r_field(const xtensa_insnbuf insn);

extern void set_r_field(xtensa_insnbuf insn, u_int32_t val);

extern u_int32_t get_s_field(const xtensa_insnbuf insn);

extern void set_s_field(xtensa_insnbuf insn, u_int32_t val);

extern u_int32_t get_sr_field(const xtensa_insnbuf insn);

extern void set_sr_field(xtensa_insnbuf insn, u_int32_t val);

extern u_int32_t get_t_field(const xtensa_insnbuf insn);

extern void set_t_field(xtensa_insnbuf insn, u_int32_t val);

extern xtensa_encode_result encode_r(u_int32_t *valp);

extern u_int32_t decode_r(u_int32_t val);

extern xtensa_encode_result encode_s(u_int32_t *valp);

extern u_int32_t decode_s(u_int32_t val);

extern xtensa_encode_result encode_sr(u_int32_t *valp);

extern u_int32_t decode_sr(u_int32_t val);

extern xtensa_encode_result encode_t(u_int32_t *valp);

extern u_int32_t decode_t(u_int32_t val);

static u_int32_t get_st_field(insn)

const xtensa_insnbuf insn;

{

  u_int32_t temp;

  temp = 0;

  temp |= ((insn[0] & 0xf00) >> 8) << 4;

  temp |= ((insn[0] & 0xf0) >> 4) << 0;

  return temp;

}

static void set_st_field(insn, val)

xtensa_insnbuf insn; u_int32_t val;

{

  insn[0] = (insn[0] & 0xfffff0ff) | ((val & 0xf0) << 8);

  insn[0] = (insn[0] & 0xffffff0f) | ((val & 0xf) << 4);

}

static u_int32_t decode_st(u_int32_t val)

{

  return val;

}

static xtensa_encode_result encode_st(u_int32_t *valp)

{

  if ((*valp >> 8) != 0) {

    return xtensa_encode_result_too_high;

  } else {

    return xtensa_encode_result_ok;

  }

}

static xtensa_operand_internal aor_operand = {

  'a',

  '>',

  get_r_field,

  set_r_field,

  encode_r,

  decode_r

};

static xtensa_operand_internal ais_operand = {

  'a',

  '<',

  get_s_field,

  set_s_field,

  encode_s,

  decode_s

};

static xtensa_operand_internal ait_operand = {

  'a',

  '<',

  get_t_field,

  set_t_field,

  encode_t,

  decode_t

};

static xtensa_operand_internal iisr_operand = {

  'i',

  '<',

  get_sr_field,

  set_sr_field,

  encode_sr,

  decode_sr

};

static xtensa_operand_internal iist_operand = {

  'i',

  '<',

  get_st_field,

  set_st_field,

  encode_st,

  decode_st

};

static xtensa_operand_internal *bs_operand_list[] = {

  &ais_operand

};

static xtensa_iclass_internal bs_iclass = {

  1,

  &bs_operand_list[0]

};

static xtensa_operand_internal *rur_operand_list[] = {

  &aor_operand,

  &iist_operand

};

static xtensa_iclass_internal rur_iclass = {

  2,

  &rur_operand_list[0]

};

static xtensa_operand_internal *wur_operand_list[] = {

  &ait_operand,

  &iisr_operand

};

static xtensa_iclass_internal wur_iclass = {

  2,

  &wur_operand_list[0]

};

static xtensa_insnbuf_word BYTESWAP_template[] = { 0x60000 };

static xtensa_opcode_internal BYTESWAP_opcode = {

  "byteswap",

  3,

  &BYTESWAP_template[0],

  &bs_iclass

};

static xtensa_insnbuf_word RUR_template[] = { 0xe30000 };

static xtensa_opcode_internal RUR_opcode = {

  "rur",

  3,

  &RUR_template[0],

  &rur_iclass

};

static xtensa_insnbuf_word WUR_template[] = { 0xf30000 };

static xtensa_opcode_internal WUR_opcode = {

  "wur",

  3,

  &WUR_template[0],

  &wur_iclass

};

static xtensa_opcode_internal *opcodes[] = {

  &BYTESWAP_opcode,

  &RUR_opcode,

  &WUR_opcode

};

xtensa_opcode_internal **get_opcodes() { return &opcodes[0]; }

const int get_num_opcodes() { return 3; }

#define xtensa_BYTESWAP_op 0

#define xtensa_RUR_op 1

#define xtensa_WUR_op 2

int decode_insn(const xtensa_insnbuf insn)

{

  if ((insn[0] & 0xff000f) == 0x60000) return xtensa_BYTESWAP_op;

  if ((insn[0] & 0xff000f) == 0xe30000) return xtensa_RUR_op;

  if ((insn[0] & 0xff000f) == 0xf30000) return xtensa_WUR_op;

  return XTENSA_UNDEFINED;

}
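The mask-and-match test used by decode_insn can also be written as a table walk. The following is a minimal sketch, not the generated code: `decode_entry`, `decode_word`, `st_field` and `OPCODE_UNDEFINED` are hypothetical names, and the sketch works on a single 32-bit word rather than on an xtensa_insnbuf. The mask and match constants are the ones that appear in decode_insn, and `st_field` repeats the bit extraction of get_st_field.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_UNDEFINED (-1)  /* hypothetical; plays the role of XTENSA_UNDEFINED */

/* One row per opcode: a word matches when (word & mask) == match. */
struct decode_entry {
    uint32_t mask;
    uint32_t match;
    int opcode;
};

static const struct decode_entry decode_table[] = {
    { 0xff000fu, 0x060000u, 0 },  /* BYTESWAP */
    { 0xff000fu, 0xe30000u, 1 },  /* RUR */
    { 0xff000fu, 0xf30000u, 2 },  /* WUR */
};

/* Return the opcode number of the first matching row, or
   OPCODE_UNDEFINED when no row matches. */
static int decode_word(uint32_t word)
{
    size_t n = sizeof decode_table / sizeof decode_table[0];
    for (size_t i = 0; i < n; i++) {
        if ((word & decode_table[i].mask) == decode_table[i].match)
            return decode_table[i].opcode;
    }
    return OPCODE_UNDEFINED;
}

/* Extract the 8-bit st immediate the way get_st_field does:
   bits 11:8 become the high nibble, bits 7:4 the low nibble. */
static uint32_t st_field(uint32_t word)
{
    return (((word & 0xf00u) >> 8) << 4) | ((word & 0xf0u) >> 4);
}
```

Operand bits outside the mask do not affect the result, so a word such as 0xe30010 still decodes as RUR while `st_field` recovers the user register number 1 from it.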

Appendix I

typedef unsigned u32;

typedef struct u64str { unsigned int lo; unsigned int hi; } u64;

extern u32 state32(int i);

extern u64 state64(int i);

extern void set_state32(int i, u32 v);

extern void set_state64(int i, u64 v);

extern void set_ar(int i, u32 v);

extern u32 ar(int i);

extern void pc_incr(int i);

extern int aux32_fetchfirst(void);

extern void pipe_use_ifetch(int n);

extern void pipe_use_dcache(void);

extern void pipe_def_ifetch(int n);

extern int arcode(void);

extern void pipe_use(int n, int v, int i);

extern void pipe_def(int n, int v, int i);

struct state_tbl_entry {

  const char *name;

  int numbits;

};

#define STATE_ACC 0

#define STATE_SWAP 1

#define NUM_STATES 2

struct state_tbl_entry local_state_tbl[NUM_STATES + 1] = {

  { "ACC", 32 },

  { "SWAP", 1 },

  { "", 0 }

};

extern "C" struct state_tbl_entry *get_state_tbl(void);

struct state_tbl_entry *get_state_tbl(void)

{

  return &local_state_tbl[0];

}

/* constant table ai4const */

static const unsigned CONST_TBL_AI4CONST[] = {

  0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,

  0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf

};

/* constant table b4const */

static const unsigned CONST_TBL_B4CONST[] = {

  0xffffffff, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,

  0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100

};

/* constant table b4constu */

static const unsigned CONST_TBL_B4CONSTU[] = {

  0x8000, 0x10000, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7,

  0x8, 0xa, 0xc, 0x10, 0x20, 0x40, 0x80, 0x100

};

/* constant table d01tab */

static const unsigned CONST_TBL_D01TAB[] = {

  0, 0x1

};

/* constant table d23tab */

static const unsigned CONST_TBL_D23TAB[] = {

  0x2, 0x3

};

/* constant table i4p1const */

static const unsigned CONST_TBL_I4P1CONST[] = {

  0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,

  0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10

};

/* constant table mip32const */

static const unsigned CONST_TBL_MIP32CONST[] = {

  0x20, 0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19,

  0x18, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11,

  0x10, 0xf, 0xe, 0xd, 0xc, 0xb, 0xa, 0x9,

  0x8, 0x7, 0x6, 0x5, 0x4, 0x3, 0x2, 0x1

};

void

BYTESWAP_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)

{

  unsigned ars = ar(_OPND0_);

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  unsigned ACC_ns;

  unsigned ars_swap;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) << 16) | ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));

  if (SWAP_ps) {

    _tmp0 = ars_swap;

  } else {

    _tmp0 = ars;

  }

  ACC_ns = ACC_ps + _tmp0;

  ACC = ACC_ns;

  set_state32(STATE_ACC, ACC);

  pc_incr(3);

}

void

RUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)

{

  unsigned arr;

  unsigned st = _OPND1_;

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp1;

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  if (st == 1) {

    _tmp0 = SWAP_ps;

  } else {

    _tmp0 = 0;

  }

  if (st == 0) {

    _tmp1 = ACC_ps;

  } else {

    _tmp1 = _tmp0;

  }

  arr = _tmp1;

  set_ar(_OPND0_, arr);

  pc_incr(3);

}

void

WUR_func(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_)

{

  unsigned art = ar(_OPND0_);

  unsigned sr = _OPND1_;

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp1;

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  unsigned SWAP_ns;

  unsigned ACC_ns;

  unsigned ureg_sel_0;

  unsigned ureg_sel_1;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  ureg_sel_0 = sr == 0;

  ureg_sel_1 = sr == 1;

  if (ureg_sel_0) {

    _tmp0 = art;

  } else {

    _tmp0 = ACC_ps;

  }

  ACC_ns = _tmp0;

  if (ureg_sel_1) {

    _tmp1 = (art & 0x1);

  } else {

    _tmp1 = (SWAP_ps & 0x1);

  }

  SWAP_ns = _tmp1;

  ACC = ACC_ns;

  SWAP = SWAP_ns;

  set_state32(STATE_ACC, ACC);

  set_state32(STATE_SWAP, SWAP);

  pc_incr(3);

}

void BYTESWAP_sched(u32 op0, u32 op1, u32 op2, u32 op3)

{

  int ff;

  int cond;

  ff = aux32_fetchfirst();

  if (ff) {

    pipe_use_ifetch(3);

  }

  pipe_use(arcode(), op0, 1);

  if (!ff) {

    pipe_use_ifetch(3);

  }

  pipe_use_dcache();

  pipe_def_ifetch(-1);

}

void RUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)

{

  int ff;

  int cond;

  ff = aux32_fetchfirst();

  if (ff) {

    pipe_use_ifetch(3);

  }

  if (!ff) {

    pipe_use_ifetch(3);

  }

  pipe_use_dcache();

  pipe_def(arcode(), op0, 2);

  pipe_def_ifetch(-1);

}

void WUR_sched(u32 op0, u32 op1, u32 op2, u32 op3)

{

  int ff;

  int cond;

  ff = aux32_fetchfirst();

  if (ff) {

    pipe_use_ifetch(3);

  }

  pipe_use(arcode(), op0, 1);

  if (!ff) {

    pipe_use_ifetch(3);

  }

  pipe_use_dcache();

  pipe_def_ifetch(-1);

}

typedef void (SEMFUNC)(u32 _OPND0_, u32 _OPND1_, u32 _OPND2_, u32 _OPND3_);

struct isafunc_tbl_entry {

  const char *opname;

  SEMFUNC *semfn;

  SEMFUNC *schedfn;

};

static struct isafunc_tbl_entry local_fptr_tbl[] = {

  { "byteswap", BYTESWAP_func, BYTESWAP_sched },

  { "rur", RUR_func, RUR_sched },

  { "wur", WUR_func, WUR_sched },

  { "", 0, 0 }

};

extern "C" struct isafunc_tbl_entry *get_isafunc_tbl(void);

struct isafunc_tbl_entry *get_isafunc_tbl(void)

{

  return &local_fptr_tbl[0];

}
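The table returned by get_isafunc_tbl ends with an entry whose name is empty, so a client can walk it without knowing its length. The following is a minimal sketch of such a lookup, under the assumption that a simulator resolves functions by opcode name; `find_isafunc`, `demo_func`, `demo_calls` and `demo_tbl` are hypothetical names used only for illustration, and how the actual simulator consumes the table is not shown in this appendix.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned u32;
typedef void (SEMFUNC)(u32, u32, u32, u32);

struct isafunc_tbl_entry {
    const char *opname;
    SEMFUNC *semfn;
    SEMFUNC *schedfn;
};

/* Walk a table terminated by an entry with an empty name and return the
   entry whose opname equals name, or NULL when there is none. */
static const struct isafunc_tbl_entry *
find_isafunc(const struct isafunc_tbl_entry *tbl, const char *name)
{
    for (; tbl->opname[0] != '\0'; tbl++) {
        if (strcmp(tbl->opname, name) == 0)
            return tbl;
    }
    return NULL;
}

/* Stand-in semantic function used only to exercise the lookup. */
static u32 demo_calls;

static void demo_func(u32 a, u32 b, u32 c, u32 d)
{
    (void)a; (void)b; (void)c; (void)d;
    demo_calls++;
}

static const struct isafunc_tbl_entry demo_tbl[] = {
    { "byteswap", demo_func, demo_func },
    { "rur", demo_func, demo_func },
    { "", 0, 0 }
};
```

A caller would look up an entry once per opcode name and then invoke `semfn` to model the instruction's effect and `schedfn` to model its pipeline usage.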

Appendix J

/* Do not modify. This file is automatically generated. */

#define BYTESWAP(ars) \
  ({ asm volatile ("BYTESWAP %0" : : "a" (ars)); })

#define RUR(st) \
  ({ int arr; asm volatile ("RUR %0, %1" : "=a" (arr) : "i" (st)); arr; })

#define WUR(art, sr) \
  ({ asm volatile ("WUR %0, %1" : : "a" (art), "i" (sr)); })

Appendix K

#ifdef TIE_DEBUG

#define BYTESWAP TIE_BYTESWAP

#define RUR TIE_RUR

#define WUR TIE_WUR

#endif

typedef unsigned u32；

#define STATE32_ACC 0

#define STATE_ACC STATE32_ACC

#define STATE32_SWAP 1

#define STATE_SWAP STATE32_SWAP

#define NUM_STATE32 2

static u32 state32_table[NUM_STATE32];

static char *state32_name_table[NUM_STATE32] = {

  "ACC",

  "SWAP"

};

static u32 state32(int rn) { return state32_table[rn]; }

static void set_state32(int rn, u32 s) { state32_table[rn] = s; }

static int num_state32(void) { return NUM_STATE32; }

static char *state32_name(int rn) { return state32_name_table[rn]; }

void

BYTESWAP(unsigned ars)

{

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  unsigned ACC_ns;

  unsigned ars_swap;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  ars_swap = (((ars & 0xff)) << 24) | ((((ars >> 8) & 0xff)) << 16) | ((((ars >> 16) & 0xff)) << 8) | (((ars >> 24) & 0xff));

  if (SWAP_ps) {

    _tmp0 = ars_swap;

  } else {

    _tmp0 = ars;

  }

  ACC_ns = ACC_ps + _tmp0;

  ACC = ACC_ns;

  set_state32(STATE_ACC, ACC);

}

unsigned

RUR(unsigned st)

{

  unsigned arr;

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp1;

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  if (st == 1) {

    _tmp0 = SWAP_ps;

  } else {

    _tmp0 = 0;

  }

  if (st == 0) {

    _tmp1 = ACC_ps;

  } else {

    _tmp1 = _tmp0;

  }

  arr = _tmp1;

  return arr;

}

void

WUR(unsigned art, unsigned sr)

{

  u32 ACC = state32(STATE_ACC);

  u32 SWAP = state32(STATE_SWAP);

  unsigned _tmp1;

  unsigned _tmp0;

  unsigned SWAP_ps;

  unsigned ACC_ps;

  unsigned SWAP_ns;

  unsigned ACC_ns;

  unsigned ureg_sel_0;

  unsigned ureg_sel_1;

  SWAP_ps = SWAP;

  ACC_ps = ACC;

  ureg_sel_0 = sr == 0;

  ureg_sel_1 = sr == 1;

  if (ureg_sel_0) {

    _tmp0 = art;

  } else {

    _tmp0 = ACC_ps;

  }

  ACC_ns = _tmp0;

  if (ureg_sel_1) {

    _tmp1 = (art & 0x1);

  } else {

    _tmp1 = (SWAP_ps & 0x1);

  }

  SWAP_ns = _tmp1;

  ACC = ACC_ns;

  SWAP = SWAP_ns;

  set_state32(STATE_ACC, ACC);

  set_state32(STATE_SWAP, SWAP);

}

#ifdef TIE_DEBUG

#undef BYTESWAP

#undef RUR

#undef WUR

#endif
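Application code written against BYTESWAP, RUR and WUR can be exercised on a host machine when native C versions of the three operations are available. The following is a condensed sketch under that assumption: the three functions are simplified stand-ins for the generated ones (same observable behaviour for user registers 0 and 1, fewer temporaries), and `sum_words` is a hypothetical application routine that is not part of the generated file.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t state32_table[2];  /* [0] = ACC, [1] = SWAP */

/* Write user register sr: 0 selects ACC, 1 selects the 1-bit SWAP. */
static void WUR(uint32_t art, unsigned sr)
{
    if (sr == 0) state32_table[0] = art;
    if (sr == 1) state32_table[1] = art & 1u;
}

/* Read user register st; unknown register numbers read as 0. */
static uint32_t RUR(unsigned st)
{
    return st == 0 ? state32_table[0] : st == 1 ? state32_table[1] : 0u;
}

/* Accumulate ars into ACC, byte-swapped when SWAP is set. */
static void BYTESWAP(uint32_t ars)
{
    uint32_t swapped = ((ars & 0xffu) << 24) | (((ars >> 8) & 0xffu) << 16) |
                       (((ars >> 16) & 0xffu) << 8) | ((ars >> 24) & 0xffu);
    state32_table[0] += state32_table[1] ? swapped : ars;
}

/* Hypothetical application routine: clear ACC, select the mode, feed every
   word through BYTESWAP, then read the accumulated sum back. */
static uint32_t sum_words(const uint32_t *buf, size_t n, int swap)
{
    WUR(0u, 0);
    WUR((uint32_t)swap, 1);
    for (size_t i = 0; i < n; i++)
        BYTESWAP(buf[i]);
    return RUR(0);
}
```

Because `sum_words` touches the state only through WUR, BYTESWAP and RUR, the same source could in principle be compiled against either the native functions or the intrinsic macros.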

Appendix L

// Do not modify this automatically generated file.

module tie_enflop(tie_out, tie_in, en, clk);

  parameter size = 32;

  output [size-1:0] tie_out;

  input [size-1:0] tie_in;

  input en;

  input clk;

  reg [size-1:0] tmp;

  assign tie_out = tmp;

  always @(posedge clk) begin

    if (en)

      tmp <= #1 tie_in;

  end

endmodule

module tie_flop(tie_out, tie_in, clk);

  parameter size = 32;

  output [size-1:0] tie_out;

  input [size-1:0] tie_in;

  input clk;

  reg [size-1:0] tmp;

  assign tie_out = tmp;

  always @(posedge clk) begin

    tmp <= #1 tie_in;

  end

endmodule

module tie_athens_state(ns, we, ke, kp, vw, clk, ps);

  parameter size = 32;

  input [size-1:0] ns;   // next state

  input we;              // write enable

  input ke;              // Kill E state

  input kp;              // Kill Pipeline

  input vw;              // Valid W state

  input clk;             // clock

  output [size-1:0] ps;  // present state

  wire [size-1:0] se;    // state at E stage

  wire [size-1:0] sm;    // state at M stage

  wire [size-1:0] sw;    // state at W stage

  wire [size-1:0] sx;    // state at X stage

  wire ee;               // write enable for EM register

  wire ew;               // write enable for WX register

  assign se = kp ? sx : ns;

  assign ee = kp | we & ~ke;

  assign ew = vw & ~kp;

  assign ps = sm;

  tie_enflop #(size) state_EM(.tie_out(sm), .tie_in(se), .en(ee), .clk(clk));

  tie_flop #(size) state_MW(.tie_out(sw), .tie_in(sm), .clk(clk));

  tie_enflop #(size) state_WX(.tie_out(sx), .tie_in(sw), .en(ew), .clk(clk));

endmodule

module bs(ars, ACC_ps, SWAP_ps, ACC_ns, ACC_we, BYTESWAP);

  input [31:0] ars;

  input [31:0] ACC_ps;

  input [0:0] SWAP_ps;

  output [31:0] ACC_ns;

  output ACC_we;

  input BYTESWAP;

  wire [31:0] ars_swap;

  assign ars_swap = {ars[7:0], ars[15:8], ars[23:16], ars[31:24]};

  assign ACC_ns = (ACC_ps) + ((SWAP_ps) ? (ars_swap) : (ars));

  assign ACC_we = 1'b1 & BYTESWAP;

endmodule

module rur(arr, st, ACC_ps, SWAP_ps, RUR);

  output [31:0] arr;

  input [31:0] st;

  input [31:0] ACC_ps;

  input [0:0] SWAP_ps;

  input RUR;

  assign arr = ((st) == (8'd0)) ? (ACC_ps) : (((st) == (8'd1)) ? (SWAP_ps) : (32'b0));

endmodule

module wur(art, sr, ACC_ps, SWAP_ps, ACC_ns, ACC_we, SWAP_ns, SWAP_we, WUR);

  input [31:0] art;

  input [31:0] sr;

  input [31:0] ACC_ps;

  input [0:0] SWAP_ps;

  output [31:0] ACC_ns;

  output ACC_we;

  output [0:0] SWAP_ns;

  output SWAP_we;

  input WUR;

  wire ureg_sel_0;

  assign ureg_sel_0 = (sr) == (8'h0);

  wire ureg_sel_1;

  assign ureg_sel_1 = (sr) == (8'h1);

  assign ACC_ns = {(ureg_sel_0) ? (art[31:0]) : (ACC_ps[31:0])};

  assign SWAP_ns = {(ureg_sel_1) ? (art[0:0]) : (SWAP_ps[0:0])};

  assign ACC_we = 1'b1 & WUR;

  assign SWAP_we = 1'b1 & WUR;

endmodule

module UserInstModule(clk, out_E, ars_E, art_E, inst_R, Kill_E, killPipe_W, valid_W, BYTESWAP_R, RUR_R, WUR_R, en_R);

  input clk;

  output [31:0] out_E;

  input [31:0] ars_E;

  input [31:0] art_E;

  input [23:0] inst_R;

  input en_R;

  input Kill_E, killPipe_W, valid_W;

  input BYTESWAP_R;

  input RUR_R;

  input WUR_R;

  wire BYTESWAP_E;

  wire RUR_E;

  wire WUR_E;

  wire [31:0] arr_E;

  wire [31:0] sr_R, sr_E;

  wire [31:0] st_R, st_E;

  wire [31:0] ACC_ps, ACC_ns;

  wire ACC_we;

  wire [0:0] SWAP_ps, SWAP_ns;

  wire SWAP_we;

  wire [31:0] bs_ACC_ns;

  wire bs_ACC_we;

  wire bs_select;

  wire [31:0] rur_arr;

  wire rur_select;

  wire [31:0] wur_ACC_ns;

  wire wur_ACC_we;

  wire [0:0] wur_SWAP_ns;

  wire wur_SWAP_we;

  wire wur_select;

  tie_enflop #(1) fBYTESWAP(.tie_out(BYTESWAP_E), .tie_in(BYTESWAP_R), .en(en_R), .clk(clk));

  tie_enflop #(1) fRUR(.tie_out(RUR_E), .tie_in(RUR_R), .en(en_R), .clk(clk));

  tie_enflop #(1) fWUR(.tie_out(WUR_E), .tie_in(WUR_R), .en(en_R), .clk(clk));

  assign sr_R = {{inst_R[11:8]}, {inst_R[15:12]}};

  tie_enflop #(32) fsr(.tie_out(sr_E), .tie_in(sr_R), .en(en_R), .clk(clk));

  assign st_R = {{inst_R[11:8]}, {inst_R[7:4]}};

  tie_enflop #(32) fst(.tie_out(st_E), .tie_in(st_R), .en(en_R), .clk(clk));

  bs ibs(

    .ars(ars_E),

    .ACC_ps(ACC_ps),

    .SWAP_ps(SWAP_ps),

    .ACC_ns(bs_ACC_ns),

    .ACC_we(bs_ACC_we),

    .BYTESWAP(BYTESWAP_E));

  rur irur(

    .arr(rur_arr),

    .st(st_E),

    .ACC_ps(ACC_ps),

    .SWAP_ps(SWAP_ps),

    .RUR(RUR_E));

  wur iwur(

    .art(art_E),

    .sr(sr_E),

    .ACC_ps(ACC_ps),

    .SWAP_ps(SWAP_ps),

    .ACC_ns(wur_ACC_ns),

    .ACC_we(wur_ACC_we),

    .SWAP_ns(wur_SWAP_ns),

    .SWAP_we(wur_SWAP_we),

    .WUR(WUR_E));

  tie_athens_state #(32) iACC(

    .ns(ACC_ns),

    .we(ACC_we),

    .ke(Kill_E),

    .kp(killPipe_W),

    .vw(valid_W),

    .clk(clk),

    .ps(ACC_ps));

  tie_athens_state #(1) iSWAP(

    .ns(SWAP_ns),

    .we(SWAP_we),

    .ke(Kill_E),

    .kp(killPipe_W),

    .vw(valid_W),

    .clk(clk),

    .ps(SWAP_ps));

  assign bs_select = BYTESWAP_E;

  assign rur_select = RUR_E;

  assign wur_select = WUR_E;

  assign arr_E = {32{1'b0}} & {32{bs_select}}

               | rur_arr & {32{rur_select}}

               | {32{1'b0}} & {32{wur_select}};

  assign out_E = arr_E;

  assign ACC_ns = bs_ACC_ns & {32{bs_select}}

                | {32{1'b0}} & {32{rur_select}}

                | wur_ACC_ns & {32{wur_select}};

  assign ACC_we = bs_ACC_we & bs_select

                | 1'b0 & rur_select

                | wur_ACC_we & wur_select;

  assign SWAP_ns = {1{1'b0}} & {1{bs_select}}

                 | {1{1'b0}} & {1{rur_select}}

                 | wur_SWAP_ns & {1{wur_select}};

  assign SWAP_we = 1'b0 & bs_select

                 | 1'b0 & rur_select

                 | wur_SWAP_we & wur_select;

endmodule
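The behaviour of tie_athens_state over successive clock edges can be traced with a small software model. The following C sketch mirrors only the equations of the module (se, ee, ew and the three registers); the names `athens_state` and `athens_tick` are hypothetical, the #1 assignment delay is ignored, and the reading that sx holds a committed copy which is reloaded into sm when kp is asserted is an interpretation of those equations rather than a statement taken from the listing.

```c
#include <stdint.h>

/* Registers of one tie_athens_state instance (hypothetical C mirror). */
struct athens_state {
    uint32_t sm;  /* state_EM output, also the present state ps */
    uint32_t sw;  /* state_MW output */
    uint32_t sx;  /* state_WX output */
};

/* Advance the model by one rising clock edge. All next values are
   computed from the current register contents before any register is
   updated, as the non-blocking assignments in the flops would do. */
static void athens_tick(struct athens_state *s, uint32_t ns,
                        int we, int ke, int kp, int vw)
{
    uint32_t se = kp ? s->sx : ns;   /* assign se = kp ? sx : ns  */
    int ee = kp || (we && !ke);      /* assign ee = kp | we & ~ke */
    int ew = vw && !kp;              /* assign ew = vw & ~kp      */

    uint32_t sm_next = ee ? se : s->sm;     /* tie_enflop state_EM */
    uint32_t sw_next = s->sm;               /* tie_flop   state_MW */
    uint32_t sx_next = ew ? s->sw : s->sx;  /* tie_enflop state_WX */

    s->sm = sm_next;
    s->sw = sw_next;
    s->sx = sx_next;
}
```

In this model a write with we set and ke clear appears in sm on the next edge, moves to sw one edge later, and reaches sx only on an edge where vw is set; an edge with kp set copies sx back into sm.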

Appendix M

/*

You need to fill in the necessary information for this section

*/

/* Set the search path to include the library directories */

SYNOPSYS = get_unix_variable("SYNOPSYS")

search_path = SYNOPSYS + /libraries/syn

/* Set the path and name of target library */

search_path = <...> + search_path

target_library = <name of the library>

/* Constraint information */

OPERATING_CONDITION = <name of the operating condition>

WIRE_LOAD = <name of the wire-load model>

BOUNDARY_LOAD = <library name>/<smallest inverter name>/<input pin name>

DRIVE_CELL = <a large FF name>

DRIVE_PIN = <Q pin name of the FF>

DRIVE_PIN_FROM = <clock pin name of the FF>

/* target processor clock period */

CLOCK_PERIOD = <target clock period>

/*

You need not make any change below

*/

link_library = {"*"} + target_library

symbol_library = generic.sdb

/* prepare workdir for hdl compiler */

hdlin_auto_save_templates = "TRUE"

define_design_lib WORK -path workdir

sh mkdir -p workdir

read -f verilog ./prim.v

read -f verilog ./ROOT.v

current_design UserInstModule

link

set_operating_conditions OPERATING_CONDITION

set_wire_load WIRE_LOAD

create_clock clk -period CLOCK_PERIOD

set_dont_touch_network clk

set_load {2 * load_of(BOUNDARY_LOAD)} all_outputs()

set_load {2 * load_of(BOUNDARY_LOAD)} all_inputs()

set_driving_cell -cell DRIVE_CELL -pin DRIVE_PIN -from_pin DRIVE_PIN_FROM all_inputs()

set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to find(clock, clk)

set_max_delay 0.5 * CLOCK_PERIOD -from find(clock, clk) -to all_outputs()

set_max_delay 0.5 * CLOCK_PERIOD -from all_inputs() -to all_outputs()

set_drive -rise 0 clk

set_drive -fall 0 clk

compile -ungroup_all

report_timing

report_constraint -all_viol

report_area

Claims

1. A system for designing a configurable processor, the system comprising:

means for generating a description of a hardware implementation of the processor based on a configuration specification, wherein the configuration specification includes a binary selection portion for determining whether certain features are included in the processor and a parameter selection portion for specifying parameters of certain predetermined features of the processor; and

means for generating software development tools specific to the hardware implementation based on the configuration specification,

wherein the configuration specification includes at least one extension specification of an extensible feature of the processor, the extension specification specifying the inclusion of a user-defined instruction and an implementation for that instruction.

2. The system according to claim 1, wherein the software development tools are used to generate code that runs on the processor.

3. The system according to claim 1, wherein the software development tools include a compiler, adapted to the configuration specification, for compiling an application into code executable by the processor.

4. The system according to claim 1, wherein the software development tools include an assembler, adapted to the configuration specification, for assembling an application into code executable by the processor.

5. The system according to claim 1, wherein the software development tools include a linker, adapted to the configuration specification, for linking code executable by the processor.

6. The system according to claim 1, wherein the software development tools include a disassembler, adapted to the configuration specification, for disassembling code executable by the processor.

7. The system according to claim 1, wherein the software development tools include a debugger, adapted to the configuration specification, for debugging code executable by the processor.

8. The system according to claim 7, wherein the debugger has a common interface and configuration for an instruction set simulator and the hardware implementation.

9. The system according to claim 1, wherein the software development tools include an instruction set simulator, adapted to the configuration specification, for simulating code executable by the processor.

10. The system according to claim 9, wherein the instruction set simulator can model execution of the code being simulated in order to measure one or more performance metrics over the execution cycles.

11. The system according to claim 10, wherein the performance metrics are based on characteristics of a specific configurable microarchitecture.

12. The system according to claim 10, wherein the instruction set simulator can profile execution of the program being simulated to record standard profiling statistics, including the number of cycles executed in each simulated function.

13. The system according to claim 1, wherein the hardware implementation description includes at least one of the following: a detailed HDL hardware implementation description; a synthesis script; a place-and-route script; a programmable logic device script; a test bench; diagnostic tests for verification; a script for running the diagnostic tests on a simulator; and test tools.

14. The system according to claim 1, wherein the means for generating the hardware implementation description includes:

means for generating a hardware description language description of the hardware implementation from the configuration specification;

means for synthesizing logic for the hardware implementation based on the hardware description language description; and

means for placing and routing components on a chip to form a circuit based on the synthesized logic.

15. The system according to claim 14, wherein the means for generating the hardware implementation description further includes:

means for verifying the timing of the circuit; and

means for determining the area, cycle time and power dissipation of the circuit.

16. The system according to claim 1, further comprising means for generating the configuration specification.

17. The system according to claim 16, wherein the means for generating the configuration specification is responsive to a user's selection of configuration parameters.

18. The system according to claim 16, wherein the means for generating the configuration specification generates the specification based on design goals for the processor.

19. The system according to claim 1, wherein the configuration specification includes at least one parameter specification of a modifiable feature of the processor.

20. The system according to claim 19, wherein the at least one parameter specification specifies the inclusion of a functional unit and at least one processor instruction that operates the functional unit.

21. The system according to claim 19, wherein the at least one parameter specification specifies one of the inclusion, the exclusion, and a characteristic of a structure that affects processor state.

22. The system according to claim 21, wherein the structure is a register file and the parameter specification specifies the number of registers in the register file.

23. The system according to claim 21, wherein the structure is an instruction cache.

24. The system according to claim 21, wherein the structure is a data cache.

25. The system according to claim 21, wherein the structure is a write buffer.

26. The system according to claim 21, wherein the structure is one of an on-chip ROM and an on-chip RAM.

27. The system according to claim 19, wherein the at least one parameter specification specifies a semantic characteristic that controls the interpretation of at least one of data and instructions in the processor.

28. The system according to claim 19, wherein the at least one parameter specification specifies an execution characteristic that controls the execution of instructions in the processor.

29. The system according to claim 19, wherein the at least one parameter specification specifies debugging characteristics of the processor.

30. The system according to claim 19, wherein the configuration specification includes a parameter specification that specifies at least one of a selection of a predetermined feature, a size or number of processor elements, and an assignment of a value.

31. The system according to claim 1, further comprising means for evaluating the suitability of the configuration specification.

32. The system according to claim 31, wherein the means for evaluating includes an interactive evaluation tool.

33. The system according to claim 31, wherein the means for evaluating evaluates hardware characteristics of the processor described by the configuration specification.

34. The system according to claim 31, wherein the means for evaluating evaluates the suitability of the configuration specification based on estimated performance characteristics of the processor.

35. The system according to claim 34, further comprising means for providing information for modifying the configuration specification based on the estimated performance characteristics.

36. The system according to claim 34, wherein the performance characteristics include at least one of the area required to implement the processor on a chip, the power consumed by the processor, and the clock speed of the processor.

37. The system according to claim 31, wherein the means for evaluating evaluates the suitability of the configuration specification based on estimated software characteristics of the processor.

38. The system according to claim 37, wherein the means for evaluating estimates at least one of required code size and cycle count by executing a suite of benchmark programs on the processor described by the configuration specification, thereby interactively providing the user with a suitability evaluation.

39. The system according to claim 31, wherein the means for evaluating evaluates both hardware characteristics and software characteristics of the processor described by the configuration specification.

40. The system according to claim 1, wherein the means for generating the description of the hardware implementation of the processor also provides hardware performance and cost characteristics, and the means for generating the software development tools, together with the means for generating the processor hardware implementation description, is used to produce software application performance information for modifying the configuration specification.

41. The system according to claim 1, wherein the means for generating the description of the hardware implementation of the processor also provides hardware performance and cost characteristics, and the means for generating the software development tools, together with the means for generating the processor hardware implementation description, is used to produce software application performance information for extending the configuration specification.

42. The system according to claim 1, wherein the means for generating the hardware description of the processor also provides hardware performance and cost characteristics, and the means for generating the software development tools, together with the means for generating the processor hardware implementation description, is used to produce software application performance information for characterizing the configuration specification; and the means for generating the hardware description of the processor provides hardware performance and cost characteristics, and the means for generating the software development tools, together with the means for generating the processor hardware implementation description, is used to produce software application performance information for characterizing an extension of the configuration specification.

43. The system according to claim 1, further comprising means for generating a configuration of the processor from a base configuration of an extensible processor.

44. The system according to claim 1, wherein the extension specification specifies an additional instruction.

45. The system according to claim 1, wherein the means for generating the software development tools includes means for suggesting to the user possible user-defined instructions suited to at least one application.

46. The system according to claim 1, wherein the software development tools include a compiler capable of generating the user-defined instruction.

47. The system according to claim 46, wherein the compiler can optimize code containing the user-defined instruction.

48. The system according to claim 1, wherein the software development tools include at least one of: an assembler capable of generating the user-defined instruction; a simulator capable of simulating the execution of user code that uses the user-defined instruction; and a tool capable of verifying the user's implementation of the user-defined instruction.

49. The system according to claim 46, wherein the compiler can automatically generate additional instructions.

50. The system according to claim 1, wherein:

the extension specification specifies a new feature whose function is designed by the user in abstract form; and

the means for generating the hardware implementation description also refines the new feature and integrates it into the detailed hardware implementation description.

51. The system according to claim 50, wherein the extension specification is a statement in an instruction set architecture language that specifies an opcode assignment and instruction semantics.

52. The system according to claim 51, wherein the means for generating the hardware implementation description includes means for generating instruction decode logic from the instruction set architecture language definition.

53. The system according to claim 52, wherein the means for generating the hardware implementation description further includes means for generating, based on the instruction set architecture language definition, signals specifying register operand usage for instruction interlock and stall logic.

54. The system according to claim 50, wherein the means for generating the software development tools includes means for generating an instruction decode procedure for use in an instruction set simulator adapted to the configuration specification.

55. The system according to claim 50, wherein the means for generating the software development tools includes means for generating encoding tables for use in an assembler that is adapted to the configuration specification and produces object code for the processor.

56. systems according to claim 50, wherein, the device described for producing hardware embodiments is additionally operable to as newly Feature generate the hardware description of data path, the specific streamline system knot of the hardware of above-mentioned data path and this processor Structure is consistent.

57. The system according to claim 44, wherein the additional instructions do not add new state to the processor.

58. The system according to claim 44, wherein the additional instructions add state to the processor.

59. The system according to claim 1, wherein the configuration specification includes at least a portion specified by an instruction set architecture language description.

60. The system according to claim 59, wherein the means for generating the hardware implementation description includes means for automatically generating instruction decode logic from the instruction set architecture language description.

61. The system according to claim 59, wherein the means for generating the software development tools includes means for automatically generating an assembler core from the instruction set architecture language description.

62. The system according to claim 59, wherein the means for generating the software development tools includes means for automatically generating a compiler from the instruction set architecture language description.

63. The system according to claim 59, wherein the means for generating the software development tools includes means for automatically generating a disassembler from the instruction set architecture language description.

64. The system according to claim 59, wherein the means for generating the software development tools includes means for automatically generating an instruction set simulator from the instruction set architecture language description.

65. The system according to claim 1, wherein the means for generating the hardware implementation description includes means for preprocessing a portion of at least one of the hardware implementation description and the software development tools, so as to modify the hardware implementation description and the software development tools, respectively, according to the configuration specification.

66. The system according to claim 65, wherein the means for preprocessing evaluates, according to the configuration specification, an expression in one of the hardware implementation description and the software development tools, and replaces the expression with a value.

67. The system according to claim 66, wherein the expression includes at least one of an iteration construct, a conditional construct, and a database query.

68. The system according to claim 1, wherein the configuration specification includes at least one parameter specification that specifies a modifiable characteristic of the processor.

69. The system according to claim 68, wherein the modifiable characteristic is one of a modification of a core specification and an optional feature not specified in the core specification.

70. The system according to claim 1, wherein the configuration specification includes at least one parameter specification that specifies a binary-selectable characteristic of the processor, and at least one characteristic of the processor that can be specified by a parameter.

71. A method for designing a configurable processor, the method comprising:

Generating a hardware implementation description of the processor according to a configuration specification, wherein the configuration specification includes: a binary selection portion for determining whether certain features are included in the processor, and a parameter selection portion for specifying parameters of certain predetermined features of the processor; and

Generating, according to the configuration specification, software development tools specific to the hardware implementation.

72. A system for designing a configurable processor, the system comprising:

Means for generating a configuration specification that contains a user-definable portion, the user-definable portion of the configuration specification including:

A specification of user-defined processor state, and

At least one user-defined instruction and an associated user-defined function, the function including at least one of reading from the user-defined processor state and writing to the user-defined processor state; and

Means for generating a hardware implementation description of the processor based on the configuration specification, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instruction.

73. The system according to claim 72, wherein the hardware implementation description of the processor includes a description of the control logic required to execute the at least one user-defined instruction and to implement the user-defined processor state.

74. The system according to claim 73, wherein:

The hardware implementation description of the processor describes an instruction execution pipeline; and

The control logic includes portions associated with each stage of the instruction execution pipeline.

75. The system according to claim 74, wherein:

The hardware implementation description includes a description of circuitry for aborting instruction execution; and

The control logic includes circuitry for preventing aborted instructions from modifying the user-defined state.

76. The system according to claim 75, wherein the control logic includes circuitry for performing, for the at least one user-defined instruction, at least one of instruction issue, operand bypass, and operand write enable.

77. The system according to claim 74, wherein the hardware implementation description includes registers for implementing the user-defined state in multiple stages of the instruction execution pipeline.

78. The system according to claim 74, wherein:

The hardware implementation description includes state registers that are written in a pipeline stage different from the pipeline stage in which the output operands are produced; and

The hardware implementation description specifies that such writes are bypassed to subsequent instructions that reference the user-defined processor state before the writes are committed to the state registers.

79. The system according to claim 72, wherein:

The configuration specification includes a predetermined portion in addition to the user-defined portion; and

The predetermined portion of the specification includes an instruction that facilitates saving the user-defined state to memory and an instruction that facilitates restoring the user-defined state from memory.

80. The system according to claim 79, further comprising means for generating software that uses the instruction that facilitates saving the user-defined state to memory in order to context-switch the user-defined state.

81. The system according to claim 72, further comprising means for generating at least one of the following:

An assembler for assembling the user-defined processor state and the at least one user-defined instruction;

A compiler for compiling the user-defined processor state and the at least one user-defined instruction;

A simulator for simulating the user-defined processor state and the at least one user-defined instruction; and

A debugger for debugging the user-defined processor state and the at least one user-defined instruction.

82. The system according to claim 72, further comprising means for generating: an assembler for assembling the user-defined processor state and the at least one user-defined instruction; a compiler for compiling the user-defined processor state and the at least one user-defined instruction; a simulator for simulating the user-defined processor state and the at least one user-defined instruction; and a debugger for debugging the user-defined processor state and the at least one user-defined instruction.

83. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement that specifies the size and index of the user-defined state.

84. The system according to claim 83, wherein the user-defined portion of the specification includes at least one attribute that is associated with the user-defined state in a processor register and that specifies the packing of the user-defined state.

85. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement that specifies a mapping of the user-defined state to processor registers.

86. The system according to claim 72, wherein the means for generating the hardware implementation description includes means for automatically mapping the user-defined state to registers of the processor.

87. The system according to claim 72, wherein the user-defined portion of the specification includes at least one statement that specifies a class of user-defined instructions and its effect on the user-defined state.

88. The system according to claim 72, wherein the user-defined portion of the specification includes at least one assignment statement that assigns a value to the user-defined state.

89. A system for designing a configurable processor, the system comprising:

Core software tools for generating, according to an instruction set architecture specification, software development tools specific to that specification; and

A user-defined instruction module for generating, according to user-defined instructions, at least one module for use by the core software tools in implementing the user-defined instructions, wherein the hardware implementation of the configurable processor includes a user-defined instruction execution unit for executing the user-defined instructions.

90. The system according to claim 89, wherein the core software tools include software tools that generate code to run on the processor.

91. The system according to claim 89, wherein the at least one module is implemented as a dynamic link library.

92. The system according to claim 89, wherein the at least one module is implemented as a table.

93. The system according to claim 89, wherein the core software tools include a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.

94. The system according to claim 93, wherein the at least one module includes a module used by the compiler to compile the user-defined instructions.

95. The system according to claim 89, wherein the core software tools include an assembler that uses the user-defined module to assemble an application into code that uses the user-defined instructions and is executable by the processor.

96. The system according to claim 95, wherein the at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instructions.

97. The system according to claim 96, wherein:

The system further includes a core instruction set specification that describes the non-user-defined instructions; and

The core instruction set specification is used by the assembler to assemble the application into code executable by the processor.

98. The system according to claim 89, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.

99. The system according to claim 98, wherein the at least one module includes a simulator module used by the simulator to simulate the execution of each user-defined instruction.

100. The system according to claim 99, wherein the module used by the simulator includes data for decoding the user-defined instructions.

101. The system according to claim 100, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses the simulator module to decode the instruction.

102. The system according to claim 89, wherein the core software tools include a debugger that uses the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.

103. The system according to claim 102, wherein the at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

104. The system according to claim 102, wherein the at least one module includes a module used by the debugger to convert assembly instructions into character strings.

105. The system according to claim 102, wherein:

The core software tools include an instruction set simulator for simulating code executable by the processor; and

The debugger communicates with the simulator to obtain information about the user-defined state for debugging.

106. The system according to claim 89, wherein a single user-defined instruction can be used without modification by multiple core software tools that are based on different core instruction set specifications.

107. A system for designing a configurable processor, the system comprising:

Core software tools for generating, based on an instruction set architecture specification, software development tools specific to that specification;

A user-defined instruction module for generating, based on user-defined instructions, a group of at least one module that is used by the core software tools to implement each user-defined instruction, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions; and

A storage device for simultaneously storing the groups generated by the user-defined instruction module, each group corresponding to a different set of user-defined instructions.

108. The system according to claim 107, wherein the at least one module is implemented as a dynamic link library.

109. The system according to claim 107, wherein the at least one module is implemented as a table.

110. The system according to claim 107, wherein the core software tools include a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.

111. The system according to claim 110, wherein the at least one module includes a module used by the compiler to compile the user-defined instructions.

112. The system according to claim 107, wherein the core software tools include an assembler that uses the user-defined instruction module to assemble an application into code that uses the user-defined instructions and is executable by the processor.

113. The system according to claim 112, wherein the at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instructions.

114. The system according to claim 107, wherein the core software tools include an instruction set simulator for simulating code executable by the processor.

115. The system according to claim 114, wherein the at least one module includes a module used by the simulator to simulate the execution of the user-defined instructions.

116. The system according to claim 115, wherein the module used by the simulator includes data for decoding the user-defined instructions.

117. The system according to claim 116, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses the module to decode the instruction.

118. The system according to claim 107, wherein the core software tools include a debugger that uses the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.

119. The system according to claim 118, wherein the at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

120. The system according to claim 118, wherein the at least one module includes a module used by the debugger to convert assembly instructions into character strings.

121. A system for designing a configurable processor, the system comprising:

Multiple groups of core software tools for generating, based on an instruction set architecture specification, software development tools specific to that specification; and

A user-defined instruction module for generating, based on a user-defined instruction set specification, at least one module that is used by a group of the core software tools to implement the user-defined instructions, wherein the hardware implementation of the processor includes a user-defined instruction execution unit for executing the user-defined instructions.

122. The system according to claim 121, wherein the at least one module is implemented as a dynamic link library.

123. The system according to claim 121, wherein the at least one module is implemented as a table.

124. The system according to claim 121, wherein at least one group of core software tools includes a compiler that uses the user-defined instruction module to compile an application into code that uses the user-defined instructions and is executable by the processor.

125. The system according to claim 124, wherein the at least one module includes a module used by the compiler to compile the user-defined instructions.

126. The system according to claim 121, wherein at least one group of core software tools includes an assembler that uses the user-defined instruction module to assemble an application into code that uses the user-defined instructions and is executable by the processor.

127. The system according to claim 126, wherein the at least one module includes a module used by the assembler to map assembly language instructions to the user-defined instructions.

128. The system according to claim 121, wherein at least one group of core software tools includes an instruction set simulator for simulating code executable by the processor.

129. The system according to claim 128, wherein the at least one module includes a module used by the simulator to simulate the execution of the user-defined instructions.

130. The system according to claim 129, wherein the module used by the simulator includes data for decoding the user-defined instructions.

131. The system according to claim 130, wherein, when an instruction cannot be decoded as a predefined instruction, the simulator uses the module to decode the instruction.

132. The system according to claim 121, wherein at least one group of core software tools includes a debugger that uses the user-defined module to debug code that uses the user-defined instructions and is executable by the processor.

133. The system according to claim 132, wherein the at least one module includes a module used by the debugger to decode machine instructions into assembly instructions.

134. The system according to claim 132, wherein the at least one module includes a module used by the debugger to convert assembly instructions into character strings.
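
For illustration only, the decode fallback recited in claims 99 to 101 (and likewise in claims 115 to 117 and 129 to 131) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the mask/match table layout, the opcode values, and the names `DecodeEntry`, `CORE_TABLE`, `USER_MODULE_TABLE`, and `simulator_decode` are all hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DecodeEntry:
    """One row of decode data: an instruction word matches when
    (word & mask) == match. The layout is an assumption for illustration."""
    mask: int
    match: int
    name: str


def lookup(word: int, table: List[DecodeEntry]) -> Optional[str]:
    """Return the name of the first entry matching the word, or None."""
    for entry in table:
        if (word & entry.mask) == entry.match:
            return entry.name
    return None


# Hypothetical predefined (core) instructions known to the simulator.
CORE_TABLE = [
    DecodeEntry(mask=0xFF, match=0x01, name="ADD"),
    DecodeEntry(mask=0xFF, match=0x02, name="SUB"),
]

# Hypothetical decode data supplied by a user-defined instruction module,
# for example one implemented as a table (compare claims 92, 109, 123).
USER_MODULE_TABLE = [
    DecodeEntry(mask=0xFF, match=0x80, name="MYMAC"),
]


def simulator_decode(word: int) -> Tuple[str, str]:
    """Try the predefined instructions first; only when the word cannot be
    decoded as a predefined instruction, consult the user-defined module."""
    name = lookup(word, CORE_TABLE)
    if name is not None:
        return ("core", name)
    name = lookup(word, USER_MODULE_TABLE)
    if name is not None:
        return ("user", name)
    raise ValueError(f"undecodable instruction word: {word:#x}")
```

In this sketch the simulator itself stays unchanged when the user-defined instructions change; only the table supplied by the module differs, which is one plausible reading of why the claims keep the core tools and the user-defined module separate.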