This application claims priority to U.S. Provisional Patent Application Serial No. 60/134,253, filed May 13, 1999, entitled "Method And Apparatus For Synthesizing And Implementing Integrated Circuit Designs", and to co-pending U.S. Patent Application Serial No. 09/418,663, filed October 14, 1999, entitled "Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design", which in turn claims priority to U.S. Provisional Patent Application Serial No. 60/102,271, filed October 14, 1998, of the same title.
2. Description of Related Art
RISC (reduced instruction set computer) processors are well known in the computing arts. A RISC processor generally has the fundamental characteristic of using a substantially reduced instruction set as compared with non-RISC (commonly known as "CISC") processors. Typically, RISC machine instructions are not all micro-coded, but can be executed directly without decoding, which yields significant gains in execution speed. This "streamlined" instruction handling capability in turn allows a simpler processor design (as compared with non-RISC devices), and therefore smaller silicon and lower fabrication cost.
RISC processors are also typically characterized by: (i) a load/store memory architecture (i.e., only the load and store instructions have access to memory; other instructions operate via internal registers within the processor); (ii) unity of processor and compiler; and (iii) pipelining.
Pipelining is a technique for increasing processor performance by dividing the sequence of operations within the processor into discrete segments which are, where possible, effectively executed in parallel. In the typical pipelined processor, the arithmetic units associated with processor arithmetic operations (such as ADD, MULTIPLY, DIVIDE, etc.) are usually "segmented", so that a specific portion of the operation is performed in a given segment of the unit during any clock cycle. Fig. 1 illustrates an exemplary processor architecture having such segmented arithmetic units. These units can therefore operate on the results of different calculations in any given clock cycle. For example, in the first clock cycle two numbers A and B are fed to the multiplier unit 10 and partially processed by the first segment 12 of the unit. In the second clock cycle, the partial result from multiplying A and B is passed to the second segment 14, while the first segment 12 receives two new numbers (say C and D) and begins processing them. The net result is that, after an initial startup period, the arithmetic unit 10 completes one multiplication operation every clock cycle.
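The overlap described above can be illustrated with a small behavioral model. The following is a minimal Python sketch and is not taken from the specification: the way the multiply is split between the two segments (a partial product on the low byte of the second operand in segment 12, the remainder in segment 14) and the function name are assumptions chosen only for illustration, and the operands are assumed to be non-negative integers.

```python
def run_segmented_multiplier(operand_pairs):
    """Simulate a two-segment multiplier, one loop iteration per clock cycle.

    Returns a list of (cycle, product) tuples in completion order.
    Assumption: the split of work between the segments is hypothetical.
    """
    seg1 = None                  # latch between segment 12 and segment 14
    results = []
    pending = list(operand_pairs)
    cycle = 0
    while pending or seg1 is not None:
        cycle += 1
        # Second segment: finish the multiply started in the previous cycle.
        if seg1 is not None:
            a, b, partial = seg1
            results.append((cycle, partial + a * ((b >> 8) << 8)))
        # First segment: start a new multiply on the low byte of b.
        if pending:
            a, b = pending.pop(0)
            seg1 = (a, b, a * (b & 0xFF))
        else:
            seg1 = None
    return results
```

With three operand pairs, the first product appears in cycle 2 and one further product appears in each following cycle, which is the steady-state behavior described for the arithmetic unit 10.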
The depth of the pipeline may vary from one architecture to another. In the present context, the term "depth" refers to the number of discrete stages present in the pipeline. In general, a pipeline with more stages executes programs faster, but may be more difficult to program if the pipeline effects are visible to the programmer. Most pipelined processors have either three stages (instruction fetch, decode, and execute) or four stages (such as instruction fetch, decode, operand fetch, and execute, or alternatively instruction fetch, decode/operand fetch, execute, and writeback), although more or fewer stages may be used.
Despite the aforementioned "segmentation" of operations within the processor, the instructions within the pipeline of prior art processors are generally contiguous. Specifically, instructions in one stage generally follow immediately after the instructions in a later stage, with a minimum of blank slots, NOP codes, or the like. Furthermore, when an instruction in a later stage is stalled (for example, when an instruction in the execute stage is awaiting information from a fetch operation), the earlier and later stages of the pipeline are stalled as well. In this manner, the pipeline tends to operate largely in "lock-step" fashion.
When developing the instruction set of a pipelined processor, several types of "hazards" must be considered. For example, so-called "structural" or "resource contention" hazards arise from overlapping instructions competing for the same resources (such as busses, registers, or other functional units); these hazards are typically resolved using one or more pipeline stalls. So-called "data" pipeline hazards occur in the case of read/write conflicts which may change the order of memory or register accesses. "Control" hazards are generally produced by branches or similar changes in program flow.
Interlocks are generally necessary in pipelined processors to address many of these hazards. For example, consider the case where a following instruction (n+1) in an earlier pipeline stage needs the result of the instruction n from a later stage. A simple solution to this problem is to delay the operand calculation in the decode stage by one or more clock cycles. A result of such delay, however, is that the execution time of a given instruction on the processor is determined in part by the instructions surrounding it within the pipeline. This complicates optimization of the code for the processor, since it is often difficult for the programmer to identify interlock situations within the code.
"Scoreboarding" may be used in the processor to implement interlocks. In this approach, a bit is attached to each processor register to act as an indicator of the register content; specifically, whether (i) the contents of the register have been updated and are therefore ready for use, or (ii) the contents are undergoing modification, such as being written to by another process. In some architectures the scoreboard is also used to generate interlocks which prevent instructions that depend on the contents of a scoreboarded register from executing until the scoreboard indicates that the register is ready. This approach is referred to as "hardware" interlocking, since the interlock is invoked purely through examination of the scoreboard by hardware within the processor. Such interlocks generate "stalls" which preclude the data-dependent instruction from executing (thereby stalling the pipeline) until the register is ready.
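The scoreboard check lends itself to a short behavioral illustration. The following Python sketch is an assumption-laden model rather than anything from the specification: the class and method names are hypothetical, and the single pending bit per register simply mirrors the indicator bit described above.

```python
class Scoreboard:
    """One pending bit per register: True means a write is still outstanding.

    Assumption: class and method names are illustrative only.
    """

    def __init__(self, num_registers):
        self.pending = [False] * num_registers

    def mark_pending(self, reg):
        """Scoreboard a register whose new value has not yet arrived."""
        self.pending[reg] = True

    def mark_ready(self, reg):
        """Clear the bit once the register has been updated."""
        self.pending[reg] = False

    def must_stall(self, source_regs):
        """Hardware interlock: stall if any source register is scoreboarded."""
        return any(self.pending[r] for r in source_regs)
```

An instruction reading registers 1 and 3 would stall while register 3 is scoreboarded, and would be released as soon as `mark_ready(3)` is called.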
Alternatively, NOPs (no-operation opcodes) may be inserted in the code so as to delay the appropriate pipeline stage when needed. This latter approach is referred to as "software" interlocking, and has the disadvantage of increasing the code size and complexity of programs that use instructions requiring interlocking. Designs that rely heavily on software interlocks also tend not to be fully optimized in terms of their code structure.
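Software interlocking can be pictured as a pass over the program that pads dependent instructions with NOPs. The sketch below is a minimal Python illustration under stated assumptions: the tuple representation of an instruction, the `latency` parameter, and the function name are all hypothetical and do not come from the specification.

```python
def insert_nops(program, latency=1):
    """Software interlock: pad with NOPs so that no instruction reads a
    register written by any of the `latency` instructions just before it.

    Each instruction is a (dest_reg, source_regs) tuple; a NOP is (None, ()).
    Assumption: latency >= 1 and this instruction format are illustrative.
    """
    nop = (None, ())
    out = []
    for dest, sources in program:
        # Keep padding while a recent producer still conflicts with a source.
        while any(d is not None and d in sources for d, _ in out[-latency:]):
            out.append(nop)
        out.append((dest, sources))
    return out
```

The padded program is longer than the original, which is the code-size cost noted above.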
Another important consideration in processor design is program branching, or "jumps". All processors support some type of branching instruction. Simply stated, branching refers to the condition where program flow is interrupted or altered. Other operations, such as loop setup and subroutine call instructions, also interrupt or alter program flow in a similar fashion. The term "jump delay slot" is often used to refer to the slot within a pipeline that follows a branch or jump instruction being decoded. The instruction after the branch (or load) is executed while awaiting completion of the branch/load instruction. Branching may be conditional (i.e., based on the truth or value of one or more parameters) or unconditional. It may also be absolute (e.g., based on an absolute memory address) or relative (e.g., based on relative addresses and independent of any particular memory address).
Branching can have a profound effect on pipelined systems. By the time a branch instruction has been inserted and decoded by the processor's instruction decode stage (indicating that the processor must begin executing from a different address), the next instruction word in the instruction sequence has already been fetched and inserted into the pipeline. One solution to this problem is to purge the fetched instruction word and halt (or stall) further fetch operations until the branch instruction has been executed, as illustrated in Fig. 2. This approach, however, necessarily results in the branch instruction being executed over several instruction cycles, a number typically equal to the depth of the pipeline employed in the processor design. This result is deleterious to processor speed and efficiency, since the processor cannot perform other operations during this period.
Alternatively, a delayed branch approach may be employed. In this approach, the pipeline is not purged when a branch instruction reaches the decode stage; rather, the subsequent instructions present in the earlier stages of the pipeline are executed normally before the branch is executed. Hence, the branch appears to be delayed by the number of instruction cycles necessary to execute all subsequent instructions in the pipeline at the time the branch instruction is decoded. This approach increases the efficiency of the pipeline as compared with the multi-cycle branch described above, but also increases the complexity of the underlying code (and makes it harder for the programmer to understand).
Based on the foregoing, processor designers and programmers must carefully weigh the tradeoffs associated with using hardware or software interlocks as opposed to a non-interlocked architecture. Furthermore, the interaction of the branching instructions in the instruction set (and of delayed or multi-cycle branching) with the selected interlock scheme must be considered.
What is needed is an improved approach to pipelining and interlocking which optimizes processor pipeline performance while also giving the programmer additional flexibility in coding. Furthermore, as more pipeline stages (and even multiple multi-stage pipelines) are added to processor designs, the benefits of enhanced pipeline performance and code optimization within the processor are multiplied. In addition, the ability to readily synthesize such improved pipelined processor designs in an application-specific manner, using available synthesis tools, is of significant benefit to the designer and programmer.
Detailed Description of the Invention
Reference is now made to the drawings, wherein like numerals refer to like parts throughout.
As used herein, the term "processor" is meant to include any integrated circuit or other electronic device capable of performing an operation on at least one instruction word, including, without limitation, reduced instruction set computer (RISC) processors such as the ARC user-configurable RISC processor produced by the Assignee hereof, central processing units (CPUs), and digital signal processors (DSPs).
Additionally, it will be recognized by those of ordinary skill in the art that the term "stage" as used herein refers to the successive stages within a pipelined processor; i.e., stage 1 refers to the first pipeline stage, stage 2 to the second pipeline stage, and so forth. While the following discussion is cast in terms of a three-stage pipeline (i.e., instruction fetch, decode, and execute stages), it will be appreciated that the methods and apparatus disclosed herein are broadly applicable to processor architectures having one or more pipelines with more or fewer than three stages.
It will also be recognized that while the following discussion is cast in terms of VHSIC hardware description language (VHDL), other hardware description languages such as Verilog may be used to describe the various embodiments of the invention with equal success. Furthermore, while an exemplary Synopsys synthesis engine such as the Design Compiler 1999.05 (DC99) is used herein to synthesize the various embodiments set forth, other synthesis engines, such as Buildgates available from Cadence Design Systems, Inc., may be used. IEEE Std. 1076.3-1997, IEEE Standard VHDL Synthesis Packages, specifies an industry-accepted language for defining a hardware description language based design and the synthesis capabilities that may be expected to be available to one of ordinary skill in the art.
Lastly, it will be recognized that while the following description illustrates specific embodiments of logic synthesized by the Assignee hereof using the aforementioned synthesis engine and VHSIC hardware description language, such embodiments being constrained in certain respects, these embodiments are only exemplary and illustrative of the design process of the present invention.
Pipeline Segmentation ("Tearing")
The architecture of the present invention includes a generally free-flowing pipeline. If a stage in the pipeline is stalled, then the preceding stages are also stalled if they contain instructions. However, even though these preceding stages are stalled, there are several advantages to allowing the later (i.e., "downstream") stages of the pipeline to continue, provided no interlocks are otherwise applied. Such advantages include, among others: (i) some instructions are able to continue through the pipeline, which gives better performance than stalling the whole pipeline; (ii) flag-setting instructions located in later stages of the pipeline continue to be processed, thereby ensuring that flags are set before the execution of jump or branch instructions whose execution is affected by the flag state; and (iii) a scoreboarded load is able to issue its request to memory from a later stage of the pipeline while an instruction that depends on that load is held in an earlier stage. The load must be allowed to issue, or a deadlock condition will result.
It is noted that, with respect to the continued processing of flag-setting instructions, Applicant's co-pending U.S. Patent Application entitled "Method And Apparatus For Jump Control In A Pipelined Processor" discloses a method and apparatus for interlocking flag-setting instructions with subsequent jump/branch instructions that may be affected by the flags set by the flag-setting instruction.
As an example of the foregoing, consider a processor having a three-stage pipeline (fetch, decode, execute) in which an instruction is stalled in stage 2, but the instruction in stage 3 is allowed to "tear" away from the earlier stage and continue down the remaining stages of the pipeline. Fig. 3 illustrates this principle (assuming that no interlocks are applied).
Referring now to Fig. 4, the method of controlling a multi-stage pipeline using the pipeline tearing concept of the invention is described. The first step 402 of the method 400 comprises generating an instruction set comprising a plurality of instruction words to be run on the processor. This instruction set is typically stored in an on-chip program storage device (such as a program RAM or ROM memory) of the type well known in the art, although other types of device, including off-chip memory, may be used. The generation of the instruction set itself is also well known in the art, except to the extent that it is modified to include the pipeline tearing functionality; these modifications are described in greater detail below.
Next, in step 404, the instruction set (program) is fetched sequentially from the storage device in the designated order by, inter alia, the program counter (PC) and run on the processor, with the fetched instructions being processed sequentially within the various stages of the pipeline. Note that in the context of a RISC processor, only load/store instructions may access the program memory space; hence, a plurality of registers may be used in such processors to physically receive and hold instruction information fetched from program memory. Such load/store architectures and the use of register structures within a processor are well known in the art, and accordingly are not described further herein.
In step 406, a stall condition in one stage of the pipeline is detected by logic blocks which combine signals to determine whether a conflict is taking place, typically over access to a data value or other resource. One example is the detection of the condition in which a register being read by an instruction is marked as "scoreboarded", meaning that the processor must wait until the register has been updated with a new value. Another example is a state machine generating stall cycles while a multi-cycle operation (such as a shift-and-add multiply) is carried out.
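The two stall sources named for step 406 can be combined in a single predicate. The following Python sketch is illustrative only; the function and parameter names are hypothetical, and an actual implementation would combine hardware signals rather than Python values.

```python
def stage_stalled(source_regs, scoreboarded_regs, multicycle_busy):
    """Return True if the stage must stall this cycle (step 406).

    source_regs:       registers read by the instruction in the stage
    scoreboarded_regs: set of registers still awaiting a new value
    multicycle_busy:   True while a multi-cycle operation is in progress
    Assumption: all names here are illustrative, not from the specification.
    """
    waiting_on_register = any(r in scoreboarded_regs for r in source_regs)
    return waiting_on_register or multicycle_busy
```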
In step 408, stage N+1 of the pipeline is checked for the presence of a valid instruction (where N is the number of the stage in which the stall was invoked per step 406). In the present context, a "valid instruction" is one which has not been marked "invalid" for any reason (step 410), and which has successfully completed processing in the previous stage N (step 412). For example, in one embodiment of Applicant's ARC Core, the "p3iv" signal (i.e., "stage 3 instruction valid") is used to indicate that stage 3 of the pipeline contains a valid instruction. The instruction in stage 3 may be invalid for a number of reasons, including:
1. The instruction was marked invalid when it moved into stage 2 (p2iv = '0'), and therefore remained invalid when it moved into stage 3;
2. The instruction in stage 3 was marked invalid by the pipeline tearing logic in a previous cycle, and has not since been replaced by an instruction moving from stage 2 into stage 3.
Note that the "stop" condition reached from step 410 results from the condition "invalid = yes", since tearing can only occur when valid instructions are present in both stage 2 and stage 3 at the same time.
Note that where the instruction present in stage 2 is determined in step 412 to be unable to complete processing (the second item above), while the instruction in stage 3 is able to complete processing, the instruction in stage 3 must be allowed to leave the pipeline (or move on to the next stage) and stage 3 must be marked as invalid, so as to fill the gap per step 414. Alternatively, a NOP or other blank instruction may be inserted into stage 3, with stage 3 marked as valid. If the blank were not inserted and the stage were not marked invalid, the instruction that was being processed in stage 3 when the instruction in stage 2 could not complete would be executed again on the next instruction cycle, which is undesirable.
Note further that, under the jump interlock of the "v6" embodiment of Applicant's ARC Core (described in detail in Applicant's co-pending U.S. Patent Application entitled "Method And Apparatus For Jump Control In A Pipelined Processor"), stage 2 of the pipeline will stall if it contains a jump instruction and stage 3 contains a flag-setting instruction. The pipeline tearing functionality of the present invention is therefore needed for the v6 jump interlock.
Lastly, in step 418, the valid instruction present in stage 3 (and in subsequent stages, in a pipeline having five or more stages) is executed on the next clock cycle, while the instruction present in stage 2 remains stalled in that stage. Note that processing of the stalled instruction in stage 2 may occur on subsequent clock cycles, depending on the state of the stall/interlock signal that caused the stall. Once the stall/interlock signal is disabled, processing of the stalled instruction in that stage begins at the leading edge of the next instruction cycle.
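The effect of steps 406 through 418 on the contents of a three-stage pipeline can be sketched behaviorally. The following Python function is a simplified illustration, not the Applicant's implementation: it assumes that stage 3 is always able to complete, it represents an invalid slot as `None`, and the function and parameter names are hypothetical.

```python
def tear_step(stages, stage2_stalled, fetch):
    """Advance a three-stage pipeline by one clock cycle with tearing.

    stages: [stage1, stage2, stage3]; None marks an invalid (empty) slot.
    fetch:  callable returning the next instruction word for stage 1.
    Returns (new_stages, retired_instruction_or_None).
    Assumption: stage 3 itself can always complete in this sketch.
    """
    s1, s2, s3 = stages
    retired = s3                 # stage 3 completes and leaves the pipeline
    if stage2_stalled:
        # Tear: hold stages 1 and 2, and mark stage 3 invalid so that the
        # retired instruction is not executed a second time (step 414).
        return [s1, s2, None], retired
    # No stall: every stage advances and a new word is fetched.
    return [fetch(), s1, s2], retired
```

Calling the function repeatedly with `stage2_stalled=True` retires the stage 3 instruction once and then returns `None` for it until the stall is released, which corresponds to the gap left behind by the torn instruction.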
The following exemplary code, extracted from Appendix I hereto, is used in conjunction with Applicant's ARC Core (three-stage pipeline variant) to implement the "tearing" functionality described above:
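-- Next-state value: hold ip3iv while ien3 = '0'; clear it when
-- ien3 = '1' but ien2 = '0'; otherwise take ip2iv.
n_p3iv <= ip3iv WHEN ien3 = '0' ELSE
          '0'   WHEN ien2 = '0' AND ien3 = '1' ELSE
          ip2iv;

-- Register with asynchronous clear, updated on the rising edge of ck.
p3ivreg : PROCESS (ck, clr)
BEGIN
    IF clr = '1' THEN
        ip3iv <= '0';
    ELSIF (ck'EVENT AND ck = '1') THEN
        ip3iv <= n_p3iv;
    END IF;
END PROCESS;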
It will be recognized, however, that coding schemes other than that presented herein, whether for the same or other processors, may be used to implement the pipeline tearing functionality of the present invention.
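For readers who prefer an executable form, the valid-flag logic of the listing above can be mirrored in software. The Python sketch below is a behavioral restatement written for illustration only; it reuses the signal names of the listing as parameter names, models '0' and '1' as the integers 0 and 1, and the class name is hypothetical.

```python
def next_p3iv(ip3iv, ip2iv, ien2, ien3):
    """Next-state value of the stage 3 'instruction valid' flag (0 or 1)."""
    if ien3 == 0:
        return ip3iv    # ien3 = '0': keep the current flag
    if ien2 == 0:
        return 0        # ien3 = '1' and ien2 = '0': flag is cleared
    return ip2iv        # otherwise the stage 2 flag is taken


class P3ivRegister:
    """Software stand-in for the clocked register (hypothetical name)."""

    def __init__(self):
        self.ip3iv = 0

    def clear(self):
        """Model of the asynchronous clear input."""
        self.ip3iv = 0

    def clock(self, ip2iv, ien2, ien3):
        """Model of one rising clock edge; returns the new flag value."""
        self.ip3iv = next_p3iv(self.ip3iv, ip2iv, ien2, ien3)
        return self.ip3iv
```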
Pipeline Re-assembly ("Catch-up") During Stalls
In addition to the pipeline tearing concept described above, the present invention also addresses the converse situation; i.e., permitting earlier stages of the pipeline to continue processing, or "catch up" to later stages, when empty slots or gaps are present between stages or the pipeline has otherwise been "torn". This function is also referred to as "pipeline transition enable".
As an example of the foregoing concept, consider the case of the aforementioned three-stage pipeline in which an instruction is stalled in stage 3, and stage 2 is empty or contains a killed instruction or long immediate word (hereinafter referred to as an "unused slot"). Using the catch-up function of the present invention, stage 1 is permitted to catch up to stage 2: the instruction in stage 1 is allowed to continue processing until it completes, at which point, on the clock edge, it enters stage 2 and a new instruction enters stage 1. In this way, any empty slot or gap between the stalled stage 3 and stage 1 is eliminated. Fig. 5 illustrates this concept.
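The catch-up example can be sketched in the same behavioral style. The Python function below is an illustration under stated assumptions: stage 3 is taken to be stalled for the whole cycle, an unused slot is represented as `None`, the instruction in stage 1 is assumed to be able to complete, and the names are hypothetical.

```python
def catch_up_step(stages, fetch):
    """One clock cycle with stage 3 stalled and catch-up permitted.

    stages: [stage1, stage2, stage3]; None marks an unused slot.
    fetch:  callable returning the next instruction word for stage 1.
    Assumption: stage 3 is stalled and stage 1 can complete this cycle.
    """
    s1, s2, s3 = stages
    if s2 is None:
        # Unused slot in stage 2: stage 1 advances and a new word is fetched.
        return [fetch(), s1, s3]
    # Stage 2 holds a valid instruction behind the stall: nothing moves.
    return [s1, s2, s3]
```

After one call, the gap between stage 1 and the stalled stage 3 has been closed; further calls leave the pipeline unchanged until the stall in stage 3 is released.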
Referring now to Fig. 6, the method of controlling a multi-stage processor pipeline using the "catch-up" technique of the invention is described. In the first step 602 of the method 600, the validity of the instruction in a first stage (stage 2 in the illustrated example) is determined. In the context of pipeline catch-up, a valid instruction is simply defined as one which was not marked invalid when it entered its current stage (stage 2 in the illustrated example). If the instruction is found to be invalid in step 602, the pipeline transition enable signal is set "true", as discussed in greater detail below. This pipeline transition enable signal controls the transition of the instruction word from stage 1 into stage 2. Pipeline "catch-up" occurs in the event that the instruction in stage 3 is unable to complete processing: the invalid slot in stage 2 is replaced by the instruction advancing from stage 1, while the instruction in stage 3 remains in stage 3.
If the instruction in stage 2 is valid per step 602, the ability of that valid instruction to complete processing in stage 2 is next determined in step 604. If the valid instruction is unable to complete processing and move out of stage 2 on the next cycle, the transition enable signal is set "false" per step 606, thereby disabling pipeline transition. This prevents a valid, pending instruction from being replaced by the instruction advancing from the preceding stage (stage 1). If, on the other hand, the valid instruction in stage 2 is able to complete processing, it is determined in step 608 whether an interrupt pseudo-instruction is present in stage 2 awaiting completion of a pending instruction fetch. If so, the transition enable signal is again set "false", which again prevents the valid instruction in stage 2 from being replaced, since the valid (but uncompleted) instruction cannot advance to stage 3 on the next cycle. If the valid instruction in stage 2 is able to complete processing on the next cycle and is not waiting for a pending instruction fetch, the transition enable signal is set "true" per step 610, thereby permitting the stage 1 instruction to advance into stage 2 at the same time as the instruction in stage 2 moves into stage 3.
Hence, according to the foregoing logic, the pipeline transition enable signal is always set "true" while the processor is running, except when: (i) a valid instruction in stage 2 cannot complete for some reason; or (ii) an interrupt in stage 2 is waiting for a pending instruction fetch to complete. Note that if an invalid instruction is held in stage 2 (specifically, because of a stall in stage 3), the transition enable signal is set "true" and the instruction in stage 1 is allowed to move into stage 2. The invalid stage 2 instruction is thereby replaced by the valid stage 1 instruction.
The "catch-up" or pipeline transition enable signal (en1) of the present invention may, in one embodiment, be generated using the following exemplary code (extracted from Appendix II):
ien1 <= '0' WHEN en = '0'
                 OR (p2int = '1' AND ien2 = '0')
                 OR (p2int = '1' AND ien2 = '0') ELSE
        '1';
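The decision flow of Fig. 6 can also be restated as a short behavioral sketch. The following Python function is an illustration only and is not taken from the appendices: its parameter names are hypothetical, it follows the prose description of steps 602 through 610 rather than the signal names of the listing, and the processor-running enable is omitted for brevity.

```python
def transition_enable(stage2_valid, stage2_can_complete,
                      interrupt_waiting_on_fetch):
    """Return True if the stage 1 instruction may move into stage 2.

    Assumption: parameter names are illustrative; the global processor
    enable is not modeled.
    """
    if not stage2_valid:
        return True     # step 602: an invalid slot may be overwritten
    if not stage2_can_complete:
        return False    # step 606: protect the pending valid instruction
    if interrupt_waiting_on_fetch:
        return False    # step 608: interrupt awaiting a pending fetch
    return True         # step 610: stage 2 moves on, stage 1 follows
```

The function returns "true" in every case except the two exceptions listed above, and it returns "true" for an invalid stage 2 slot regardless of the other inputs, which is the catch-up case.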
It is further noted that the pipeline tearing and catch-up methods of the present invention may be used (either alone or collectively) in conjunction with other methods of pipeline control and interlock including, inter alia, those disclosed in Applicant's co-pending U.S. Patent Application entitled "Method And Apparatus For Jump Control In A Pipelined Processor", and in Applicant's co-pending U.S. Patent Application entitled "Method And Apparatus For Jump Delay Slot Control In A Pipelined Processor", both filed contemporaneously herewith and both incorporated herein by reference in their entirety. Furthermore, various register encoding schemes, such as the "loose" register encoding described in Applicant's co-pending U.S. Patent Application entitled "Method And Apparatus For Loose Register Encoding Within A Pipelined Processor", filed contemporaneously herewith and incorporated herein by reference in its entirety, may be used in conjunction with the pipeline tearing and/or catch-up inventions described herein.
Method of Synthesis
Referring now to Fig. 7, the method 700 of synthesizing logic incorporating the pipeline tearing and/or catch-up functionality discussed above is described. The generalized method of synthesizing integrated circuit logic having a user-customized (i.e., "soft") instruction set is disclosed in Applicant's co-pending U.S. Patent Application Serial No. 09/418,663, entitled "Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design", filed October 14, 1999, which is incorporated herein by reference in its entirety.
While the following description is presented in terms of an algorithm or computer program running on a computer or other similar processing device, it will be appreciated that other hardware environments (including microcomputers, workstations, networked computers, "supercomputers", and mainframes) may be used to practice the method. Additionally, one or more portions of the computer program may be embodied in hardware or firmware rather than software if desired, such alternate embodiments being well within the skill of the computer artisan.
Initially, in the first step 702, user input is obtained regarding the design configuration. Specifically, the desired modules or functions for the design are selected by the user, and instructions relating to the design are added, removed, or generated as necessary. For example, in signal processing applications it is often advantageous for the CPU to include a single "multiply and accumulate" (MAC) instruction. In the present invention, the instruction set of the synthesized design is modified so as to incorporate the foregoing pipeline tearing and/or catch-up modes (or another comparable pipeline control architecture). The technology library location for each VHDL file is also defined by the user in step 702. The technology library files in the present invention store all of the information related to the cells necessary for the synthesis process, including, for example, logical function, input/output timing, and any associated constraints. In the present invention, each user can define his or her own library name and location, thereby adding further flexibility.
Next, in step 703, the user creates customized HDL functional blocks based on the user input and the existing library of functions specified in step 702.
In step 704, the design hierarchy is determined based on the user input and the aforementioned library files. A hierarchy file, a new library file, and a makefile are subsequently generated based on the design hierarchy. The term "makefile" as used herein refers to the commonly used UNIX makefile function, or a similar function of a computer system well known to those of skill in the computer programming arts. The makefile function causes other programs or algorithms resident in the computer system to be executed in the specified order. It further specifies the names and locations of the data files and other information necessary for the successful operation of the specified programs. It is noted, however, that the invention disclosed herein may use file structures other than the "makefile" type to produce the desired functionality.
In one embodiment of the makefile generation process of the present invention, the user is interactively asked, via display prompts, to input information relating to the desired design, such as the type of "build" (e.g., overall device or system configuration), the external memory system data bus, the different types of extensions, the cache type/size, and so forth. Many other configurations and sources of input information may be used, however, consistent with the invention.
In step 706, the user runs the makefile generated in step 704 to create the structural HDL. This structural HDL ties the discrete functional blocks of the design together so as to make a complete design.
Next, in step 708, the script generated in step 706 is run to create a makefile for the simulator. The user also runs the script in step 708 to generate a synthesis script.
At this point in the program, a decision is made whether to synthesize or simulate the design (step 710). If simulation is chosen, the user runs the simulation in step 712 using the generated design and the simulation makefile. Alternatively, if synthesis is chosen, the user runs the synthesis using the synthesis script and the generated design. After the synthesis/simulation scripts have completed, the adequacy of the design is evaluated in step 716. For example, a synthesis engine may create a specific physical layout of the design that meets the performance criteria of the overall design process yet does not meet the die size requirements. In this case, the designer changes the control files, libraries, or other elements that can affect the die size. The resulting set of design information is then used to re-run the synthesis script.
If the generated design is acceptable, the design process is complete. If the generated design is not acceptable, the process steps beginning with step 702 are performed again until an acceptable design is obtained. In this fashion, the method 700 is iterative.
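The iterative character of method 700 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the program of the invention: the three callables and the `max_iterations` bound are hypothetical, and the toy die-area numbers in the usage example are invented solely to exercise the loop.

```python
def run_design_flow(get_config, generate, evaluate, max_iterations=10):
    """Repeat the flow from step 702 until the evaluation of step 716 passes.

    get_config(attempt) -> configuration gathered from the user (step 702)
    generate(config)    -> design produced by the intermediate steps
    evaluate(design)    -> True if the design is acceptable (step 716)
    max_iterations is an assumed safety bound, not part of the described method.
    """
    for attempt in range(1, max_iterations + 1):
        config = get_config(attempt)
        design = generate(config)
        if evaluate(design):
            return design, attempt
    raise RuntimeError("no acceptable design within the iteration bound")


if __name__ == "__main__":
    # Toy example: each pass raises the effort, which shrinks the die area.
    design, attempts = run_design_flow(
        get_config=lambda n: {"effort": n},
        generate=lambda cfg: {"area": 100 - 20 * cfg["effort"]},
        evaluate=lambda d: d["area"] <= 50,
    )
    print(design, attempts)
```

In the toy run the third pass is the first to satisfy the area limit, which corresponds to the case in the text where a layout meets the performance criteria but must be regenerated to meet the die size requirement.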
Referring now to Figs. 8a-8b, one embodiment of exemplary gate logic (including the "p3iv" signal referenced in the VHDL of Appendix I) is illustrated, this logic having been synthesized using the Synopsys Design Compiler and the aforementioned method of Fig. 7. Note that during the synthesis process used to generate the logic of Fig. 8a, an LSI 10k 1.0 μm process was specified and no constraints were placed on the design. For the logic of Fig. 8b, the same process was used; however, the design was constrained on the path from ien3 to the clock. Appendix III contains the coding used to generate the exemplary gate logic of Figs. 8a-8b.
Referring to Figs. 8c-8d, one embodiment of exemplary gate logic (including the "ien1" signal referenced in the VHDL of Appendix II) is illustrated, this logic having been synthesized using the method of Fig. 7. Note that during the synthesis process used to generate the logic of Fig. 8c, an LSI 10k 1.0 μm process was specified and no constraints were placed on the design. For the logic of Fig. 8d, the same process was used; however, the design was constrained to prevent the use of AND-OR gates. Appendix IV contains the coding used to generate the exemplary gate logic of Figs. 8c-8d.
Fig. 9 illustrates an exemplary pipelined processor fabricated using a 1.0 μm process and incorporating the pipeline tearing and catch-up modes previously described herein. As shown in Fig. 9, the processor 900 is an ARC microprocessor-like CPU device having, among other things, a processor core 902, on-chip memory 904, and an external interface 906. The device is fabricated using the customized VHDL design obtained using the method 700 of the present invention, which is subsequently synthesized into a logic level representation and then reduced to a physical device using compilation, layout, and fabrication techniques well known in the semiconductor arts.
It will be appreciated by one skilled in the art that the processor of Fig. 9 may contain any commonly available peripheral, such as serial communications devices, parallel ports, timers, counters, high current drivers, analog-to-digital (A/D) converters, digital-to-analog (D/A) converters, interrupt processors, LCD drivers, memories, and other similar devices. Further, the processor may also include custom or application-specific circuitry. The present invention is not limited to the type, number, or complexity of the peripherals and other circuitry that may be combined using this method and apparatus. Rather, any limitations are imposed by the physical capacity of existing semiconductor processes, which improves over time. It is therefore anticipated that the complexity and degree of integration possible using the present invention will increase further as semiconductor processes improve.
It is also noted that many IC designs currently use a microprocessor chip or a DSP chip. The DSP, however, might be required only for a limited number of DSP functions (such as finite impulse response analysis or voice encoding), or for the IC's fast DMA architecture. The invention disclosed herein can support many DSP instruction functions, and its fast local RAM system provides immediate access to data. Appreciable cost savings may be realized by applying the methods disclosed herein to both the CPU and DSP functions of the IC.
Additionally, it will be noted that the methodology (and associated computer program) previously described herein can readily be adapted to newer manufacturing technologies, such as 0.18 or 0.1 micron processes, with a comparatively simple re-synthesis, rather than the lengthy and expensive process typically required to adapt to such technologies when using prior-art "hard" macro systems.
Referring now to Fig. 10, one embodiment of a computing device capable of synthesizing logic in accordance with the tearing/catch-up signals disclosed herein is described. The computing device 1000 comprises a motherboard 1001 having a central processing unit (CPU) 1002, random access memory (RAM) 1004, and a memory controller 1005. A storage device 1006 (such as a hard disk drive or CD-ROM), an input device 1007 (such as a keyboard or mouse), and a display device 1008 (such as a CRT, plasma, or TFT display) are also provided, together with the buses necessary to support the operation of the host and peripheral components. The aforementioned VHDL descriptions and synthesis engine are stored as an object code representation of a computer program in the RAM 1004 and/or the storage device 1006 for use by the CPU 1002 during design synthesis, the latter being well known in the computing arts. During system operation, the user (not shown) synthesizes logic designs by entering design configuration specifications into the synthesis program via the program displays and the input device 1007. Synthesized designs generated by the program are stored in the storage device 1006 for later retrieval, displayed on the graphic display device 1008, or, if desired, output via a serial or parallel port 1012 to an external device such as a printer, data storage unit, or other peripheral.
While the above detailed description has shown, described, and pointed out novel features of the invention as applied to various embodiments, it will be understood that those skilled in the art may make various omissions, substitutions, and changes in the form and details of the device or process illustrated without departing from the invention. This description is in no way meant to be limiting, but rather should be taken as illustrative of the general principles of the invention. The scope of the invention should be determined with reference to the claims.
Appendix I: VHDL used to generate synthesized logic for pipeline tearing
library ieee;
use ieee.std_logic_1164.all;

entity v007a is
  port (
    ck    : in  std_ulogic;
    clr   : in  std_ulogic;
    ien2  : in  std_ulogic;
    ien3  : in  std_ulogic;
    ip2iv : in  std_ulogic;
    p3iv  : out std_ulogic);
end v007a;

architecture synthesis of v007a is
  signal n_p3iv : std_ulogic;
  signal ip3iv  : std_ulogic;
begin
  -- Next-state logic: hold while ien3 is low; force '0' when ien2 is low
  -- and ien3 is high; otherwise load ip2iv.
  n_p3iv <= ip3iv WHEN ien3 = '0' ELSE
            '0'   WHEN ien2 = '0' AND ien3 = '1' ELSE
            ip2iv;

  -- Register with asynchronous clear, updated on the rising edge of ck.
  p3ivreg : PROCESS (ck, clr)
  BEGIN
    IF clr = '1' THEN
      ip3iv <= '0';
    ELSIF (ck'EVENT AND ck = '1') THEN
      ip3iv <= n_p3iv;
    END IF;
  END PROCESS;

  p3iv <= ip3iv;
end synthesis;
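The behavior of the Appendix I logic (entity v007a) can be checked against a small cycle-level reference model. The following is a sketch in Python, not part of the original appendix. It assumes two-valued signals (0 or 1) and a register that starts cleared, as if clr had been asserted. The class and function names are hypothetical.

```python
def next_p3iv(ip3iv, ien2, ien3, ip2iv):
    """Combinational next-state value, mirroring the n_p3iv assignment."""
    if ien3 == 0:
        return ip3iv      # ien3 low: hold the current value
    if ien2 == 0:
        return 0          # ien2 low while ien3 high: force '0'
    return ip2iv          # both high: load ip2iv


class TearRegister:
    """Models the p3ivreg process: asynchronous clear, rising-edge update."""

    def __init__(self):
        self.ip3iv = 0    # assumed cleared at start (the VHDL would be 'U')

    def clear(self):
        self.ip3iv = 0

    def clock(self, ien2, ien3, ip2iv):
        """Apply one rising clock edge and return the new p3iv output."""
        self.ip3iv = next_p3iv(self.ip3iv, ien2, ien3, ip2iv)
        return self.ip3iv


if __name__ == "__main__":
    reg = TearRegister()
    print(reg.clock(ien2=1, ien3=1, ip2iv=1))  # load
    print(reg.clock(ien2=0, ien3=0, ip2iv=0))  # hold
    print(reg.clock(ien2=0, ien3=1, ip2iv=1))  # force '0'
```

A model of this kind can serve as a golden reference when comparing simulations of the unconstrained and constrained netlists of Figs. 8a-8b.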
Appendix II: VHDL used to generate synthesized logic for pipeline catch-up
library ieee;
use ieee.std_logic_1164.all;

entity v007b is
  port (
    en    : in  std_ulogic;
    p2int : in  std_ulogic;
    ien2  : in  std_ulogic;
    ip2iv : in  std_ulogic;
    ien1  : out std_ulogic);
end v007b;

architecture synthesis of v007b is
begin
  -- ien1 is low when en is low, or when ien2 is low while either p2int
  -- or ip2iv is high; otherwise ien1 is high.
  ien1 <= '0' WHEN en = '0'
                OR (p2int = '1' AND ien2 = '0')
                OR (ip2iv = '1' AND ien2 = '0') ELSE
          '1';
end synthesis;
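The combinational logic of Appendix II (entity v007b) is small enough to enumerate exhaustively. The following Python sketch, which is not part of the original appendix, reproduces the ien1 expression for two-valued inputs and prints its truth table. The function names are hypothetical, and no meaning is assumed for the signals beyond what the VHDL expression itself states.

```python
from itertools import product


def ien1(en, p2int, ien2, ip2iv):
    """Mirror of the v007b conditional assignment for 0/1 inputs."""
    blocked = en == 0 or (ien2 == 0 and (p2int == 1 or ip2iv == 1))
    return 0 if blocked else 1


def truth_table():
    """Return rows of (en, p2int, ien2, ip2iv, ien1) for all 16 input cases."""
    return [bits + (ien1(*bits),) for bits in product((0, 1), repeat=4)]


if __name__ == "__main__":
    print("en p2int ien2 ip2iv | ien1")
    for en, p2int, ien2, ip2iv, out in truth_table():
        print(f" {en}   {p2int}     {ien2}    {ip2iv}    |  {out}")
```

The table shows that with ien2 low, ien1 remains high only when both p2int and ip2iv are low. Under the constraint used for Fig. 8d, the same function must be realized without AND-OR cells, so a reference table of this kind is a convenient equivalence check between the two netlists.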
Appendix III: Synthesis script used to generate sample schematics for the tearing logic
/* Analyze VHDL */
analyze -library user -format vhdl vhdl/v007a.vhdl

/* Unconstrained logic */
elaborate -library user v007a
compile
write -format db -hierarchy -output db/v007a_uc.db
create_schematic -schematic_view
plot -output v007a_uc.ps
remove_design -all

/* Constrained logic */
elaborate -library user v007a
create_clock -name "ck" -period 10 -waveform {0 5} ck
set_input_delay -clock ck 8 ien3
compile
write -format db -hierarchy -output db/v007a_c.db
create_schematic -schematic_view
plot -output v007a_c.ps
Appendix IV: Synthesis script used to generate sample schematics for the catch-up logic
/* Analyze VHDL */
analyze -library user -format vhdl vhdl/v007b.vhdl

/* Unconstrained logic */
elaborate -library user v007b
compile
write -format db -hierarchy -output db/v007b_uc.db
create_schematic -schematic_view
plot -output v007b_uc.ps
remove_design -all

/* Constrained logic */
elaborate -library user v007b
set_max_area 0
set_dont_use find(cell, lsi_10k/AO*)
compile -map_effort high
write -format db -hierarchy -output db/v007b_c.db
create_schematic -schematic_view
plot -output v007b_c.ps